There are many excellent explanations and illustrations of the generative pre-trained transformer (GPT) (Radford et al., 2018) and the original transformer architecture (Vaswani et al., 2017). For example, I can highly recommend the write-up by Turner (2023) and the video by Karpathy (2023). Nevertheless, I decided to create yet another illustration of GPT for a recent example class I taught. My illustration focuses on two things:
- Providing a direct connection from a high-level diagram all the way to an actual code implementation of a GPT.
- Making the illustration as simple as possible (avoiding unnecessary complexity, e.g. by focusing on vector instead of full matrix/tensor operations).
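To give a flavour of the vector-level view the illustration takes (this is a minimal sketch of scaled dot-product attention, not the code from the illustration itself): instead of computing attention for all tokens at once with matrix operations, you can compute the output for a single token from its query vector and the keys/values of the preceding tokens.

```python
import numpy as np

def attention_for_one_token(q, K, V):
    """Scaled dot-product attention for a single query vector.

    q: (d,) query vector for the current token
    K: (t, d) key vectors of the t visible (preceding) tokens
    V: (t, d) value vectors of the same tokens
    Returns the (d,) output vector for the current token.
    """
    scores = K @ q / np.sqrt(q.shape[0])   # (t,) one score per visible token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over visible tokens
    return weights @ V                     # (d,) weighted sum of value vectors

# Example with small random vectors (d=4 dimensions, t=3 visible tokens).
rng = np.random.default_rng(0)
d, t = 4, 3
K = rng.normal(size=(t, d))
V = rng.normal(size=(t, d))
q = rng.normal(size=(d,))
out = attention_for_one_token(q, K, V)     # (4,) output vector
```

The usual matrix formulation stacks all query vectors into a matrix and computes every row at once; each row of that result matches the per-token vector computation above.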
Whilst it was initially intended for teaching, I adapted my illustration further so that it can hopefully be useful as a stand-alone visualisation of the GPT architecture.
Note that the illustration contains a diagram from the original GPT paper (Radford et al., 2018) and code written by Andrej Karpathy and contributors. I have also found the write-up by Turner (2023) a useful reference when creating this illustration.
Here is the illustration (it’s very large, zoom in for full details):
I hope you found this illustration useful; please let me know if you find any errors. The illustration was created using the Concepts iPad app.
References
- Karpathy, A. (2023, January 17). Let's build GPT: from scratch, in code, spelled out [Video]. Retrieved from https://www.youtube.com/watch?v=kCc8FmEb1nY
- Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. (2018). Improving language understanding by generative pre-training. Retrieved from https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- Turner, R. (2023). An Introduction to Transformers. https://doi.org/10.48550/arXiv.2304.10557
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, Ł. & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Yuan, W., Cho, K. & Weston, J. (2024). System-Level Natural Language Feedback. https://doi.org/10.48550/arXiv.2306.13588
Citation
If you found this post useful for your work, please consider citing it as:
Findeis, Arduin. (Mar 2024). GPT illustrated: from high-level diagram to vectors and code. Retrieved from https://arduin.io/blog/gpt-illustrated/.
@article{findeis2024gptillustrated,
title = "GPT illustrated: from high-level diagram to vectors and code",
author = "Findeis, Arduin",
journal = "arduin.io",
year = "2024",
month = "March",
url = "https://arduin.io/blog/gpt-illustrated/"
}