There are many excellent explanations and illustrations of the generative pre-trained transformer (GPT) (Radford et al., 2018) and the original transformer architecture (Vaswani et al., 2017). For example, I can highly recommend the write-up by Turner (2023) and the video by Karpathy (2023). Nevertheless, I decided to create yet another illustration of GPT for a recent example class I taught. My illustration focuses on two things:
- Providing a direct connection from a high-level diagram all the way to an actual code implementation of a GPT.
- Making the illustration as simple as possible (avoiding unnecessary complexity, e.g. by focusing on vector instead of full matrix/tensor operations).
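To give a flavour of the vector-level view the illustration takes (this is a minimal sketch of scaled dot-product attention, not the code from the illustration itself): instead of computing attention for all tokens at once with matrix operations, you can compute the output for a single token from its query vector and the keys/values of the preceding tokens.

```python
import numpy as np

def attention_for_one_token(q, K, V):
    """Scaled dot-product attention for a single query vector.

    q: (d,) query vector for the current token
    K: (t, d) key vectors of the t visible (preceding) tokens
    V: (t, d) value vectors of the same tokens
    Returns the (d,) output vector for the current token.
    """
    scores = K @ q / np.sqrt(q.shape[0])   # (t,) one score per visible token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over visible tokens
    return weights @ V                     # (d,) weighted sum of value vectors

# Example with small random vectors (d=4 dimensions, t=3 visible tokens).
rng = np.random.default_rng(0)
d, t = 4, 3
K = rng.normal(size=(t, d))
V = rng.normal(size=(t, d))
q = rng.normal(size=(d,))
out = attention_for_one_token(q, K, V)     # (4,) output vector
```

The usual matrix formulation stacks all query vectors into a matrix and computes every row at once; each row of that result matches the per-token vector computation above.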
Whilst it was initially intended for teaching, I adapted my illustration further so that it can hopefully be useful as a stand-alone visualisation of the GPT architecture.
Note that the illustration contains a diagram from the original GPT paper (Radford et al., 2018) and code written by Andrej Karpathy and contributors. I have also found the write-up by Turner (2023) a useful reference when creating this illustration.
Here is the illustration (it’s very large, zoom in for full details):
I hope you found this illustration useful; please let me know if you find any errors. The illustration was created using the Concepts iPad app.
References
- Karpathy, A. (2023, January 17). Let's build GPT: from scratch, in code, spelled out [Video]. Retrieved from https://www.youtube.com/watch?v=kCc8FmEb1nY
- Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. (2018). Improving language understanding by generative pre-training. Retrieved from https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- Turner, R. (2023). An Introduction to Transformers. https://doi.org/10.48550/arXiv.2304.10557
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, Ł. & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Yuan, W., Cho, K. & Weston, J. (2024). System-Level Natural Language Feedback. https://doi.org/10.48550/arXiv.2306.13588
Citation
If you found this post useful for your work, please consider citing it as:
Findeis, Arduin. (Mar 2024). GPT illustrated: from high-level diagram to vectors and code. Retrieved from https://arduin.io/blog/gpt-illustrated/.
@article{findeis2024gptillustrated,
title = "GPT illustrated: from high-level diagram to vectors and code",
author = "Findeis, Arduin",
journal = "arduin.io",
year = "2024",
month = "March",
url = "https://arduin.io/blog/gpt-illustrated/"
}