There are many excellent explanations and illustrations of the generative pre-trained transformer (GPT) (Radford et al., 2018) and the original transformer architecture (Vaswani et al., 2017). For example, I can highly recommend the write-up by Turner (2023) and the video by Karpathy (2023). Nevertheless, I decided to create yet another illustration of GPT for a recent example class I taught. My illustration focuses on two things:

  1. Providing a direct connection from a high-level diagram all the way down to an actual code implementation of a GPT.
  2. Keeping the illustration as simple as possible, avoiding unnecessary complexity, e.g. by focusing on vector rather than full matrix/tensor operations.
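To give a flavour of the vector-level view in the second point: the core attention operation can be written for a single token's query vector, rather than for a whole batched tensor. Below is a minimal, self-contained sketch of single-query scaled dot-product attention in NumPy. The function name, variable names, and toy shapes are my own illustrative choices, not taken from the illustration or Karpathy's code.

```python
import numpy as np

def attention_for_one_token(q, K, V):
    """Scaled dot-product attention for a single query vector.

    q: (d,) query vector for the current token
    K: (T, d) key vectors for all T context tokens
    V: (T, d) value vectors for all T context tokens
    Returns a (d,) output vector: a weighted average of the rows of V.
    """
    d = q.shape[0]
    scores = K @ q / np.sqrt(d)            # (T,) similarity of q to each key
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()      # softmax -> attention weights, sum to 1
    return weights @ V                     # (d,) weighted sum of value vectors

# Toy example: 3 context tokens, embedding dimension 4
rng = np.random.default_rng(0)
q = rng.normal(size=4)
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = attention_for_one_token(q, K, V)
print(out.shape)  # (4,)
```

The full architecture stacks this operation over all tokens, heads, and batch elements as matrix/tensor operations, but the per-vector form above is what each output position effectively computes.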

Whilst it was initially intended for teaching, I have since adapted the illustration so that it can hopefully be useful as a stand-alone visualisation of the GPT architecture.

Note that the illustration contains a diagram from the original GPT paper (Radford et al., 2018) and code written by Andrej Karpathy and contributors. I also found the write-up by Turner (2023) a useful reference when creating this illustration.

Here is the illustration (it’s very large, zoom in for full details):

Illustration of the generative pre-trained transformer (GPT) architecture (Yuan et al., 2023), with the diagram (left) from the original paper and code (right) by Andrej Karpathy and contributors.

I hope you find this illustration useful, and please let me know if you spot any errors. The illustration was created using the Concepts iPad app.

References

  1. Karpathy, A. (2023). Let's build GPT: from scratch, in code, spelled out. Retrieved from https://www.youtube.com/watch?v=kCc8FmEb1nY
  2. Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. Retrieved from https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
  3. Turner, R. E. (2023). An Introduction to Transformers. https://doi.org/10.48550/arXiv.2304.10557
  4. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
  5. Yuan, W., Cho, K., & Weston, J. (2023). System-Level Natural Language Feedback. https://doi.org/10.48550/arXiv.2306.13588

Citation

If you found this post useful for your work, please consider citing it as:

Findeis, Arduin. (Mar 2024). GPT illustrated: from high-level diagram to vectors and code. Retrieved from https://arduin.io/blog/gpt-illustrated/.

or
 @article{findeis2024gptillustrated,
        title = "GPT illustrated: from high-level diagram to vectors and code",
        author = "Findeis, Arduin",
        journal = "arduin.io",
        year = "2024",
        month = "March",
        url = "https://arduin.io/blog/gpt-illustrated/"
 }