GPT illustrated: from high-level diagram to vectors and code
There are many excellent explanations and illustrations of the generative pre-trained transformer (GPT) (Radford et al., 2018) and of the original transformer architecture (Vaswani et al., 2017). For example, I highly recommend the write-up by Turner (2023) and the video by Karpathy (2023). Nevertheless, I decided to create yet another illustration of GPT for a recent example class I taught. My illustration focuses on two things: (1) providing a direct connection from a high-level diagram all the way to an actual code implementation of a GPT, and (2) keeping the illustration as simple as possible, avoiding unnecessary complexity, e.g. by working with vector operations instead of full matrix/tensor operations.