![The Illustrated GPT-2 (Visualizing Transformer Language Models) – Jay Alammar – Visualizing machine learning one concept at a time.](https://jalammar.github.io/images/xlnet/transformer-encoder-decoder.png)
The Illustrated GPT-2 (Visualizing Transformer Language Models) – Jay Alammar – Visualizing machine learning one concept at a time.
![The Illustrated GPT-2 (Visualizing Transformer Language Models) – Jay Alammar – Visualizing machine learning one concept at a time.](https://jalammar.github.io/images/gpt2/gpt2-self-attention-example-2.png)
![The Illustrated GPT-2 (Visualizing Transformer Language Models) – Jay Alammar – Visualizing machine learning one concept at a time.](https://jalammar.github.io/images/gpt2/gpt-2-transformer-xl-bert-3.png)
![The Illustrated GPT-2 (Visualizing Transformer Language Models) – Jay Alammar – Visualizing machine learning one concept at a time.](https://jalammar.github.io/images/gpt2/gpt2-sizes-hyperparameters-3.png)
![OpenAI's GPT-2 Explained | Visualizing Transformer Language Models | Generative Pre-Training | GPT 3 - YouTube](https://i.ytimg.com/vi/XynJ-gM6aD0/maxresdefault.jpg)
OpenAI's GPT-2 Explained | Visualizing Transformer Language Models | Generative Pre-Training | GPT 3 - YouTube
![The Illustrated GPT-2 (Visualizing Transformer Language Models) – Jay Alammar – Visualizing machine learning one concept at a time.](https://jalammar.github.io/images/xlnet/transformer-decoder-intro.png)
![The Illustrated GPT-2 (Visualizing Transformer Language Models) – Jay Alammar – Visualizing machine learning one concept at a time.](https://jalammar.github.io/images/gpt2/decoder-only-summarization.png)
![\[PDF\] Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds | Semantic Scholar](https://d3i71xaburhd42.cloudfront.net/c1ac3fbf530bf2eb207aa1a20dd14c8ed9f6766b/2-Figure1-1.png)
[PDF] Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds | Semantic Scholar
![The Illustrated GPT-2 (Visualizing Transformer Language Models) – Jay Alammar – Visualizing machine learning one concept at a time.](https://jalammar.github.io/images/gpt2/gpt2-self-attention-split-attention-heads-1.png)