[P] Built GPT-2, Llama 3, and DeepSeek from scratch in PyTorch - open source code + book
I wrote a book that implements modern LLM architectures from scratch. The part most relevant to this sub: Chapter 3 takes GPT-2 and swaps exactly 4 things to get Llama 3.2-3B: LayerNorm → RMSNorm Learned positional encodings → RoPE GELU →…