[P] Built GPT-2, Llama 3, and DeepSeek from scratch in PyTorch - open source code + book

reddit-localllama · www.reddit.com · 20 pts · 3 replies · 1d

I wrote a book that implements modern LLM architectures from scratch. The part most relevant to this sub is Chapter 3, which takes GPT-2 and swaps exactly 4 things to get Llama 3.2-3B:

- LayerNorm → RMSNorm
- Learned positional encodings → RoPE
- GELU →…
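
For anyone who wants to see the first two swaps named above in miniature, here is a rough, dependency-free Python sketch. It is not the book's code. The function names (`layer_norm`, `rms_norm`, `rope`, `dot`), the `eps` value, the RoPE `base` of 10000, and the interleaved pairing of dimensions are all assumptions made for illustration, and the learned scale/shift parameters are left out. A PyTorch version would wrap the same arithmetic in an `nn.Module`.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm without learned scale/shift: center, then divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-5):
    """RMSNorm without learned scale: divide by the root-mean-square, no centering."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]


def rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * base**(-2i/d).

    Assumption: interleaved pairing; some implementations instead pair
    dimension i with dimension i + d/2.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        a, b = x[2 * i], x[2 * i + 1]
        out.append(a * math.cos(theta) - b * math.sin(theta))
        out.append(a * math.sin(theta) + b * math.cos(theta))
    return out


def dot(u, v):
    """Plain inner product, used to compare attention scores."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print("layer_norm:", layer_norm(x))
    print("rms_norm:  ", rms_norm(x))

    # The query/key score depends only on the position offset (2 in both cases).
    q, k = [0.3, -1.2, 0.7, 0.5], [1.1, 0.4, -0.6, 0.9]
    print("score at (5, 3):  ", dot(rope(q, 5), rope(k, 3)))
    print("score at (12, 10):", dot(rope(q, 12), rope(k, 10)))
```

The two scores printed at the end should agree up to floating-point error. That is the practical difference from learned positional encodings: position enters attention through the relative offset between query and key rather than through an absolute embedding added to the input.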