model roundup

GPT-2

1 item · started 2026-04-15 · closed 2026-04-17

LLM from scratch, part 32k – Interventions: gradient accumulation (www.gilesthomas.com via hn)

Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation

I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)"…