LLM from scratch, part 32k – Interventions: gradient accumulation (www.gilesthomas.com via hn)
model roundup
GPT 2
Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation
I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)"…