model roundup

GPT-2

1 item · started 2026-04-15 · closed 2026-04-17

  1. Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation
     I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)"…
