model roundup

GPT-2

2 items · started 2026-04-15 · closed 2026-04-17

LLM from scratch, part 32k – Interventions: gradient accumulation (www.gilesthomas.com via hn)

+2 · 10w

Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation. I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)"…
Trained a 125M LM from scratch instead of fine-tuning GPT-2 — releasing weights + SFT framework for others to build on (www.reddit.com)

+4416 · 10w · fine-tuning

Trained a 125M LM from scratch (custom tokenizer) + released instruct checkpoint and SFT framework so others can fine-tune their own variants. I’ve been experimenting with training small language models fully from scratch (no GPT-2 init, no…