model roundup

GPT-2

3 items · started 2026-04-15 · closed 2026-04-17

  1. Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation
     I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)"…

  2. Trained a 125M LM from scratch (custom tokenizer) + released instruct checkpoint and SFT framework so others can fine-tune their own variants
     I’ve been experimenting with training small language models fully from scratch (no GPT-2 init, no…

  3. Claude Mythos: The System Card
     Claude Mythos is different. This is the first model other than GPT-2 that is at first not being released for public use at all.