LLM from scratch, part 32k – Interventions: gradient accumulation (www.gilesthomas.com via hn)
model roundup
GPT-2
-
Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation
I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)"…
-
Trained a 125M LM from scratch (custom tokenizer) + released instruct checkpoint and SFT framework so others can fine-tune their own variants
I’ve been experimenting with training small language models fully from scratch (no GPT-2 init, no…
-
Claude Mythos: The System Card (thezvi.substack.com via hn)
Claude Mythos is different. This is the first model other than GPT-2 that is at first not being released for public use at all.