model roundup

Haiku 4.5

4 items · started 2026-06-07 · ongoing (last activity 2026-06-09)

  1. I built a wire format called GCF and tested whether LLMs could read and write it without any prior training. I sent 10 models the same payload: 500 symbols, 200 edges.

  2. Microsoft just released MAI-Code-1-Flash, a 5B-parameter coding model built for fast, efficient developer assistance. Numbers that caught my eye: 51.2% on SWE-Bench Pro (Claude Haiku 4.5 scores 35.2%); 71.6% on SWE-Bench Verified (Haik…

  3. https://preview.redd.it/zrzgwjibcy5h1.png?width=534&format=png&auto=webp&s=f42aacf8cf9be6e5ff18a5b2c9c344e6f1482cc8 I (vibe-coder in training) asked an AI coding assistant (Claude Haiku 4.5 Extended; I usually use Sonnet 4.6 instead) to…

  4. Overview: It scored 2% (1.79% rounded up). It placed 18th of 20, scoring above Haiku 4.5 and Minimax M2.7. Full benchmark took 70 hours. Average time per task: 32m. Average output tokens per task: 44k. Perspectives: It scored suspiciously similar…
