Model roundup
Haiku 4.5
-
I built a wire format called GCF and tested whether LLMs could read and write it without any prior training. I sent 10 models the same payload: 500 symbols, 200 edges.
-
Microsoft's MAI-Code-1-Flash: 5B params, 51% on SWE-Bench Pro, free on OpenRouter (www.reddit.com via reddit)
Microsoft just released MAI-Code-1-Flash, a 5B-parameter coding model built for fast, efficient developer assistance. Numbers that caught my eye:
- 51.2% on SWE-Bench Pro (Claude Haiku 4.5 scores 35.2%)
- 71.6% on SWE-Bench Verified (Haik…
-
I (vibe-coder in training) asked an AI coding assistant (Claude Haiku 4.5 Extended; I usually use Sonnet 4.6 instead) to…
-
Qwen 3.6 27B on DeepSWE (www.reddit.com via reddit)
Overview:
- It scored 2% (1.79%, rounded up).
- It placed 18th of 20, scoring above Haiku 4.5 and MiniMax M2.7.
- The full benchmark took 70 hours.
- Average time per task: 32m
- Average output tokens per task: 44k

Perspectives: It scored suspiciously similar…