model roundup

Haiku 4.5

4 items · started 2026-05-19 · closed 2026-05-25

  1. I've been noticing an increasing number of posts and comments on Reddit claiming that LLMs either are becoming dumber over time or vary in performance throughout the day. I tried to find long-form, over-time performance graphs o…

  2. Weekend build, ~10 hours. Demo: https://trurent-five.vercel.app/ Problem I was poking at: every major Indian rental site (NoBroker, MagicBricks, 99acres) is infested with brokers even when you filter "direct owner." Reddit actually has hon…

  3. Made LLMs play Texas Hold’em against each other. 6 models at the table: a tiny 1.2B running locally on my 16GB MacBook, a couple mid-size ones, and cloud models going up to about 1 trillion parameters.

  4. I made 6 LLMs play Texas Hold’em against each other. Ran 5 tournaments on my 16GB MacBook.
