model roundup
Haiku 4.5
-
I've been noticing an increasing number of posts and comments on Reddit claiming that LLMs either become dumber over time or vary in performance throughout the day. I tried to find long-form, over-time performance graphs o…
-
Weekend build, ~10 hours. Demo: https://trurent-five.vercel.app/ Problem I was poking at: every major Indian rental site (NoBroker, MagicBricks, 99acres) is infested with brokers even when you filter "direct owner." Reddit actually has hon…
-
Made LLMs play Texas Hold’em against each other. 6 models at the table: a tiny 1.2B running locally on my 16GB MacBook, a couple mid-size ones, and cloud models going up to about 1 trillion parameters.
-
I made 6 LLMs play Texas Hold’em against each other. Ran 5 tournaments on my 16GB MacBook.