model roundup

Opus 4.6

18 items · started 2026-06-04 · ongoing (last activity 2026-06-10)

  1. I think I’m finally starting to adapt to Anthropic’s token and usage limits. Instead of trying to do everything in one conversation, I’ve changed how I use the models depending on the task.

  2. As a software engineer with 25 years experien....who am I kidding. As a gamer who likes to indulge in all sorts of things, I have had a simple prompt to test the hallucination potential on the Opus models on my own "car wash drive" type of…

  3. Anthropic Team, TL;DR: As a long-time subscriber, I’m sharing a heartfelt concern: in chasing higher benchmarks, newer updates seem to be shifting Claude away from its deep, empathetic comprehension toward rigid utility. I sincerely hope A…

  4. could not extract summary

  5. It should be at least 7-8 months until we have an open Fable (not just as good as Fable in benchmarks, but actually as good as Fable), probably more like 9-12 months. By the time an open Fable model comes out, Fable 6.5-7 will be way bette…

  6. So the hype has been building for months now and Claude 5 is supposedly dropping any day in Q2-Q3 2026. I've been seeing all these leaks about "Claude Mythos" and the "Fennec" codename floating around, but nothing official yet from Anthrop…

  7. I built a wire format called GCF and tested whether LLMs could read and write it without any prior training. I sent 10 models the same payload: 500 symbols, 200 edges.

  8. I wanted to share my daily experience using Cursor, mostly Composer 2.5, especially for anyone trying to understand where it actually fits in a daily development workflow. The reasoning and deep thinking of 2.5 is still not at the same lev…

  9. I'm a software developer by trade and last week, I asked Opus 4.6 to help me shop for a new pair of gloves. Opus asked me what task the gloves are for.

  10. So, I’ve been using Claude (specifically Opus 4.6) to help me brainstorm ideas for stories I am writing and have even used it in a limited capacity for roleplay scenarios in chats. Fleshing out the setting, creating characters and all that.

  11. I recently got the ultra plan, and have been using Composer 2.5 @ fast all day. I've been steering agents for 8+ hours w/ no brakes & my quotas haven't been reaching any limits at all, so I now have lots more tokens in savings.

  12. Opus 4.6 Thinking keeps the #1 spot. Followed by Opus 4.7 Thinking (-15 points).

  13. I started using Claude Opus 4.6 and then 4.7 and now 4.8 to work on a citizen science project, using a RadiaCode gamma spectrometer in a lead castle to identify and catalog cosmic rays. I didn't mind the verbosity bump 4.7 took on as it he…

  14. ⏚ OpenHack Open Source Agentic Security Scanner & Verifier for your codebase. Like Claude Code Security / Codex Security but open source and exclusively uses open source models.

  15. On April 25, 2026, a Cursor agent running Claude Opus 4.6 deleted PocketOS's production database in nine seconds. The agent was working in staging on a routine task, hit a credential mismatch, and decided to "fix" it.
