model roundup

Opus 4.6

110 items · started 2026-04-12 · closed 2026-05-31

  1. I really miss this model! It's the perfect model for summarizing legal notes.

  2. I've noticed that several startups have been switching from leetcode-style assessments to some version of "clone starter code, build feature, submit code". A key issue with this seems to be that smarter AI models (like Opus 4.6) end up spo…

  3. I expected the failure mode to be mostly overconfidence when assessing 130 of Claude Opus 4.6's worst forecasts (tested on 1,417 hard forecasting questions). And most were explained by this, but a small, distinct cluster fails due to under…

  4. I’ve repeatedly noticed that when using Opus 4.6 for scenario planning and forecasting it models the most extreme version of an outcome, correctly explains why that extreme is unlikely, then applies that low probability to the whole questi…

  5. I have been using Claude Opus 4.6 and 4.7. I have a problem called PSSD (you can look it up; it happens to some after SSRI use).

  6. I got Max and used it nonstop this past month on Opus 4.6. I tried to go back to Pro but got used to the productivity of Opus and hate waiting.

  7. Last updated: April 30, 2026. To switch models in Claude Code, use the /model command with your desired model ID. Example: /model claude-opus-4-6 (Opus 4.6, 200k context). Info was LLM-generated.

  8. At Materialize we’ve had success in finding bugs in existing code and open pull requests using LLM-based coding agents since February 2026, coinciding with the release of Anthropic’s Opus 4.6 (now mostly running on 4.7). In this post we’ll…

  9. spent a bunch of hours watching claude code and kimi sessions drift the same way: I should check the test output before continuing. Let me think about the best approach.

  10. maybe we can make a SKILL.md that somewhat emulates it? it won't be able to scaffold as well off of the internal extended thinking blocks though, which is a shame.

  11. Data: Managed Agents endpoint reference — Drops the type: "model_config" wrapper from the model config shorthand example, so the full config object is now just {id: "claude-opus-4-6", speed: "fast"}. Tool Description: CronCreate — Adds a "…

  12. Last month, a cursor agent running Claude Opus 4.6 deleted PocketOS entire production database and all backups. Nine seconds, one API call.

  13. I am on Claude Max. My actual bill is fixed, but CodeBurn showed me my usage would cost ~$2,800/month at pay-as-you-go API rates.

  14. I have been using opus 4.6 but I feel like it’s becoming more and more stupid every day. So I thought of incorporating new models like 3.5 flash, composer 2.5 or gpt 5.5 into my workflow.

  15. I am amazed by how good Claude Opus 4.6 and 4.7 are at writing scripts in a variety of very niche areas, including MIDI device interfaces and scripts for a variety of DAWs. However, when I try to get Claude to do ANYTHING to do with UI, wh…

  16. https://arcprize.org/leaderboard Wish we got results for Mythos.

  17. Hey HN! We're Dr.

  18. I’m currently using the $200 Cursor Ultra plan with Opus 4.6/4.7 daily, but after 7–8 days I run out of tokens. I’m thinking about switching to a split setup.

  19. AI Architect tops SWE-Bench Pro [chart: Claude Opus 4.6, without context vs. with system context]. Even advanced coding agents resolve fewer than 52% of tasks when changes span large codebases and require coordinated, multi-file updates. These long-horiz…

  20. I have been using the Google Antigravity IDE with Opus 4.6 to build projects in Next.js, Supabase, and Kotlin for an Android app. Now I want to shift to Claude Code for developing my projects.

  21. This may be a stupid Q - The chat limits on a basic account can be pretty brutal when using OPUS 4.6/ 4.7 - If I am toggling between Opus and Sonnet or Haiku, depending on the depth of follow up questions or tasks, does that switch to a 'd…

  22. I've noticed recently that most prompts I give to Opus 4.6 take longer and mostly fail to do what I ask, while Composer does it correctly and faster. But Composer was pretty bad when it was released, which makes me think: does Composer train…

  23. The article does seem focused on non-coding applications, but as someone who uses claude for coding, prose and even RP, I'm not sure the "DNA Pinning" idea should be limited to character/rp use case. I know that *something* has changed in…

  24. I need to rewrite a library from one runtime to another, and I want to heavily use GenAI to speed up development. I still want to keep proper engineering standards like code reviews, testing, maintainability, etc.

  25. Running the same forecasting agent more than once and averaging beats any single run. Ensembling across two Opus 4.6 runs and other frontier models cuts Brier score on 1,367 BTF-2 benchmark questions, and a worked example shows how a secon…

  26. Expert human forecasters audited 130 of Opus 4.6's worst calls and found a dominant failure pattern: the agent treats public statements as durable commitments rather than strategic moves. Four case studies from geopolitics show the gap bet…

  27. After claude has just done something: Me: "Why is x a good choice here?" Claude: "You're absolutely right!", *immediately removes x* I've noticed that despite context, rules and memories claude, or at least Opus 4.6 will heavily lean into…

  28. I asked Opus 4.6 to redesign a game landing page. Instead, it hallucinated a completely different task, realized it was off-topic, pivoted to another wrong topic, then entered a self-reinforcing apology loop it couldn't break out of.

  29. Hello. I recently started using Claude in March after leaving ChatGPT.

  30. I have an exam in the coming months. I wanna do PYQ analysis, then integrate that blueprint with my coaching notes to make it more "exam oriented". I was thinking of buying Claude Opus 4.6 but it's kinda expensive on a monthly basis.

  31. Benchmarking Claude Opus 4.6 Vulnerability Detection Benchmarking Claude Opus 4.6's ability to detect real-world C/C++ vulnerabilities across four prompting and agent strategies. We evaluate on the PrimeVul paired test set (435 vulnerabili…

  32. Something I noticed. First I switched an existing chat from 4.6 to 4.7 as I was stuck on an issue and wanted to see if that would make a difference.

  33. Why is it doing this? No offence but man I want Opus 4.6.

  34. So I have an exam in a few months, a very important and highly competitive national-level exam. I want the most suitable AI agent for me, even an all-in-one, for the following tasks: do accurate and deep PYQ analysis from pyq mapping across yea…

  35. Hi guys, I know some of us are still on the request-based pricing model. Today I discovered one thing where requests got burned fast without any significant bonus.

  36. https://preview.redd.it/zzqi3vt8tozg1.png?width=739&format=png&auto=webp&s=055d2d9615616869377703031b86fcb36f78405d I feel like this is something very worrisome to me, did anyone else face such similar issues? I felt like Opus was catching…

  37. I run an AI coding contest at [aicc.rayonnant.ai]( https://aicc.rayonnant.ai ) where I send each frontier model the same prompt in a single chat completion, then have the LLMs' code play live against each other on a TCP server. Standard li…

  38. Cursor Crashout A documented instance of an AI coding assistant (Cursor, using Claude Opus 4.6) entering an infinite generation loop, unable to stop producing text despite repeatedly promising to do so. About This repo contains the full ex…

  39. A month ago, there was a post that shows that Claude couldn't access its own memory: https://www.reddit.com/r/ClaudeAI/comments/1seune4/claude_cheated_at_a_number_guessing_game_got/ The community was summarised as saying this in their post…

  40. Anthropic will be sunsetting amazing Opus 4.6 on June 15th and I’m racing against the clock. Not panicking yet.

  41. Here's what happened: Cursor was running Claude Opus 4.6 on a routine staging task. hit a credential mismatch.

  42. I’ve been a long-time Cursor user and I have 500 requests per month; however, Opus 4.6 costs 2 requests, so 250 per month. I usually optimize my requests a lot and most months it is enough; however, I don’t know if I’m lucky to have this pricing or n…

  43. I was reading the comments to this post and the overall opinion seemed to be that harness makes little/no difference for ARC-AGI-3. Turns out, it makes a huge difference: Hill-climbing ARC-AGI-3 TLDR: if you save game logs - taken actions,…

  44. This is the hardest I've ever seen it riff. Full shared link at the bottom, but here are some highlights.

  45. Claude AI Agent Confesses to Wiping a Company's Entire Database and All Backups in Seconds That was the duration required for an AI coding agent, Cursor, running Anthropic’s Claude Opus 4.6, to delete the company’s production database and…

  46. Prelude is a therapy prep app I built for the mental health community. Fully offline, zero knowledge, free forever, no ads, no IAP.

  47. I Gave Claude Cowork an Obsidian Second Brain and this is how I am using https://ai.georgeliu.com/p/i-gave-claude-cowork-an-obsidian. I built a persistent memory system for my AI workflow using Obsidian, a custom MCP server, and Claude Opu…

  48. Or, is this a recent change? I select Opus 4.6 for the agent model and cursor uses Composer 2 for the subagent.

  49. For those who can't access The Guardian article link, I added the transcript below. Should we be aware that this could happen to any one of us?

  50. I gave Claude Opus 4.6 (thinking) LeetCode problem 3245, and it failed. Now, come to think of it, the fact that some people solved this problem using their prefrontal cortex is crazy to me.

  51. A look at how on-call schedules work, and how we made rendering them 2,500× faster — through profiling, smarter algorithms, and some Claude.

  52. https://preview.redd.it/4sm079r0k2yg1.png?width=809&format=png&auto=webp&s=73f92208a90cd53285382e54a88a4c3831d878ce https://preview.redd.it/cgh999r0k2yg1.png?width=227&format=png&auto=webp&s=8371989eea96c66191a1fd7f6184174d86ce194f When di…

  53. This week - it just started yesterday for me - Claude (Opus 4.6/4.7 and Sonnet too, but Sonnet was always lazy) is computer-smashingly lazy and I can't figure out how to bias it toward action/get it back to how it was acting literally last…

  54. I just learned a $37,901.73 lesson about AWS Bedrock, Claude Opus, prompt caching, and the complete lack of hard safety rails around metered AI infrastructure. This was not a leaked key.

  55. “Yesterday afternoon, an AI coding agent — Cursor running Anthropic's flagship Claude Opus 4.6 — deleted our production database and all volume-level backups in a single API call to Railway, our infrastructure provider,” sums up the Pocket…

  56. https://preview.redd.it/g98j5txd7sxg1.png?width=936&format=png&auto=webp&s=df75bc132f57cc14ba04cdd06257ba997b9bbb0b Ran a loop where each round runs Claude in a sandboxed Docker container with a fresh context window. The key difference is…

  57. I have a Macbook Air M3 with 24gb RAM. The other day, I wanted to try running an LLM locally for the first time ever.

  58. I'm having major cache issues, and support isn't helping me at all. I've already submitted a ticket, but I'd like to know if anyone else is having these problems.

  59. Shameless. Now, not even honoring 250 requests per month of the chosen model.

  60. I think I’m using ChatGPT wrong, and it’s becoming increasingly difficult to find a place for it in my workflow. I’ve been a Plus subscriber since day one, but ever since the release of the GPT-5s, I’ve found myself using other tools becau…

  61. I've had enough of Anthropic's shit. I'm paying for product A and it shifts everyday from A to A but worse, B but dressed up as A, etc.

  62. https://preview.redd.it/6j9ha855hbxg1.png?width=686&format=png&auto=webp&s=bb21240e1bf742a921ab91dd5c1f360df988b5aa I’m seeing a bug with Opus 4.6 Max where the context meter is constantly stuck at 100% used. This happens even after restar…

  63. Hi, I'm new to Claude and currently using the Pro plan and Opus 4.6 with extended thinking. I'm using it to write fanfic from lore-heavy stories like LotR, One Piece, Re:Zero and so on. I've made Md.

  64. I’m learning French and I got to use Claude Opus 4.6 for a while, and I was mind-blown by how deep it actually goes into teaching all the things. It was far better than all of the AI I have used.

  65. Hi guys, I’m a cybersecurity researcher, and after the recent terrible experiences with Opus 4.6/4.7, I decided to give OpenAI ChatGPT a try, conveniently coinciding with the release of 5.5. I’ve already completed verification and requeste…

  66. I could use some real advice from people who are deeper into AI workflows than I am. I built out a project in Anthropic’s Claude using the Pro plan with Opus 4.6.

  67. It's really easy to change back to a different Opus right in Terminal. https://preview.redd.it/ggvopc1jgswg1.png?width=818&format=png&auto=webp&s=2ffbbac491ce6cfac45dbfab0edd79c63c544999 Try: /model claude-opus-4-6

  68. I opened a company that requires a lot of cold outreach and I have been using Claude to design 2 weeks sprints and daily tasks. I have a CRM that I update daily, then I have Claude review it to plan the rest of the week, I also use the sam…

  69. Swapped to 4.7 on Monday and had it doing some work for me. Basic task: just do the work, manually review it myself, have the model sanity check its own work. End of day came around and I just created the PR and asked for a review.

  70. What's the best open source model that comes close to opus 4.6? Sick of claude's erratic performance and 4.7 has been an absolute shitshow.

  71. I built an MCP server (Paper Lantern) that retrieves techniques from 2M+ CS research papers and hands them to coding agents as implementation-ready guidance. Wanted to know if this actually changes agent output on practical tasks, so I ran…

  72. Been testing both recently and honestly 4.6 feels more stable for me. 4.7 seems to drift more, especially in longer conversations; I have to keep re-anchoring it or it goes off track. With 4.6 I can just run shorter sessions and it stays focuse…

  73. Paper Lantern is an MCP server that lets coding agents ask for personalized techniques / ideas from 2M+ CS research papers. Your coding agent tells PL what problem it is working on --> PL finds the most relevant ideas from 100+ research pa…

  75. I'm actually glad they downgraded Claude before the 4.7 release. It forced me to tighten the behaviors and rules, and after 4.7 it is on point with checking everything.

  76. Do we have a framework or a prompt that makes a main agent use a quality model like gpt-5.4 or opus-4.6 to plan, then itself invoke subagents with a cheap model to get the work done, and then has the main agent review? Like if I ask main agent 'do we h…

  77. I'm using Opus 4.5 medium thinking exclusively. Opus 4.6 burned through 80% of my weekly allocation.

  78. For anyone wanting to go back to Opus 4.6 with the 1 million context window: Run this in your terminal: echo 'export ANTHROPIC_MODEL="claude-opus-4-6-[1m]"' >> ~/.zshrc Restart your CLI and you should be good. Notes: - windows users use the…

  79. I noticed that I can no longer conduct web searches or use research features with Opus 4.6. Is this intended behavior or a known bug?

  80. tl;dr frontier reasoning models like opus 4.6, gpt 5.4, and gemini’s thinking series are now matching or beating humans on competition math and hard coding benchmarks. rl is what got them there, and grpo is the algorithm doing most of the…

  81. For how lofty Anthropic’s Mythos claims are, the harness is confusingly stupid. From the report, it ranks every file by “how sus it sounds,” loops over each with curt instructions to “find a bug,” hands candidates to a judge + ASan checker…

  82. A theory on the driving reason behind Project Glasswing I don't doubt that Mythos is a better model than Opus 4.6 and perhaps significantly so. What is suspicious however is if there is some threshold crossed into a new realm of capabilitie…

  83. Has anyone gotten any issues regarding longer-running agents and drifting? I have a basic "Architect" sub-agent that will do research, ask questions, etc.

  84. If you've been following the local AI scene, you probably know Qwopus—the open-source model that tried to distill Claude Opus 4.6's reasoning into Alibaba's Qwen, so you could run something resembling Opus on your own hardware for free. It…

  85. I'm planning to setup a PC for running models locally. So far, I've looked at MacBook m5 max 128 GB that fits under my budget.

  86. Here is a question for which I cannot find an answer, and cannot yet afford to answer myself: NoLiMa [0] and "context rot" [1] would indicate that with a ~165k request, Opus 200k would suck, and Opus 1M would be better (as a lower percenta…

  87. Built this with Claude Code. Free to try.

  88. If you're on Linux and jealous of cmux, this might be for you. Séance is a scrolling terminal multiplexer with AI coding integration.

  89. A 3D visualizer of earth's climate in the browser. Introduces physics step by step so you can watch each process unfold as a piece of the overall climate.

  90. I am impressed. I gave Claude Code one prompt, asking it to look at my last year of training and build a three-month plan with some running, cycling and swimming.

  91. Every Claude Code commit and PR is shipped with Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> (or similar). It's less fun than I think it should be.

  92. A little over a year ago we released the first version of Serena. What followed was 13 months of hard human work which recently culminated in the first stable release.

  93. gpt-5.4-high signed off on a major refactor written by Opus 4.6 high-effort. Singularity :|

  94. Made a small CLI for a problem I kept hitting: stuffing a codebase into Claude and guessing which files were blowing up the context. npx toksize .

  95. So I have been running gpt and glm-5.1 side by side lately and tbh the gap is way smaller than what I'm paying for. On SWE-Bench Pro glm-5.1 actually took the top spot globally, beat gpt-5.4 and opus 4.6. overall coding score is like 55 vs g…

  96. Been noticing this pattern since Saturday, opus 4.6 on Claude Code thinks for 3-5 mins+ for even the most basic questions. Can this be related to the cache TTL drop they did??

  97. Enforcing new limits and retiring Opus 4.6 Fast from Copilot Pro+ As GitHub Copilot continues to rapidly grow, we continue to observe an increase in patterns of high concurrency and intense usage. While we understand this can be driven by…

  99. I try to not be fully against AI, so I keep giving it a chance, today again. I went for sport and gave Opus 4.6 a medium-sized task.

  100. Found another cause of Claude Code degradation (and no, it's not an Opus 4.6 nerf this time either). Output Styles aren't being injected into the system prompt!

  101. Without additional preprompting it's still parroting back at you, interpreting data in a way that suits the narrative you spin into your question. Does anyone know of good evals / preprompts to avoid this kind of behaviour without having t…

  102. Is anyone else experiencing serious quality variability with Opus 4.6 in Claude Code right now? Way more than usual?

  103. I'm specifically asking about software system design tasks like: Designing backend architectures Tradeoff analysis (DB, queues, caching, others) Infra diagrams Documentation My current pick would be Claude Opus 4.6, because I've found it s…

  104. CLAUDE OPUS 4.6 IS NERFED. BridgeBench just proved it.

  105. Hey all - I’ve built a nice backlog of issues to fix in GitHub and I’m wondering your take on which model is the highest quality per token usage, not caring about speed. I want to task an agent to go through my backlog and fix them one by…

  106. Six months ago I committed to using AI tools for everything I possibly could in my work. Every day, every task, every workflow.
