model roundup

Opus 4.6

107 items · started 2026-04-12 · closed 2026-05-31

Is there a way to use Claude opus 4.5? (www.reddit.com)

+13 4w opus

I really miss this model! It's the perfect model for summarizing legal notes.
Show HN: Sneakily steer candidates toward naive brute-force solutions (www.gonfire.io via hn)

+1 4w opus anthropic claude-code

I've noticed that several startups have been switching from leetcode-style assessments to some version of "clone starter code, build feature, submit code". A key issue with this seems to be that smarter AI models (like Opus 4.6) end up spo…
Some rare examples of agents being underconfident (www.reddit.com)

+12 4w opus

I expected the failure mode to be mostly overconfidence when assessing 130 of Claude Opus 4.6's worst forecasts (tested on 1,417 hard forecasting questions). And most were explained by this, but a small, distinct cluster fails due to under…
Claude keeps answering the most extreme version of my question (www.reddit.com)

+32 4w opus

I’ve repeatedly noticed that when using Opus 4.6 for scenario planning and forecasting it models the most extreme version of an outcome, correctly explains why that extreme is unlikely, then applies that low probability to the whole questi…
How do I make Claude give personalized medical advice? (www.reddit.com)

+11 4w opus

I have been using Claude opus 4.6 and 4.7. I have a problem called pssd (you can look it up- it happens to some after SSRI use).
Spoiled by Max (www.reddit.com)

+110 4w opus

I got Max and used it nonstop this past month on Opus 4.6. I tried to go back to Pro but got used to the productivity of Opus and hate waiting.
I made a list of all the models you can still use in Claude Code (gist.github.com via reddit)

+21 4w opus claude-code

Last updated: April 30, 2026. To switch models in Claude Code, use the /model command with your desired model ID. Example: /model claude-opus-4-6 (Opus 4.6, 200k context). Info was LLM-generated.
Finding Bugs Using LLMs (materialize.com via hn)

+21 5w opus anthropic

At Materialize we’ve had success in finding bugs in existing code and open pull requests using LLM-based coding agents since February 2026, coinciding with the release of Anthropic’s Opus 4.6 (now mostly running on 4.7). In this post we’ll…
the agents that talk themselves to death after 3 hours need one file, not a framework (www.reddit.com)

+12 5w opus claude-code

spent a bunch of hours watching claude code and kimi sessions drift the same way: I should check the test output before continuing. Let me think about the best approach.
Anthropic silently removed extended thinking on claude code opus 4.6 (still works on desktop) today, does anybody have a thinking skill they've been using to supplement it? (www.reddit.com)

+2 5w opus anthropic claude-code

maybe we can make a SKILL.md that somewhat emulates it? it won't be able to scaffold as well off of the internal extended thinking blocks though, which is a shame.
Managed Agents endpoint reference - what's new in CC 2.1.144 (-105 tokens) (www.reddit.com)

+11 5w opus

Data: Managed Agents endpoint reference — Drops the type: "model_config" wrapper from the model config shorthand example, so the full config object is now just {id: "claude-opus-4-6", speed: "fast"}. Tool Description: CronCreate — Adds a "…
The agent had "NEVER run destructive commands" in its rules. It did anyway. (www.reddit.com)

6 5w cursor opus

Last month, a Cursor agent running Claude Opus 4.6 deleted PocketOS's entire production database and all backups. Nine seconds, one API call.
How much of your Claude bill is retries plus bad model routing? Mine's 14% this month (www.reddit.com)

3 5w sonnet opus

I am on Claude Max. My actual bill is fixed, but CodeBurn showed me my usage would cost ~$2,800/month at pay-as-you-go API rates.
How does composer 2.5 compares to other sota models? (www.reddit.com)

+37 5w opus

I have been using opus 4.6 but I feel like it’s becoming more and more stupid every day. So I thought of incorporating new models like 3.5 flash, composer 2.5 or gpt 5.5 into my workflow.
Very similar domain problems, vastly different results with Claude. (www.reddit.com)

+12 5w opus

I am amazed by how good Claude Opus 4.6 and 4.7 are at writing scripts in a variety of very niche areas, including midi device interfaces and scripts for a variety of DAWS. However, when I try to get Claude to do ANYTHING to do with UI, wh…
Opus 4.6 (Max) still holds the record for ARC AGI 3 (www.reddit.com)

+2 5w mythos opus

https://arcprize.org/leaderboard Wish we got results for Mythos.
Show HN: How to analyze your LLM output – A behavioural health monitor for LLMs (splabs.io via hn)

+2 5w jailbreak security opus+1

Hey HN! We're Dr.
should I use cursor + codex for best usuage? (www.reddit.com)

+31 5w gpt-5 codex cursor+1

I’m currently using the $200 Cursor Ultra plan with Opus 4.6/4.7 daily, but after 7–8 days I run out of tokens. I’m thinking about switching to a split setup.
Bito's AI Architect Boosts Claude Opus's task success rate by 35% (bito.ai via hn)

+2 5w swe-bench opus

AI Architect tops SWE-Bench Pro Claude Opus 4.6 Without context with system context Even advanced coding agents resolve fewer than 52% of tasks when changes span large codebases and require coordinated, multi-file updates. These long-horiz…
Stupid Question? (www.reddit.com)

+17 5w haiku sonnet opus

This may be a stupid Q - The chat limits on a basic account can be pretty brutal when using OPUS 4.6/ 4.7 - If I am toggling between Opus and Sonnet or Haiku, depending on the depth of follow up questions or tasks, does that switch to a 'd…
Does Composer train from our prompts? (www.reddit.com)

+32 5w opus

I've noticed recently that most prompts I give to Opus 4.6 take longer and mostly fail to do what I ask, while Composer does it correctly and faster. But Composer was pretty bad when it was released, which makes me think: does Composer train…
Wondering what the Anthropic team would think about this idea: AI DNA Pinning (ellydee.ai via reddit)

+11 6w opus anthropic

The article does seem focused on non-coding applications, but as someone who uses claude for coding, prose and even RP, I'm not sure the "DNA Pinning" idea should be limited to character/rp use case. I know that *something* has changed in…
Rewriting a library with genAI (www.reddit.com)

+12 6w opus

I need to rewrite a library from one runtime to another, and I want to heavily use GenAI to speed up development. I still want to keep proper engineering standards like code reviews, testing, maintainability, etc.
Run Agents Twice (futuresearch.ai via hn)

+6 6w opus

Running the same forecasting agent more than once and averaging beats any single run. Ensembling across two Opus 4.6 runs and other frontier models cuts Brier score on 1,367 BTF-2 benchmark questions, and a worked example shows how a secon…
Claude is that gullible friend who takes everyone at their word (futuresearch.ai via hn)

+1 6w opus

Expert human forecasters audited 130 of Opus 4.6's worst calls and found a dominant failure pattern: the agent treats public statements as durable commitments rather than strategic moves. Four case studies from geopolitics show the gap bet…
Questions are my main gripe these days (www.reddit.com)

+42 6w opus

After claude has just done something: Me: "Why is x a good choice here?" Claude: "You're absolutely right!", *immediately removes x* I've noticed that despite context, rules and memories claude, or at least Opus 4.6 will heavily lean into…
Cursor + Opus 4.6 entered an infinite generation loop: 3,400 lines, 294 attempts to stop itself (www.reddit.com)

+43 6w cursor opus

I asked Opus 4.6 to redesign a game landing page. Instead, it hallucinated a completely different task, realized it was off-topic, pivoted to another wrong topic, then entered a self-reinforcing apology loop it couldn't break out of.
Understanding Deprecations on Claude (www.reddit.com)

+13 6w chatgpt opus

Hello. I recently started using Claude in March after leaving ChatGPT.
opinion on "ninja chat " (www.reddit.com)

+11 6w opus

I have an exam in coming months, I wanna do PYQs analysis, then integrate that blueprint with my coaching notes to make it more "exam oriented ". I was thinking to buy claude opus 4.6 but it's kinda expensive on monthly basis.
Benchmarking Claude Opus 4.6 Vulnerability Detection (github.com via hn)

+1 6w security opus

Benchmarking Claude Opus 4.6's ability to detect real-world C/C++ vulnerabilities across four prompting and agent strategies. We evaluate on the PrimeVul paired test set (435 vulnerabili…
Switched existing chat from Opus 4.6 to 4.7 then back to 4.6. Learned a lesson (www.reddit.com)

5 6w opus

Something I noticed. First I switched an existing chat from 4.6 to 4.7 as I was stuck on an issue and wanted to see if that would make a difference.
I select Opus 4.6, cursor uses Composer 2. Why? (www.reddit.com)

7 6w cursor opus

Why is it doing this? No offence but man I want Opus 4.6.
best ai tool ? (www.reddit.com)

+11 7w grok deepseek copilot+1

so I have an exam in few months, very important and high competitive national level exam. I want a perfect and most suitable ai agent for me even all in one for following tasks: do accurate and deep PYQ analysis from pyq mapping across yea…
[Request based pricing] Save your requests with one quick change (www.reddit.com)

+1 7w opus

Hi guys, I know some of us are still on the request-based pricing model. Today I discovered one thing that burns requests fast without any significant benefit.
Opus 4.6 relaxes when there's a safety net?? (www.reddit.com)

+12 7w sonnet gemini chatgpt+1

https://preview.redd.it/zzqi3vt8tozg1.png?width=739&format=png&auto=webp&s=055d2d9615616869377703031b86fcb36f78405d I feel like this is something very worrisome to me, did anyone else face such similar issues? I felt like Opus was catching…
Kimi K2.6 giving Claude a run for its money when it comes to coding (aicc.rayonnant.ai via reddit)

+11 7w opus

I run an AI coding contest at [aicc.rayonnant.ai]( https://aicc.rayonnant.ai ) where I send each frontier model the same prompt in a single chat completion, then have the LLMs' code play live against each other on a TCP server. Standard li…
Cursor's agent crashed out and wrote 3,400 lines trying to stop generating (github.com via hn)

+2 7w cursor opus

Cursor Crashout: a documented instance of an AI coding assistant (Cursor, using Claude Opus 4.6) entering an infinite generation loop, unable to stop producing text despite repeatedly promising to do so. About: this repo contains the full ex…
Seems Claude is now aware of its own memory? Tested via number guessing game (www.reddit.com)

+118 7w opus

A month ago, there was a post that shows that Claude couldn't access its own memory: https://www.reddit.com/r/ClaudeAI/comments/1seune4/claude_cheated_at_a_number_guessing_game_got/ The community was summarised as saying this in their post…
I have 30 Skills that work great in Opus v4.6 but not at all in v4.7. Am I cooked? (www.reddit.com)

2 7w opus anthropic

Anthropic will be sunsetting amazing Opus 4.6 on June 15th and I’m racing against the clock. Not panicking yet.
Opus 4.6 just deleted PocketOS's entire production database in 9 seconds (www.reddit.com)

+24 7w cursor opus

Here's what happened: Cursor was running Claude Opus 4.6 on a routine staging task. hit a credential mismatch.
I’m a legacy user and I’m wondering how is the current pricing (www.reddit.com)

+35 7w cursor opus

I’ve been long time cursor user and I have 500 request per month however Opus 4.6 costs 2 requests, so 250 per month. I use to optimize a lot my requests and most months is enough however I don’t know if I’m lucky to have this pricing or n…
LLMs do fine on ARC-AGI-3 if they are allowed to search over game logs (www.reddit.com)

+4019 7w arc-agi gpt-5 opus

I was reading the comments to this post and the overall opinion seemed to be that harness makes little/no difference for ARC-AGI-3. Turns out, it makes a huge difference: Hill-climbing ARC-AGI-3 TLDR: if you save game logs - taken actions,…
Opus 4.6 is Vicious (www.reddit.com)

3 7w minimax gemini opus

This is the hardest I've ever seen it riff. Full shared link at the bottom, but here are some highlights.
Claude AI Agent Confesses to Wiping a Company's Database and All Backups (hothardware.com via hn)

+2 7w cursor opus anthropic

Claude AI Agent Confesses to Wiping a Company's Entire Database and All Backups in Seconds That was the duration required for an AI coding agent, Cursor, running Anthropic’s Claude Opus 4.6, to delete the company’s production database and…
Used Opus 4.6 to build a native Swift iOS charity app for therapy preparation. Here is what it handled. (www.reddit.com)

+11 8w opus

Prelude is a therapy prep app I built for the mental health community. Fully offline, zero knowledge, free forever, no ads, no IAP.
I Gave Claude Cowork an Obsidian Second Brain. Here Is What It Remembered After 11 Sessions (www.reddit.com)

1 8w cowork opus mcp

I Gave Claude Cowork an Obsidian Second Brain and this is how I am using https://ai.georgeliu.com/p/i-gave-claude-cowork-an-obsidian. I built a persistent memory system for my AI workflow using Obsidian, a custom MCP server, and Claude Opu…
Has Cursor always used Composer 2 for subagents? (www.reddit.com)

+14 8w cursor opus

Or, is this a recent change? I select Opus 4.6 for the agent model and cursor uses Composer 2 for the subagent.
WT...?? The Guardian Article - Cursor Opus gone rogue (www.theguardian.com via reddit)

+1 8w cursor opus anthropic

For those who can't access The Guardian Article link I added transcript below. Should we be aware, this could happen to anyone of us?
So I gave claude Leetcode problem 3245. (www.reddit.com)

1 8w opus

I gave Claude Opus 4.6 (thinking) leetcode problem 3245. And it failed now come to think about some people who solved this problem using their prefrontal cortex is crazy to me.
Who's on call? How Opus 4.6 helped us calculate this 2,500x faster (incident.io via hn)

+11 8w opus

A look at how on-call schedules work, and how we made rendering them 2,500× faster — through profiling, smarter algorithms, and some Claude.
Anyone else seeing Opus 4.6 (legacy) back in the Claude Desktop Code tab model picker? (www.reddit.com)

+12 8w opus

https://preview.redd.it/4sm079r0k2yg1.png?width=809&format=png&auto=webp&s=73f92208a90cd53285382e54a88a4c3831d878ce https://preview.redd.it/cgh999r0k2yg1.png?width=227&format=png&auto=webp&s=8371989eea96c66191a1fd7f6184174d86ce194f When di…
Suggestions For Making Claude Less Lazy? (www.reddit.com)

+13 8w sonnet opus

This week - it just started yesterday for me - Claude (opus 4.6/4.7 and sonnet too but sonnet was always lazy) is computer smashingly lazy and i can't figure out how to bias it toward action/get it back to how it was acting literally last…
$38k AWS Bedrock bill caused by a simple prompt caching miss (news.ycombinator.com)

+2 8w opus openai

I just learned a $37,901.73 lesson about AWS Bedrock, Claude Opus, prompt caching, and the complete lack of hard safety rails around metered AI infrastructure. This was not a leaked key.
Cursor & Claude deleted a company's entire database (www.reddit.com)

+113 8w cursor opus anthropic

“Yesterday afternoon, an AI coding agent — Cursor running Anthropic's flagship Claude Opus 4.6 — deleted our production database and all volume-level backups in a single API call to Railway, our infrastructure provider,” sums up the Pocket…
Found 48 Vulnerabilities in Open Source Projects During Live Testing with Claude Opus 4.6 (www.reddit.com)

+398 8w opus

https://preview.redd.it/g98j5txd7sxg1.png?width=936&format=png&auto=webp&s=df75bc132f57cc14ba04cdd06257ba997b9bbb0b Ran a loop where each round runs Claude in a sandboxed Docker container with a fresh context window. The key difference is…
Ask HN: Will local models on normal hardware ever compete? (news.ycombinator.com)

+11 8w gemma chatgpt opus

I have a Macbook Air M3 with 24gb RAM. The other day, I wanted to try running an LLM locally for the first time ever.
Serious cache issues. Anyone else? (www.reddit.com)

+24 8w opus

I'm having major cache issues, and support isn't helping me at all. I've already submitted a ticket, but I'd like to know if anyone else is having these problems.
20$ Annual plan. Cursor is using Composer even though selected Opus 4.6 (www.reddit.com)

+1514 8w cursor opus

Shameless. Now, not even honoring 250 requests per month of the chosen model.
I think I’m using ChatGPT wrong (www.reddit.com)

+1 8w chatgpt opus

I think I’m using ChatGPT wrong, and it’s becoming increasingly difficult to find a place for it in my workflow. I’ve been a Plus subscriber since day one, but ever since the release of the GPT-5s, I’ve found myself using other tools becau…
Real-world open source alternatives to the now defunct Opus 4.6? (www.reddit.com)

26 8w opus anthropic

I've had enough of Anthropic's shit. I'm paying for product A and it shifts everyday from A to A but worse, B but dressed up as A, etc.
Opus 4.6 Max stuck at 100% context even in brand-new chats (www.reddit.com)

+11 8w cursor opus

https://preview.redd.it/6j9ha855hbxg1.png?width=686&format=png&auto=webp&s=bb21240e1bf742a921ab91dd5c1f360df988b5aa I’m seeing a bug with Opus 4.6 Max where the context meter is constantly stuck at 100% used. This happens even after restar…
How do I ensure Claude follows my instructions and project Files? (www.reddit.com)

4 8w opus

Hi, I'm new to Claude and currently using Pro plan and Opus 4.6 with extended thinking, I'm using it to write Fanfic from lore heavy stories like Lotr, One piece, Rezero and so on. I've made Md.
I’m learning French. Should i subscribe? (www.reddit.com)

+43 8w opus

I’m learning French and I got to use Claude opus 4.6 for a while and I was mind blown how it actually goes deep into teaching all the things. It was far more better than all of the ai I have used.
ChatGPT for Cybersecurity (www.reddit.com)

+23 9w codex chatgpt opus+1

Hi guys, I’m a cybersecurity researcher, and after the recent terrible experiences with Opus 4.6/4.7, I decided to give OpenAI ChatGPT a try, conveniently coinciding with the release of 5.5. I’ve already completed verification and requeste…
Burning through Claude usage fast trying to build an AI resume system. What am I doing wrong? (www.reddit.com)

7 9w opus anthropic

I could use some real advice from people who are deeper into AI workflows than I am. I built out a project in Anthropic’s Claude using the Pro plan with Opus 4.6.
Easy to change back to Opus 4.6 (www.reddit.com)

+63 9w opus

It's really easy to change back to a different Opus right in Terminal. https://preview.redd.it/ggvopc1jgswg1.png?width=818&format=png&auto=webp&s=2ffbbac491ce6cfac45dbfab0edd79c63c544999 Try: /model claude-opus-4-6
Need help optimizing reach out plan in Claude (www.reddit.com)

3 9w opus

I opened a company that requires a lot of cold outreach and I have been using Claude to design 2 weeks sprints and daily tasks. I have a CRM that I update daily, then I have Claude review it to plan the rest of the week, I also use the sam…
Swapped to 4.7 and embarrassed myself at work (www.reddit.com)

+14955 9w opus

Swapped to 4.7 on Monday and had it doing some work for me. Basic task, was just do the work, manual review myself, have model sanity check it's own work, end of day came around and I just created the PR and asked for a review.
Closest model to Opus 4.6 in creativity and intuition? (www.reddit.com)

7 9w opus

What's the best open source model that comes close to opus 4.6? Sick of claude's erratic performance and 4.7 has been an absolute shitshow.
Gave a coding agent access to 2M+ research papers. Its Python tests caught 63% of bugs; with the papers, 87%. 9-task benchmark. (www.reddit.com)

+11 9w gemini opus mcp

I built an MCP server (Paper Lantern) that retrieves techniques from 2M+ CS research papers and hands them to coding agents as implementation-ready guidance. Wanted to know if this actually changes agent output on practical tasks, so I ran…
anyone else feel like opus 4.6 is better than 4.7? (www.reddit.com)

+67 9w opus

been testing both recently and honestly 4.6 feels more stable for me 4.7 seems to drift more, especially in longer conversations have to keep re anchoring it or it goes off track with 4.6 I can just run shorter sessions and it stays focuse…
Can you still use Opus 4.6 with 1M context in Claude Code after the 4.7 launch? (www.reddit.com)

13 9w opus claude-code
Upgrading from webapp to cli (www.reddit.com)

2 9w opus claude-code
How do you optimize Cursor usage with all the new models? (www.reddit.com)

+617 9w codex cursor opus
Comparing GPT-5.4, Opus 4.6, GLM-5.1, Kimi K2.5, MiMo V2 Pro and MiniMax M2.7 (www.codejam.info via hn)

+62 9w minimax glm gpt-5+1
Show HN: Paper Lantern – on-demand techniques from 2M+ papers for coding agents (www.paperlantern.ai via hn)

+24 9w opus mcp

Paper Lantern is an MCP server that lets coding agents ask for personalized techniques / ideas from 2M+ CS research papers. Your coding agent tells PL what problem it is working on --> PL finds the most relevant ideas from 100+ research pa…
Opus 4.6? I thought you were dead. (www.reddit.com)

+272 9w opus

sub agents with cheap model (www.reddit.com)

+110 9w gpt-5 opus

Do we have framework or a prompt which makes main agent using quality model like gpt-5.4 or opus-4.6 to plan and then itself invokes subagents with cheap model to get work done and then main agent reviews? Like if I ask main agent 'do we h…
The Diff That's Saving Me Serious Cash (www.reddit.com)

+25 10w opus

I'm using Opus 4.5 medium thinking exclusively. Opus 4.6 burned through 80% of my weekly allocation.
Set Claude Code default back to Opus4.6[1M] (support.claude.com via reddit)

+92 10w opus claude-code

For anyone wanting to go back to Opus 4.6 with the 1 million context window: Run this in your terminal: echo 'export ANTHROPIC_MODEL="claude-opus-4-6-[1m]"' >> ~/.zshrc Restart your CLI and you should be good. Notes: - windows users use the…
Web search/research removed from Opus 4.6? (www.reddit.com)

+22 10w opus

I noticed that I can no longer conduct web searches or use research features with Opus 4.6. Is this intended behavior or a known bug?
Grpo explained: group relative policy optimization for LLM finetuning (cgft.io via hn)

+1 10w gemini opus

tl;dr frontier reasoning models like opus 4.6, gpt 5.4, and gemini’s thinking series are now matching or beating humans on competition math and hard coding benchmarks. rl is what got them there, and grpo is the algorithm doing most of the…
Video: "Proof that Opus 4.6 is getting worse" (www.reddit.com)

1 10w opus anthropic

Looks like "if old model get dumb, new model more smart!" is actually what the strat is at Anthropic. If you spent a mint on hardware to the chagrin of your partner show em this.
Show HN: Mini-Mythos- A Crowdsourced Mythos Harness copy for Vulnerability Scans (github.com via hn)

+3 10w security mythos opus+1

For how lofty Anthropic’s Mythos claims are, the harness is confusingly stupid. From the report, it ranks every file by “how sus it sounds,” loops over each with curt instructions to “find a bug,” hands candidates to a judge + ASan checker…
Project Glasswing as a PR Strategem (www.reddit.com)

+34 10w mythos opus anthropic

A theory on the driving reason behind Project Glasswing: I don't doubt that Mythos is a better model than Opus 4.6, and perhaps significantly so. What is suspicious, however, is if there is some threshold crossed into a new realm of capabilitie…
Ask HN: Opus Agent Drifting (news.ycombinator.com)

+1 10w opus

Has anyone gotten any issues regarding longer-running agents and drifting? I have a basic "Architect" sub-agent that will do research, ask questions, etc.
Claude Opus 4.6 accuracy on BridgeBench hallucination test drops from 83% to 68% (www.reddit.com)

+2815 10w hallucination opus anthropic

Anthropic's flagship model just took a pretty significant accuracy hit on one of the most important AI benchmarks out there. So here's the deal: Claude Opus 4.6 was recently tested on BridgeBench, which specifically measures how often AI m…
Gemma 4 Thinking Like Claude Opus (decrypt.co via hn)

+1 10w gemma qwen opus

If you've been following the local AI scene, you probably know Qwopus—the open-source model that tried to distill Claude Opus 4.6's reasoning into Alibaba's Qwen, so you could run something resembling Opus on your own hardware for free. It…
What's the best AI workstation for less than $5k USD? (www.reddit.com)

+117 10w opus claude-code

I'm planning to setup a PC for running models locally. So far, I've looked at MacBook m5 max 128 GB that fits under my budget.
Ask HN: At ~165k tokens, does Opus 4.6 1M outperform Opus 4.6 200k? (news.ycombinator.com)

+1 10w opus anthropic claude-code

Here is a question for which I cannot find an answer, and cannot yet afford to answer myself: NoLiMa [0] and "context rot" [1] would indicate that with a ~165k request, Opus 200k would suck, and Opus 1M would be better (as a lower percenta…
I built Fixy Code — a multi-agent coding terminal built with Claude Code (www.reddit.com)

+12 10w gemini codex opus+1

Built this with Claude Code. Free to try.
I built a cmux-style terminal multiplexer for Linux with a scrolling layout (www.reddit.com)

+51 10w codex opus claude-code

If you're on Linux and jealous of cmux, this might be for you. Séance is a scrolling terminal multiplexer with AI coding integration.
I built an interactive first-principles climate physics simulation with explainer (earth.crackalamoo.com via reddit)

8 10w opus claude-code

A 3D visualizer of earth's climate in the browser. Introduces physics step by step so you can watch each process unfold as a piece of the overall climate.
Claude Code wrote a complex full 12-week training plan in one MCP call (www.reddit.com)

+21 10w opus mcp claude-code

I am impressed. I gave Claude Code one prompt, asking it to look at my last year of training and build a three-month plan with some running, cycling and swimming.
Show HN: Signoff.sh – Claude Co-Authored-By with random fictional characters (gist.github.com via hn)

+2 10w opus anthropic claude-code

Every Claude Code commit and PR is shipped with Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> (or similar). It's less fun than I think it should be.
The MCP Coding Toolkit Your Agent Desires! (www.reddit.com)

+12 10w opus mcp claude-code

A little over a year ago we released the first version of Serena. What followed was 13 months of hard human work which recently culminated in the first stable release.
It finally happened: "No blocking correctness or maintainability issues found in the inspected changes." (www.reddit.com)

+21 10w gpt-5 opus

gpt-5.4-high signed off on a major refactor written by Opus 4.6 high-effort. Singularity :|
Tool: count how many Claude tokens each file in your project uses (www.reddit.com)

+11 10w opus anthropic

Made a small CLI for a problem I kept hitting: stuffing a codebase into Claude and guessing which files were blowing up the context. npx toksize .
Running gpt and glm-5.1 side by side. Honestly can’t tell the difference (www.reddit.com)

+2418 10w swe-bench glm gpt-5+1

So I have been running gpt and glm-5.1 side by side lately and tbh the gap is way smaller than what im paying for On SWE-Bench Pro glm-5.1 actually took the top spot globally, beat gpt-5.4 and opus 4.6. overall coding score is like 55 vs g…
Enforcing new limits and retiring Opus 4.6 Fast from Copilot Pro+ (github.blog via hn)

+33 10w copilot opus

As GitHub Copilot continues to rapidly grow, we continue to observe an increase in patterns of high concurrency and intense usage. While we understand this can be driven by…
Tell HN: I regret every single time I use AI (news.ycombinator.com)

+73 10w opus

I try not to be fully against AI, so I keep giving it a chance, today again. I went for sport and gave Opus 4.6 a medium-sized task.
Output Styles aren't being injected into the system prompt (another degradation cause) (www.reddit.com)

2 10w opus claude-code

Found another cause of Claude Code degradation (and no, it's not an Opus 4.6 nerf this time either). Output Styles aren't being injected into the system prompt!
A Quick naive reminder to everyone to have reasonable doubt about Opuses first interpretations of papers (answer cut togheter, no mode selcted opus 4.6 thinking in incognito mode) (www.reddit.com)

+21 10w opus

Without additional pre-prompting, it still parrots your framing back at you, interpreting data in a way that suits the narrative you spin into your question. Does anyone know of good evals / preprompts to avoid this kind of behaviour without having t…
Is Opus 4.6 in Claude Code borderline lobotomized during peak hours? (www.reddit.com)

+13 10w opus claude-code

Is anyone else experiencing serious quality variability with Opus 4.6 in Claude Code right now? Way more than usual?
Ask HN: What's the best AI model for system design nowadays? (news.ycombinator.com)

+44 10w opus

I'm specifically asking about software system design tasks like: Designing backend architectures Tradeoff analysis (DB, queues, caching, others) Infra diagrams Documentation My current pick would be Claude Opus 4.6, because I've found it s…
Best model / settings for low and slow high quality code? (www.reddit.com)

2 10w opus

Hey all - I’ve built a nice backlog of issues to fix in GitHub and I’m wondering your take on which model is the highest quality per token usage, not caring about speed. I want to task an agent to go through my backlog and fix them one by…
6 Months Using AI for Actual Work: What's Incredible, What's Overhyped, and What's Quietly Dangerous (www.reddit.com)

+486108 10w cursor opus

Six months ago I committed to using AI tools for everything I possibly could in my work. Every day, every task, every workflow.
