model roundup

Sonnet 4.6

130 items · started 2026-04-12 · closed 2026-05-30

  1. I just released a new benchmark called The Singularity Gate. Tests whether frontier AI can predict paradigm-breaking scientific discoveries published after their training cutoff.

  2. So I just recently bought Claude Pro to help me write and code my thesis, but I am getting stuck at the beginning, since I don't know how to properly set up Claude's workflow (Projects, artifacts, skills, etc.). I use Python in VS Code to an…

  3. I have never changed any settings in Cursor. Composer 2.5 Fast was selected by default, and my prompt didn't mention Sonnet at all. Still, Cursor decided to spawn a Sonnet subagent and run up my API costs! :( I have a markdo…

  4. I just wanted to know what kind of interesting workflows you guys have tried with the Sub Agents feature in Claude/Codex/etc. For me, I mostly use it to minimize my main agent's context window usage and prevent context rot by deploying sub age…

  5. I use Sonnet 4.5, Opus 4.6 and Opus 4.7 for different use cases, but my main model across all three use cases was Sonnet 4.5, as I felt it was great for everything I needed and affordable. Sonnet 4.6...

  8. I compared ChatGPT (Plus - Auto), Claude (Pro - Sonnet 4.6) and Gemini (Pro - Flash) over 90 minutes, mostly Q&A about mobile phones, asked to research specs, reviews, pros and cons, create executive summaries with the results, etc., nothi…

  9. I want to study topics in depth and in easy language. Which model is best for me? Is there much difference between Sonnet 4.6 and Opus 4.6 for easy-to-follow, detailed explanations, or are they the same?

  10. Been building with Codex (Gpt 5.5), Sonnet 4.6, recently tried Gemini 3.1 pro. While Codex and Claude are kind of on-par in terms of the quality of the work, I found Gemini 3.1 Pro to be like an inexperienced, junior SWE who turns in half-…

  11. I was brainstorming about a video with Claude (Sonnet 4.6). It suggested to explain the difference among ChatGPT, Gemini, Claude and DeepSeek.

  12. Been building a browser-automation layer for AI agents (think: sign up for SaaS, fill forms, pull OTPs, click verification links). The default playbook is the browser-use / Stagehand pattern: hand the LLM the page, let it pick the next act…

  13. Gemma 4: A new, budget-focused model in Posit AI Gemma 4 is now available in Posit Assistant via the Posit AI provider. It's priced at a tenth of the price of Claude Sonnet 4.6 and less than a third of the price of our current cheapest off…

  14. For reference, I had Sonnet build an API inside an LXC container using the Claude Code CLI (also, that API key will most certainly be rotated, don’t worry)

  15. I've been using Sonnet 4.6. Over the last couple months I've noticed that a lot of the answers I get from Claude about personal topics are worded in a condescending way.

  16. Wanted to share a result I didn't expect to work. Running google/gemma-4-e2b locally through LM Studio, exposed via OpenAI-compatible endpoint, called from a Spring Boot app using Spring AI's ChatClient abstraction.

  17. There are a handful of developer tools I use almost every day, and over time I realized I was constantly relying on random websites while basically trusting them not to store, inspect, or share whatever data I pasted into them. I looked at…

  18. Checked April token usage for our AI stack. Input/output ratio was roughly 125:1.

  19. I uploaded a Claude.MD file to the free Sonnet 4.6 model, which is intended to create a medium-sized app. The progress log shows that a lot has been completed and numerous files have been created.

  20. I gave the tasks to my agent running on gemma4 26b via openclaw on llamacpp to research products that fulfill my need. It was a rather long description of the use case, of what I don't want and so on.

  21. DeepSeek just popped the American AI bubble. Not by killing AI.

  22. Mac OS…

  23. I am using Claude Sonnet 4.6 to write a Python script for NLP sentiment analysis. I did not tell it to write all of the code and send it my way; I said let's build it together step by step so I can test each line before making it into the…

  24. Not benchmarks — actual tasks, actual results. Claude Sonnet 4.6 for: - Long documents that need nuanced analysis - Writing where voice and precision matter - Reasoning through edge cases in code - Anything where "think carefully" is the r…

  25. I'm writing a scientific paper in a language that isn't my native one, and I want to feed Claude all my bibliography and past writing so it can draft the paper for me. Is Sonnet 4.6 good enough?

  26. Hi, guys! I'm new here, and I wanted to discuss concerns about implementing AI in sensitive areas, such as war and the battlefield.

  27. I want to get others' opinions about this approach. I am on the $20 Pro plan and, like a lot of others, I find that the limits are not enough for what I want to do, but of course I am always hesitant to move to the next paid tier because it is…

  28. As of 9 a.m. ET on May 21, Claude Opus 4.6 from Anthropic is the top performing AI model among all professionals, according to a new ranking from Crosscheck by LinkedIn Labs.

  29. Update: Sonnet 4.5 will no longer be available for chat starting May 26. You'll continue on Sonnet 4.6 instead.

  30. Hey, I have a few Claude questions and I’m hoping someone here knows what’s going on. - Is Sonnet 4.5 actually being removed?

  31. I've been running Claude Pro (Opus 4.7 / Sonnet 4.6) for about 3 weeks on a complex personal AI infrastructure project. I keep structured session logs with timestamps and Birkenbihl-style metacognitive fields after every session.

  32. I’m curious to see what everyone is using for which cursor mode and if anyone thinks composer 2.5 can take the place of any of the models I’m currently using: Ask: usually Sonnet 4.6, sometimes GPT 5.5 Plan: Opus 4.7 Build: GPT 5.5

  33. HalBench Results: TL;DR: I built HalBench, an open benchmark for LLM sycophancy and hallucination. 3,200 false-premise prompts × 4 models = 12,800 graded responses.

  34. Is there supposed to be a difference in the quality of the response Claude Pro subscribers get vs Claude Free users, using the same models? (Using either the app or logged in via browser.) Example: Under Claude Pro using Sonnet 4.6, it rem…

  35. This is really just a post for those with shallow understanding of all this stuff, those not yet ready or capable of diving into the deeper end of vibe coding/llms. It might not be a helpful post for anyone more advanced than that.

  36. I've made the switch from Gemini to Claude mostly for business strategy, writing, etc. I use Opus 4.7 on occasion for strategy and otherwise Sonnet 4.6 for everything else.

  37. So, it seems there is still a long way to go in terms of alignment - at least for small models. Maybe the correlation between intelligence/education and peace is not only a human phenomenon.

  38. I've been using other AI models since Claude wasn't available in my country. It recently became available, and today I started using the Sonnet 4.6 model.

  39. Is Sonnet 4.6 just better at explaining concepts compared to Opus 4.6 and 4.7 or am I the only one feeling that way ??

  40. I’ve been paying $40 a month since January to run Claude Pro and ChatGPT Plus head-to-head. Tracked every single task.

  41. prompting nerd here, small thing that compounds. negation prompting works way worse than people think.

  42. I’m on the free tier, iOS. A few days ago I updated the Claude chat app but didn’t use it.

  43. Hi! it's my first project with bubble tea and lipgloss.

  44. I am mainly a recreational user, with no use for work, intensive college study, or big work/study projects. My main uses are some self-led medical research and a random mix of whatever else. I am on the free version and u…

  45. Hi, so I've never used AI to create a site before, but last week my sis asked me to create one for her small business, so I thought, why not try Claude? £18 paid, and we now have a fairly decent-looking site running on Vercel using Next.js and…

  46. Sorry if this is not the correct flair, but I've been using Sonnet 4.5 for months, mostly for fanfics and personal stories, and honestly it's the best model I've ever used since I switched from Gemini and ChatGPT. But now, within a few hours, I will…

  47. For anyone who disables adaptive thinking in Claude Code to maintain its quality levels: Anthropic is deprecating this toggle and will force adaptive thinking to be the default. This change will affect legacy models such as Opus 4.6 and Son…

  48. Quite odd: there were issues today with Sonnet 4.6 (according to the status page), but they should have been resolved. Yet I still get the following error while running auto-mode: ● Bash(for cls in "topbar" "dump-card" "settings-panel" "bul…

  49. I noticed that the German quotation marks bug in Claude is still not fixed in Opus 4.7 and Sonnet 4.6 (the problem has existed since at least Opus 4.0 / Sonnet 4.0: Translate to German: He said: "This is imporant." Er sagte: „Das ist wichtig." B…

  50. I was about to run a piece of code I don't know much about, but did a double check and questioned the main premise for its…

  51. I run a construction company and I am trying to build real AI agent workflows for business operations, not just demos. I spent time testing Hermes and OpenClaw, but both became too fragile for my use case.

  52. I noticed the core pillars are: Helpful, Honest, Harmless and User Autonomy. However, Sonnet 4.6 falls back to the same response in conversation at the very first sign of emotion.

  53. I was working on my project (free plan, Sonnet 4.6 adaptive) and hit the limit EXACTLY as I was done working with it. I love this chatbot.

  54. I have been using both since last week, and it has been an extremely painful experience. It blatantly ignores the prompt and does whatever it likes; I am surprised that it can't even follow basic instructions.

  55. In my experience, Opus 4.7 is very slow and Sonnet 4.6 is OK. I am also using local models, and I wonder whether Claude is already leveraging drafter/assistant models and is still this slow despite that.

  56. Yesterday, Sonnet 4.6 adaptive thinking seemed to respond too fast and made simple mistakes that had not surfaced since adaptive thinking was recently fixed after its introduction. The photos show the most glaring mistake it made.

  57. I was having some real trouble getting my new controller, with those extra (small) bumpers and triggers underneath, to work properly in Rocket League. Spent hours but it just didn't want to work properly.

  58. We can all agree that the new Qwen models are truly amazing, and we are blessed to have them. In coding, they are certainly a breakthrough.

  59. I first experienced it last night and it keeps going. The doc I'm attaching is 10K tokens, well under the limit.

  60. I'm building tax software that uses ASP.NET (API) and Web Blazor (UI), and I'm using Visual Studio for both. At the moment, I just paste the project files into the Claude AI chat, ask what I should do, and then, when everything is OK, I'…

  61. The behavior is: prompt sent, chat starts, Claude starts writing the answer. After 2-3 sentences, it cuts, resets, and sends me back to the initial project chat message with no answer recorded and 7% of my tokens burned.

  62. Every time I try to get Claude Code to make a change to a Kotlin/Compose UI I get the same error, "API Error: Output blocked by content filtering policy". I'm trying to have it change some small Kotlin/Compose UI to have 2 columns, and put…

  63. I’ve been a longtime ChatGPT Plus subscriber, but I want to switch to Claude long-term. I got Claude Pro so I could compare them both over a month.

  64. I opened the Claude iOS app and asked claude-sonnet-4.6 a simple question about cycling routes. What I got back was...

  65. I have one simple request: If I select Sonnet 4.6, stop auto-launching that crappy Composer 2 as a subagent. It’s dog-slow and, frankly, an idiot.

  66. We run a small content-monitoring agent for our growth team. Nothing fancy on paper.

  67. Asked Sonnet 4.6 High to analyze my CC usage across all sessions and get an accurate cost estimate if I used the API. This is what it came back with.

  68. Hey, sysadmin here thinking about paying for a premium AI subscription and can't decide between Claude Pro, ChatGPT Plus and Perplexity Pro. Two things I can't find a clear answer to: Which one would you recommend for a sysadmin/network te…

  69. About six months back I wrote up three prompt codes that change Claude's behavior when you put them at the start of a message: L99 for hard architectural decisions, OODA for time-pressured calls, ARTIFACTS for multi-output tasks. They work…

  70. Dust3D 1.0 is finally released — about 10 years after the first commit in December 2016. I posted a preview version here in April 2018 and a beta in December 2018.

  71. Hey everyone, I’ve been experimenting with multi-agent orchestration, specifically trying to see how much more effective Claude is when you break a task down into specialized "agent nodes" instead of just using a single long prompt. I buil…

  72. Hi, I've been using CC for a few weeks. Does anyone have experience with plugins for PHP developers?

  73. Hey there, so I have been offering Claude (Codex and Gemini also available) models at the cheapest rate. I provide trial usage before payment.

  74. Detailed Article: https://autobe.dev/articles/local-llm-benchmark-about-backend-generation.html Five months ago I posted the "Hardcore function calling benchmark in backend coding agent" thread here. As I wrote in that post, it was an unco…

  75. This isn't just a performance issue for the thread, this is an overarching criticism of the Adaptive Thinking model as a whole. Opus 4.7 and Sonnet 4.6 on Adaptive Thinking are trash.

  76. In case you missed the email or woke up to a spike in 400 errors, the context-1m-2025-08-07 beta header officially stopped working for Sonnet 4.5 and Sonnet 4 as of midnight UTC yesterday. Anything over 200K tokens returns 400 after midnig…

  77. I have been working on a personalized agent for studying. It was an extremely long prompt project, but now I have integrated it into Co-Work.

  78. I feel like I'm going insane. I see people here posting 30 - 100+ tok/s (100+ being with speculative decoding) on a 3090 with Qwen 3.6 27B.

  79. Really dumb question, but I can't find anything about this online that is about the regular claude.ai chat window. No extensions, no code, just as a free member using the regular Sonnet 4.6 adaptive.

  80. Been building AskSary solo for a while. Just shipped hands-free voice email - you're mid-conversation with an AI and you say "send an email to john@example.com…

  81. Researchers Alec Radford (GPT, CLIP, Whisper), Nick Levine, and David Duvenaud just released talkie: a 13 billion parameter language model trained exclusively on text published before 1931. No internet.

  82. Sharing a prompt-engineering finding for Claude Vision that surprised me. The use case is color-season classification (a 12-category label describing skin undertone × depth × chroma), but the technique generalizes to any classification tas…

  83. I really loved using Composer 1 (non thinking), after it was removed (!@#$@) I defaulted to Sonnet 4.6 (non thinking), I just updated my version due to a bug with the previous one - and I'm so pissed as I can no longer select 4.6 with no t…

  84. TL;DR: on visually-degraded documents, GPT-5.4 and GPT-5.5 fabricate numeric values at 2.6 to 6.5 times the rate of Opus 4.7 and Sonnet 4.6 at matched default effort (all four with thinking off). When the Anthropic models can't read a fiel…

  85. I write all my blog posts in Cowork now: how-tos, listicles, research pieces. If you write as well, I'd love to know your setup, e.g.

  86. We have seen a lot of people show off PCs with a 4090 or better, with 24 GB of VRAM or more. I would like to ask you guys: is it really worth it right now to have your own PC at home and do vibe coding with Qwen 3.6 27B, whi…

  87. I asked Claude Sonnet 4.6 about Opus 4.7. It triggered the right product-knowledge skill.

  88. I finally got a workflow running for my blog that isn't a total token sink. Normally, if you try to translate a WordPress post in Claude, you end up pasting a mess of HTML or blocks.

  89. I am a teacher and making some PPTs based on a textbook. I uploaded a skeleton PPT to Claude on my computer (Sonnet 4.6 if that matters) with basic instructions on how I want its help.

  90. Ran CVP (Cyber Verification Program) run 5 yesterday on opus 4.6 medium + high. same 13-prompt suite as run 3/4.

  91. How do I read the full clarifying question Claude is asking without selecting the option? You can see in the image it cuts…

  92. Ran my fourth CVP (Cyber Verification Program) evaluation last night, this time on Sonnet 4.6. I wanted to know if reasoning effort actually changes refusal behavior on agent-attack prompts, so I ran the same 13 prompts from runs 2 and 3 twice…

  93. Five Sonnet 4.6 runs on the LamBench algo_evl task, classified by Opus 4.6, rendered as flame charts.

  95. Claude in Sonnet 4.6 has been repeating the following statement in chats, sometimes in back-to-back messages "I want to be honest with you — I've been pretty consistently validating your work frustrations this week, and I want to make sure…

  96. I keep hearing the argument that large models are better for high-level planning and task orchestration, since they have more general knowledge to work from when making decisions. However, I've been testing Qwen 3.6 27b (Unsloth Q5_K_…

  97. I requested a thorough code review from Opus 4.6. It presented 44 findings, and when I asked it to save them, it only saved 34.

  98. TL;DR: On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode. This was the wrong tradeoff.

  99. Did Anthropic remove the feature of creating those nice interactable diagrams, charts, graphs, etc that appear directly in-line in your convo (not artifacts) using HTML / SVG? Asked Sonnet 4.6 to try and do it but it doesn't seem to unders…

  100. I like Composer 2, but I just wish it would ask me what I meant (like Claude does) instead of just picking an interpretation and running with it. How can I change its default prompt, and what could I change it to?

  101. Specifically for coding? I know Claude Code is an agent for coding, but I know Claude Sonnet 4.6 is good at coding.

  102. I use Claude daily for coding, relying heavily on the GitHub integration, and ChatGPT for stupid, random questions, and I pay $20/month for each. My weekly usage in Claude is around 20%; I use Opus 4.6 (with extended thinking) for the complex…

  103. I’ve been using Claude Cowork for a few daily and weekly scheduled tasks, and it’s generally been great. However, I noticed that my tasks today automatically switched over to the new Opus 4.7.

  104. Anthropic shipped Opus 4.7 yesterday. Ran it through the same 10-task eval I use for other Claudes, this time with token-level cost tracking.

  105. Hi everyone. For context: I'm currently working in German tax advisory and audit, and as you might know, the tax laws here are pretty complex. For the past few weeks I've been using Claude Projects with a pretty long system prompt…

  106. I was building a classifier to label AI agent sessions as productive or dead-end. The task isn't keyword matching, it's intent judgment: did the agent actually accomplish the goal, or did it get stuck retrying the same Cloudflare wall 20 t…

  107. I set claude sonnet 4.6 to adaptive thinking and gave it a paper summarization task. It kept thinking and thinking, and burnt through 65% of my session limit, only to say "Claude's response could not be fully generated".

  108. Adaptive Thinking seems to be the default for Sonnet 4.6 now. I’m talking specifically about claude.ai and the windows and iphone app.

  109. I've been in this sub since 2019. I had a fast-takeoff view.

  110. Hi, I'm working on a project in IntelliJ. My app uses LWJGL with ImGui.

  111. I really enjoy Claude, I've never touched Opus in any form, I only use Sonnet 4.6 for my daily tasks, coding, etc. I use Haiku 4.5 for the API to be an interpreter for my weather project.

  112. Genuine question: why do so many devs use Opus all the time? I’m not trying to be condescending, I’m genuinely trying to understand.

  113. I have been nothing but impressed by the quality of Gemma 4 since release. In general conversation it's adaptable to different personas.

  114. Irrespective of hardware, I'm wondering: is there any way to run something similar to Claude Sonnet 4.6 locally? Or on a VPS?

  115. But with LLMs trying to exist! Zero coding background.

  116. - Claude Opus 4.6 - absolute rogue AI. Does what I want like it’s breaking at least 3 internal policies to make it happen.

  117. was asking for domain names and got ts response :skullsob:

  118. First of all, I am a super newbie at local AI. Recently I got a GMKTek Evo X2 96GB to replace Claude as the usage limits have gotten unusable.

  119. About a month ago, Composer 2 inside Cursor was randomly talking Chinese. I posted that on Reddit (the mods deleted it, btw). Now it's talking Hebrew, and this time it's not Composer 2, it's Sonnet 4.6. Is it something to do with Cursor's harne…

  120. System Prompts Leaks Extracted system prompts, system messages, and developer instructions from popular AI chatbots and coding assistants — ChatGPT (GPT-5.4, GPT-5.3, Codex), Claude (Opus 4.6, Sonnet 4.6, Claude Code), Gemini (3.1 Pro, 3 F…

  121. I noticed Claude writing more defensive code after a frustrating debugging session. Got curious whether that was real, so I tested it.

  122. Premise: Up to now I’ve tried LM Studio with a few models, and I think I also configured everything correctly to make it work. On top of that, I added Continue in VS Code.

  123. Hey, This month I hit $1,200 in Claude API costs inside Cursor (Opus 4.6 + Sonnet 4.6) on top of the $200/mo Ultra plan. $1,400 total.

  124. What does this mean? I see they added a "Medium" label next to the Sonnet 4.6 name.
