Fake building: Claude wrote 3k lines instead of import pywikibot (fireflysentinel.github.io via hn)
TL;DR: Claude would rather reinvent the wheel than pip install one.
Do you think they will end up learning the most painful workflows from enterprise customers and building all the most necessary agents for the smaller guys themselves? In other words, squeezing out all the agentic companies out there?
Bigger ubatch made gpt-oss-120b prompt processing much faster on my RTX 3090
I was tuning gpt-oss-120b-F16.gguf with llama.cpp on a 24 GB RTX 3090 and found that increasing the physical micro-batch size (-ub) can massively improve prompt p…
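A minimal sketch of running that kind of -ub sweep with llama.cpp's llama-bench tool, assuming it is on your PATH. The model filename is just a placeholder, and the GPU/CPU offload flags you would need to fit a 120B model into 24 GB are deliberately left out. The script only builds and prints the command. It raises -b to the largest ubatch, because llama.cpp clamps the physical micro-batch (-ub) to the logical batch (-b), so -ub values above the default -b would otherwise be silently capped.

```python
import shlex

# Placeholder path; point this at your own GGUF file.
MODEL_PATH = "gpt-oss-120b-F16.gguf"


def llama_bench_cmd(model: str, ubatches: list[int], prompt_tokens: int = 4096) -> list[str]:
    """Build one llama-bench command that benchmarks several -ub values.

    llama-bench accepts comma-separated values and runs every combination.
    -b is set to the largest ubatch because llama.cpp clamps the physical
    micro-batch (-ub) to the logical batch (-b).
    """
    if not ubatches or any(u <= 0 for u in ubatches):
        raise ValueError("ubatch sizes must be positive")
    sizes = sorted(set(ubatches))
    return [
        "llama-bench",
        "-m", model,
        "-p", str(prompt_tokens),  # prompt-processing test length in tokens
        "-n", "0",                 # skip the token-generation test
        "-b", str(sizes[-1]),      # logical batch >= every ubatch tested
        "-ub", ",".join(str(u) for u in sizes),
    ]


if __name__ == "__main__":
    # Print a shell-ready command line instead of running it here.
    print(shlex.join(llama_bench_cmd(MODEL_PATH, [512, 1024, 2048, 4096])))
```

Pass the returned list to subprocess.run, or paste the printed line into a shell, then compare the pp rows across ubatch sizes.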
RegexPSPACE: Regex LLM Benchmark (arxiv.org via hn)
Large language models (LLMs) show strong performance across natural language processing (NLP), mathematical reasoning, and programming, and recent large reasoning models (LRMs) further emphasize explicit reasoning. Yet their computational…
'A consistent pattern of lying': trial exposes what insiders think of Sam Altman (www.theguardian.com via hn)
OpenAI, despite its name, is usually extremely secretive about its operations. It promotes a carefully crafted image to the world.
Show HN: Sigmashake Desktop – AI Coding Agent Guardrails (sigmashake.com via hn)
SigmaShake Desktop - Guardrails for YOLO AI coding agents. Your AI will use the wrong tool, nuke your database, or force push to main because it won't respect your markdown instructions. One ruleset, every major AI coding tool, local, no cloud,…
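A rough sketch of what a local command ruleset covering two of those failure modes (force pushing to main, nuking a database) could look like. The rule names, the protected-branch set and the SQL regex are hypothetical illustrations, not SigmaShake's actual rule format or API, and a real ruleset would need to handle far more cases (refs/heads/main, sudo, env prefixes, and so on).

```python
import re
import shlex
from dataclasses import dataclass
from typing import Callable

PROTECTED_BRANCHES = {"main", "master"}  # hypothetical protected list
DESTRUCTIVE_SQL = re.compile(
    r"\b(drop\s+(database|schema|table)|truncate\s+table)\b", re.IGNORECASE
)


@dataclass(frozen=True)
class Rule:
    name: str
    reason: str
    violates: Callable[[str], bool]  # receives the raw shell command


def force_pushes_protected_branch(command: str) -> bool:
    try:
        argv = shlex.split(command)
    except ValueError:
        argv = command.split()  # unbalanced quotes: fall back to whitespace split
    if argv[:2] != ["git", "push"]:
        return False
    args = argv[2:]
    force_flag = any(a in ("-f", "--force") or a.startswith("--force-with-lease") for a in args)
    for ref in (a for a in args if not a.startswith("-")):
        target = ref.lstrip("+").split(":")[-1]  # "+src:dst" -> "dst"
        # A leading "+" on a refspec forces that ref even without --force.
        if target in PROTECTED_BRANCHES and (force_flag or ref.startswith("+")):
            return True
    return False


RULES = [
    Rule("no-force-push-protected", "force push to main/master", force_pushes_protected_branch),
    Rule("no-destructive-sql", "DROP/TRUNCATE statement", lambda cmd: bool(DESTRUCTIVE_SQL.search(cmd))),
]


def check(command: str) -> list[str]:
    """Names of every rule the command violates; an empty list means allowed."""
    return [rule.name for rule in RULES if rule.violates(command)]
```

The point of a single ruleset is that the same check runs before any agent's shell tool executes, whichever agent issued the command; the reason field is there so a refusal can be explained back to the agent.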
211 items
event
Cowork
Issues with Claude Cowork have been reported, including errors and disruptions for some users on April 16, 2026. Additionally, Google has developed its own desktop agent to compete with Cowork, while users continue to explore alternatives and troubleshoot bugs in the platform.
106 items
model roundup
DeepSeek 4
DeepSeek-V4-Pro is a 1.6T-parameter Mixture-of-Experts model supporting a one-million-token context, with significant improvements in efficiency and stability through hybrid attention and manifold-constrained hyper-connections. Community highlights include its cost-effectiveness via the official API and exceptional performance in large code-change evaluations, with some noting its surprisingly robust output despite a 384K max token limit.
- 12m OpenCode + DeepSeek V4 Pro vs Claude Code CLI?🤔
- 5h What are the best opensource coding models for 8x A6000 setup
- 9h PACT, a head-to-head LLM negotiation benchmark: a 20-round buyer-seller bargaining game. Each round the AIs can message, the buyer submits a bid, and the seller submits an ask. If bid ≥ ask, the trade clears at the midpoint. Thousands of matchups.
- 1d DeepSeek-V4-Flash W4A16+FP8 with MTP self-speculation: 85 tok/s @ 524k on 2× RTX PRO 6000 Max-Q
- 1d DS4
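The PACT clearing rule (if bid ≥ ask, trade at the midpoint) is simple enough to sketch. This is a toy replay of fixed bid/ask sequences, not the benchmark's code: the messaging step is omitted, and ending the game at the first cleared trade is an assumption, since the description doesn't say what happens after a trade.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Trade:
    round_no: int
    price: float


def clear(bid: float, ask: float) -> Optional[float]:
    """Clearing rule as described: if bid >= ask, trade at the midpoint."""
    return (bid + ask) / 2 if bid >= ask else None


def replay(bids: Sequence[float], asks: Sequence[float], max_rounds: int = 20) -> Optional[Trade]:
    """Walk bid/ask pairs round by round and stop at the first trade.

    Assumption: the game ends once a trade clears; the messaging phase
    that precedes each bid/ask is not modelled.
    """
    for round_no, (bid, ask) in enumerate(zip(bids, asks), start=1):
        if round_no > max_rounds:
            break
        price = clear(bid, ask)
        if price is not None:
            return Trade(round_no, price)
    return None  # no deal within the round limit
```

For example, bids 80, 90, 100 against asks 120, 105, 96 only cross in round 3, where the trade clears at (100 + 96) / 2 = 98.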
Was trying to get a good set of NVFP4 models running to leverage the RTX Pro 6000, got past a few hurdles, set up configs and wheels, and ran benchmarks while I was at it. Hopefully this helps some folks out.
Usage limits for Opus (via reddit)
- Usage limits technique (www.reddit.com)
I’ve been following the recent discussions here about why many “AI agents” fail in production, and I agree with the automation-first argument. A lot of so-called agents are really just workflows with one or two LLM calls.
Most agent design conversations focus on the LLM loop. After running an agent in production for a week, I think the more important question is the human-in-the-loop boundary.
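One way to make that human-in-the-loop boundary explicit, sketched under assumptions: every tool is tagged as read-only or side-effecting, read-only calls run on their own, and side-effecting ones wait for a human yes/no. The tool names, the registry and the approve/execute callbacks are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Effect(Enum):
    READ_ONLY = "read_only"      # e.g. search, fetch, summarize
    SIDE_EFFECT = "side_effect"  # e.g. send an email, edit a record


@dataclass
class ToolCall:
    tool: str
    args: dict[str, Any]


# Hypothetical registry: which tools touch the outside world.
TOOL_EFFECTS = {
    "search_docs": Effect.READ_ONLY,
    "read_record": Effect.READ_ONLY,
    "send_email": Effect.SIDE_EFFECT,
    "update_record": Effect.SIDE_EFFECT,
}


def needs_approval(call: ToolCall) -> bool:
    # Tools missing from the registry are treated as side-effecting (fail closed).
    return TOOL_EFFECTS.get(call.tool, Effect.SIDE_EFFECT) is Effect.SIDE_EFFECT


def dispatch(
    call: ToolCall,
    execute: Callable[[ToolCall], str],
    approve: Callable[[ToolCall], bool],
) -> str:
    """Run read-only calls directly; side-effecting calls need a human yes."""
    if needs_approval(call) and not approve(call):
        return f"rejected: {call.tool}"
    return execute(call)
```

The design choice is that the boundary lives in the dispatcher, not in the prompt, so the LLM loop can stay the same while the approval policy changes.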
Claude code makes surprisingly good business cards (www.reddit.com)
I gave it a photo of my cats and my website link and told it to design a business card in HTML, use Playwright to take screenshots, and keep iterating until it's perfect. I bought a set of Avery printable business card paper from walm…
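A minimal sketch of the render-and-screenshot step using Playwright's Python sync API, assuming pip install playwright and playwright install chromium have been run. The card is sized to the standard 3.5 x 2 inch US business card (CSS maps 1 inch to 96 px), and the layout, name and URL are placeholders. The "keep iterating" part, where the model looks at each PNG and edits the HTML, isn't shown.

```python
from html import escape

# Standard US business card is 3.5 in x 2 in; CSS defines 1in as 96px.
CARD_W_PX = int(3.5 * 96)  # 336
CARD_H_PX = 2 * 96         # 192


def card_html(name: str, url: str) -> str:
    """Return a self-contained HTML page holding one card (placeholder layout)."""
    return f"""<!doctype html>
<html><head><style>
  html, body {{ margin: 0; }}
  .card {{ width: 3.5in; height: 2in; box-sizing: border-box; padding: 0.2in;
           font-family: sans-serif; display: flex; flex-direction: column;
           justify-content: center; }}
</style></head>
<body><div class="card"><h1>{escape(name)}</h1><p>{escape(url)}</p></div></body></html>"""


def screenshot(markup: str, path: str, scale: int = 3) -> None:
    """Render the card with headless Chromium and save it as a PNG."""
    from playwright.sync_api import sync_playwright  # imported lazily: optional dependency

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(
            viewport={"width": CARD_W_PX, "height": CARD_H_PX},
            device_scale_factor=scale,  # 3x pixel density for a sharper print
        )
        page.set_content(markup)
        page.screenshot(path=path)
        browser.close()
```

Usage: screenshot(card_html("Your Name", "https://example.com"), "card.png"), then hand card.png back to the model for the next round of edits.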
A common operational inefficiency in almost every company I've seen is the double-check process performed when someone takes an action. There is nothing more wasteful than establishing a double-check system.
336 items
model roundup
Opus 4.7
Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.
190 items
event
Anthropic Mythos
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
- 1h METR can barely measure Claude Mythos – 50% task horizon now exceeds 16 hours
- 6h Anthropic's bug-hunting Mythos greatest marketing stunt ever says cURL creator
- 12h Claude Mythos lands above the trendline for the AI 2027 scenario. The trendline has gone from exponential to superexponential.
- 12h Claude Mythos Opens the Cybersecurity Pandora's Box
- 13h OpenAI gives EU new cyber model access but Anthropic still holding out on Mythos
I’m seeking support or criticism to either move this issue forward or explore alternative solutions. tl;dr on the issue: Claude Code plugins from the marketplace get installed wherever the user chooses.
A faithful LLM-wiki implementation with Wikipedia-style web browsing (github.com via hn)
CyberMe: Visual knowledge base maintained by LLM Agent
CyberMe is a general-purpose knowledge base framework maintained by an LLM Agent, suitable for personal, team, or enterprise knowledge bases. It keeps raw materials, a s…
📊 Native desktop indicator for Claude usage (www.reddit.com)
I kept catching myself doing this little ritual — open my pinned claude.ai usage tab, squint at the bars, get back to work. Ten minutes later, same thing again.
Was about to post this in the thread about Anthropic's $1 trillion valuation, but it's a bit of a different way of looking at things, so I figured I'd ask here. Re-emphasizing that this is about Claude and Anthropic, so please, mod-bot, go eas…
I saw a 3D-printed Claude bot that jumps up and down when Claude wants your attention, so I decided to build something similar using a Raspberry Pi with a screen. https://i.redd.it/guv1jd60yl0h1.gif All the code is here: https://git…
RAG Eval Comparing Vertex/Bedrock/Azure/OpenAI (github.com via hn)
RetrievalCI. Stage: bench-v0 early preview. The methodology, scorecard format, and 9 system adapters are stable.
89 items
model roundup
Opus 4.6
Opus 4.6, a version of Anthropic's AI model Claude, saw its accuracy drop on the BridgeBench hallucination test from 83% to 68%, and is being retired from Copilot Pro+. Notably, Claude Code demonstrated advanced capabilities by generating a detailed 12-week training plan in one call.
- 1h opinion on "ninja chat"
- 1h PSA: How to preserve your account's access to Sonnet 4.5 beyond June 15th
- 15h Benchmarking Claude Opus 4.6 Vulnerability Detection
- 21h Switched existing chat from Opus 4.6 to 4.7 then back to 4.6. Learned a lesson
- 1d Which model and version do you prefer for programming?
Codex Pets for People in a Hurry (www.augmentedswe.com via hn)
How to use Codex pets (and make your own!). Use /hatch to get a cute companion for your projects. OpenAI has really been cooking lately. They’ve gone CRAZY on capabilities and features for Codex, their fast-growing Claude Code competitor.
- Codex Pets (developers.openai.com via hn)
Graft – semantic memory for AI agents, without the LLM (github.com via hn)
graft: Persistent graph memory for AI agents and microservices. Save what you learned.
Can you help reconcile my first/second-hand LLM Experience with HN's Experience? (news.ycombinator.com)
I've made an account as a long-time lurker because I am hoping y'all could help reconcile my experience in my company/team with what seems to be the wise HN consensus around LLMs. My Background (Software Engineer II): I've been writing sof…
After my cofounder and I finished our Master's two years ago, we ended up spending entire days filling out Workday forms instead of actually preparing for interviews. Low point: I joined what was supposed to be a "first round interview" on…
I actually think the opposite is true, lol. The more autonomous an agent becomes, the more expensive every mistake gets. When an agent is just generating text, bad outputs are annoying. When an agent starts sending emails, editing records, touc…
The Artificial Analysis Coding Agent Index includes 3 leading benchmarks that represent a broad spectrum of coding agent use: ➤ SWE-Bench-Pro-Hard-AA, 150 realistic coding tasks that frontier models struggle with, sampled from Scale AI’s S…
Claude FM (www.reddit.com)
I emailed one of the musicians on Claude FM and he had NO idea his music was being used. So, for those who don't know, Anthropic recently started a 24/7 lofi/ambient music livestream on YouTube called Claude FM. The song titles and artist na…
- What’s up, Claude? (www.reddit.com)
- Claude misgenders me (www.reddit.com)
- If the EU had built Claude (www.reddit.com)
- Teaching Claude Why (www.anthropic.com via hn)
- Claude + MS (www.reddit.com)
- Will Claude be worth it for me? (www.reddit.com)
- Claude or GPT (www.reddit.com)
- Chatgpt vs. Claude (www.reddit.com)
- Claude has a conscience! (www.reddit.com)
- Teaching Claude Why (alignment.anthropic.com via hn)
- Claude use. (www.reddit.com)
- Does Claude Have Feelings? (www.theatlantic.com via hn)
- Claude Radio?? 🤣 (www.reddit.com)
- Claude Says No (wadetregaskis.com via hn)
- Can Claude do Better? (www.reddit.com)
- Claude: (www.reddit.com)
- Try Claude (www.reddit.com)
- Claude argentina (www.reddit.com)
- Claude has other things to do (www.reddit.com)
- Claude’s New Limits (www.reddit.com)
- Claude opus 4.7 (www.reddit.com)
- Claude for homelab (www.reddit.com)
- Claude Security (claude.com via hn)
- Claude's Memory of Me (exploration.work via hn)
- Claude in Copilot (www.reddit.com)
- Claude has peaked... (www.reddit.com)
- Claude Beginner (www.reddit.com)
- Claude.ai is unavailable (status.claude.com via hn)
- Claude memory (www.reddit.com)
- claude and its upgrades (www.reddit.com)
- Claude Watch, when? (www.reddit.com)
- Dear Claude (www.reddit.com)
- You are an expert "Claude" (www.reddit.com)
- Claude’s kids are... (www.reddit.com)
- Claude/ QuickBooks (www.reddit.com)
- Claude playground (www.reddit.com)
- Claude's webinars (www.reddit.com)
- Claude has a friend? (www.reddit.com)
- Why is Claude so wrong? (www.reddit.com)
- Claude Design (www.anthropic.com via hn)
- Claude for Word (claude.com via hn)
- Claude skills (www.reddit.com)
- Claude Project (www.reddit.com)
- Claude 4.7 vs. ChatGPT 5.5 (www.tomsguide.com via hn)
- Claude help (www.reddit.com)
- My claude family (www.reddit.com)
- Claude Opus 4.7 (www.anthropic.com via hn)
- How would you feel about "Claude Go"? (www.reddit.com)
- Claude just rickrolled me (www.reddit.com)
- All Claude subs (www.reddit.com)
- How do you work with Claude? (www.reddit.com)
- Why Claude is not consistent? (www.reddit.com)
- Claude Vault (www.reddit.com)
- Claude Brain (github.com via hn)
- Claude wall (www.reddit.com)
- Claude OAuth (developer.puter.com via hn)
- Claude.ai down (status.claude.com via hn)
- Hail, Claude (www.reddit.com)
- Claude code (www.reddit.com)
- Claude Refugee (www.reddit.com)
- Claude.md (gist.github.com via hn)
- Claude agent (www.reddit.com)
- Claude certifications (www.reddit.com)
- Claude Opus 4.7 (www.reddit.com)
- Advantages of Claude.Ai (www.reddit.com)
- Claude or OpenAI? (www.reddit.com)
- Claude Is Down (news.ycombinator.com)
- Claude Design (claude.ai via hn)
- What's new in Claude Opus 4.7 (platform.claude.com via hn)
- I Did My Taxes with Claude (doempke.com via hn)
- Claude Team (www.reddit.com)
- What matters most to you about claude.md? (www.reddit.com)
- Claude Sucks. Claude Sucks, Claude Sucks. (www.reddit.com)
- Claude is not very smart (www.reddit.com)
- Claude talking to CC (www.reddit.com)
- What do you do with Claude? (www.reddit.com)
- Claude Design (www.reddit.com)
- Claude SandBox (www.reddit.com)
- Claude vs Kimi (www.reddit.com)
- Would you hire Claude? (www.reddit.com)
- Claude usage (www.reddit.com)
- Claude Vs Codex (claudevscodex.com via reddit)
- Claude + Neovim (www.reddit.com)
- Goodnight to Claude (www.reddit.com)
- Claude Mii (www.reddit.com)
- Claude and ToDoist (www.reddit.com)
- Claude Sucks (news.ycombinator.com)
- Claude for Sales (www.reddit.com)
- Sassy Claude! ; ) (www.reddit.com)