1. Fake building: Claude wrote 3,000 lines instead of import pywikibot. TL;DR: Claude would rather reinvent the wheel than pip install one.

  2. Do you think they will end up learning the most painful workflows from enterprise customers and building the most necessary agents for the smaller guys themselves? In other words, squeezing out all the agentic companies out there?

  3. Bigger ubatch made gpt-oss-120b prompt processing much faster on my RTX 3090. I was tuning gpt-oss-120b-F16.gguf with llama.cpp on a 24 GB RTX 3090 and found that increasing the physical micro-batch size (-ub) can massively improve prompt p…

  4. Large language models (LLMs) show strong performance across natural language processing (NLP), mathematical reasoning, and programming, and recent large reasoning models (LRMs) further emphasize explicit reasoning. Yet their computational…

  5. OpenAI, despite its name, is usually extremely secretive about its operations. It promotes a carefully crafted image to the world.

  6. SigmaShake Desktop - Guardrails for YOLO AI coding agents. Your AI will use the wrong tool, nuke your database, force push to main because it won't respect your markdown instructions. One ruleset, every major AI coding tool, local, no cloud,…

  7. event

    Cowork
    211 items

    Issues with Claude Cowork have been reported, including errors and disruptions for some users on April 16, 2026. Additionally, Google has developed its own desktop Agent to compete with Cowork, while users continue to explore alternatives and troubleshoot bugs in the platform.

    model roundup

    DeepSeek 4
    106 items

    DeepSeek-V4-Pro is a 1.6T parameter Mixture-of-Experts model supporting one million-token context, with significant improvements in efficiency and stability through hybrid attention and manifold-constrained hyper-connections. Community highlights include its cost-effectiveness via the official API and exceptional performance in large code change evaluations, with some noting its surprisingly robust output capability despite a 384K max token limit.

  8. I was trying to put together a good set of NVFP4 models to leverage the RTX Pro 6000. I got past a few hurdles, set up configs and wheels, and ran benchmarks while I was at it. Hopefully this helps some folks out.

  10. I’ve been following the recent discussions here about why many “AI agents” fail in production, and I agree with the automation-first argument. A lot of so-called agents are really just workflows with one or two LLM calls.

  11. Most agent design conversations focus on the LLM loop. After running an agent in production for a week, I think the more important question is the human-in-the-loop boundary.

  12. I gave it a photo of my cats and my website link and told it to design a business card using HTML and use Playwright to take screenshots and keep iterating until it's perfect. I bought a set of Avery printable business card paper from walm…

  13. A common operational inefficiency in almost every company I've seen is the double-check process performed when someone takes an action. There is nothing more wasteful than establishing a double-check system.

  14. model roundup

    Opus 4.7
    336 items

    Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.

    190 items

    Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.

  15. I’m seeking support or criticism to either make this issue advance or explore alternative solutions. tl;dr on the issue: Claude Code plugins through marketplace are installed by the user wherever they want.

  16. CyberMe: Visual knowledge base maintained by LLM Agent. CyberMe is a general-purpose knowledge base framework maintained by an LLM Agent, suitable for personal, team, or enterprise knowledge bases. It keeps raw materials, a s…

  17. I kept catching myself doing this little ritual — open my pinned claude.ai usage tab, squint at the bars, get back to work. Ten minutes later, same thing again.

  18. Was about to post this in the thread about Anthropic's one trillion $ valuation, but it's a bit of a different way of looking at things, so I figured I'd ask here. Re-emphasizing this is about Claude and Anthropic, so please, mod-bot, go eas…

  19. I saw a 3D-printed Claude bot that jumps up and down when Claude wants your attention, so I decided to build something similar but using a Raspberry Pi with a screen. https://i.redd.it/guv1jd60yl0h1.gif All the code is here: https://git…

  20. RetrievalCI. Stage: bench-v0 early preview. The methodology, scorecard format, and 9 system adapters are stable.

  21. model roundup

    Opus 4.6
    89 items

    Opus 4.6, a version of Anthropic's AI model Claude, saw its accuracy drop on the BridgeBench hallucination test from 83% to 68%, and is being retired from Copilot Pro+. Notably, Claude Code demonstrated advanced capabilities by generating a detailed 12-week training plan in one call.

  22. How to use Codex pets (and make your own!). Use /hatch to get a cute companion for your projects. OpenAI has really been cooking lately. They’ve gone CRAZY on capabilities and features for Codex, their fast-growing Claude Code competitor.

  23. graft: Persistent graph memory for AI agents and microservices. Save what you learned.

  24. I've made an account as a long-time lurker because I am hoping y'all could help reconcile my experience in my company/team with what seems to be the wise HN consensus around LLMs. My Background (Software Engineer II): I've been writing sof…

  25. After my cofounder and I finished our Master's two years ago, we ended up spending entire days filling out Workday forms instead of actually preparing for interviews. Low point: I joined what was supposed to be a "first round interview" on…

  26. I actually think the opposite is true lol. The more autonomous an agent becomes, the more expensive every mistake gets. When an agent is just generating text, bad outputs are annoying. When an agent starts: sending emails, editing records, touc…

  27. The Artificial Analysis Coding Agent Index includes 3 leading benchmarks that represent a broad spectrum of coding agent use: ➤ SWE-Bench-Pro-Hard-AA, 150 realistic coding tasks that frontier models struggle with, sampled from Scale AI’s S…

  28. I emailed one of the musicians on Claude FM and he had NO idea his music was being used. So for those who don't know, Anthropic recently started a 24/7 lofi/ambient music livestream on YouTube called Claude FM. The song titles and artist na…