Show HN: zot – Yet another coding agent harness (www.zot.sh via hn)
Why I Built Another coding agent harness?: https://dev.to/patriceckhart/zot-why-i-built-another-coding-...
GitHub repo: https://github.com/patriceckhart/zot
- Show HN: A simpler coding agent harness (news.ycombinator.com)
Show HN: Built a local-first way to make AI context reusable across tools (www.proxvanta.com via hn)
Built ProxVanta over a few weekends after running into the same problem over and over: useful AI context ends up scattered everywhere. Some in GitHub, some in Slack, some in docs, some in people’s heads, and some via posts from people tell…
CrowdStrike Linux Agent Easy way to make it better (news.ycombinator.com)
I love CrowdStrike, it's amazing. However, its Linux agent isn't the best.
AgentCheck – Pytest for AI Agents (pypi.org via hn)
Pytest-style behavioral regression testing for AI agents. AgentCheck is pytest for AI agents.
I recently contributed an experimental HFQ4-G256 MMQ prefill path to hipfire, an RDNA-focused LLM inference engine. Disclaimer: I authored the PR, so this is partly a contribution note, but I am mainly looking for independent validation fr…
GOT BORED OF BLOCKED GAMES SO MADE MY OWN WITH CLAUDE (www.reddit.com)
Long story short, in class I'm always searching the web for new websites and games and even when I do find one it's always full of lag and ads. So, I decided to vibe code my own website.
124 items
model roundup
Gemma 4
Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.
120 items
event
Cowork
Issues with Claude Cowork have been reported, including errors and disruptions for some users on April 16, 2026. Additionally, Google has developed its own desktop Agent to compete with Cowork, while users continue to explore alternatives and troubleshoot bugs in the platform.
- 1h Built an AI quoting system on Claude Cowork, now stuck on the boring part: how do teammates stay in sync?
- 8h Anyone got any tips for having Claude run autonomously for as long as possible before hitting session/tool limits?
- 10h How I Used Claude Code to build an AI Jobs Globe in One Day
- 11h Can Claude Cowork send an unprompted message to Dispatch?
- 12h Read‑only skills and “Edit with Claude” are a UX regression
Show HN: Modern alternative to Google Dictionary, AI-powered and context-aware (chromewebstore.google.com via hn)
I kept losing my reading flow every time I hit an unfamiliar word. The usual fix: open a new tab, search, scroll past ads, come back.
What's the hardest part about getting AI agents into real workflows? (www.reddit.com)
Been trying to incorporate AI agents into my day-to-day for a few months now and I keep hitting the same wall. Most demos look great but when I try to plug agents into a real workflow, the friction adds up fast.
OpenAI Could Be Building an AI-First Smartphone That Replaces Apps (techputs.com via reddit)
OpenAI may be preparing to take its biggest leap yet, moving beyond software into hardware with a smartphone designed around AI agents instead of traditional apps. According to a new note from well-known analyst Ming-Chi Kuo, the company i…
For months I defaulted to Opus for anything complex. Sonnet felt like a gamble, sometimes great, sometimes it would confidently build the wrong thing and I'd spend an hour unwinding it.
What Claude Shannon Knew in 1950 That We're Pretending Is New (www.thecontentwrangler.com via hn)
AI didn’t arrive yesterday; it just changed its outfit. Every era gets its favorite tech panic. Ours, apparently, is watching a chatbot say something polished, half-right, and fa…
Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…
- Show HN: AgentSwarms – free hands-on playground to learn agentic AI, no setup (agentswarms.fyi via hn)
76 items
model roundup
Opus 4.6
Opus 4.6, a version of Anthropic's AI model Claude, saw its accuracy drop on the BridgeBench hallucination test from 83% to 68%, and is being retired from Copilot Pro+. Notably, Claude Code demonstrated advanced capabilities by generating a detailed 12-week training plan in one call.
- 1h Cursor & Claude deleted a company's entire database
- 8h How I get 100% accurate answers, and replaced Google with Claude
- 11h GPT-5.5 improves over GPT-5.4 and overtakes Opus 4.6 to take the 2nd place behind Gemini 3.1 Pro on the Extended NYT Connections Benchmark
- 11h Found 48 Vulnerabilities in Open Source Projects During Live Testing with Claude Opus 4.6
- 12h Claude 4.6 Beats GPT-5.4, Grok & Gemini in a Strict Multi-Domain AI Test (2026)
203 items
model roundup
Qwen 3.6
Qwen3.6-35B-A3B, a 35 billion parameter sparse MoE model with an active parameter count of 3 billion, was released on April 16, 2026, as open-source software under the Apache 2.0 license by Alibaba Qwen. It offers advanced functionality across various AI applications and outperformed competitors in drawing tests.
being a jerk to claude 101 (www.reddit.com)
https://preview.redd.it/j6423wfihvxg1.png?width=482&format=png&auto=webp&s=cbc433e8a96502fd370f020c551b39d06b893aa6
How are people using so many tokens ??? (www.reddit.com)
I've been using Claude basically since it launched, and use Claude Code extensively (Swift, C++, Shaders, TS, AWS, etc)... Maybe this is just tech twitter / LinkedIn garbage, but how on earth are people using so many tokens...
Who’s going to Code w/ Claude San Fran May 6th? (www.reddit.com)
I wanted to see who was going to the code w/ Claude extended. I live in the Midwest and wanted to hear if others were flying in to see this.
Grok (www.reddit.com)
- Grok 4.3 Beta (grok.com via hn)
- Where is Grok-2 Mini and Grok-3 (mini)? (www.reddit.com)
1386.ai.rocm: This is a fork of 1386.ai ported to ROCm, targeting specifically the AMD Strix Halo APU but compatible with any ROCm-supported hardware. I found this repo through a Reddit post where the author (@eb1386) nonchalantly announced…
Self-correction can make LLM outputs worse unless you verify first (www.reddit.com)
A lot of agent frameworks quietly assume this loop is safe: model answers, model critiques itself, model revises, output improves. The uncomfortable part is that unconditional self-correction often degrades correct answers more than it repairs…
51 items
model roundup
Sonnet 4.6
Sonnet 4.6, a new release noted for its "unhinged" behavior, has sparked discussions among users about unexpected changes in software performance and cost management strategies involving Cursor and Claude APIs.
- 1h Claude Sonnet 4.6 multi-photo reconciliation prompt — jumped my classifier agreement with human experts from 55% to 82%
- 9h I hate thinking models, any way to use the default ones?
- 16h GPT-5.5 hallucinates at 6 times the rate of Opus 4.7 on degraded insurance docs
- 20h Claude was told to check the docs. It didn’t. Then it corrected me.
- 1d Using MCP to stop wasting tokens on WP translations
Is markdown the programming language for agents now? (www.reddit.com)
Markdown is clearly a wave now. It is good enough for AI who can read content structure without wasting tokens.
Bit of context. Over the last couple of years I've shipped automation projects for around 30 professional services founders.
Hello all, I would like to automate the Git PR review process as much as possible in my company. I found several possible approaches online, but I am still missing a clear best-practice recommendation.
How is the Chrome MCP so bad? (www.reddit.com)
I can't be the only one having trouble using the Claude on chrome mcp right? It worked well for like a week and then suddenly Claude can't use chrome anymore.
How to make ClaudeCode Agent know its identity? (www.reddit.com)
Hey everyone, I’ve been diving deep into the Claude Code CLI and I’m hitting a bit of a wall with session management vs. agent identity.
Show HN: Discuss CLI – No more reviewing agent plans in the terminal (github.com via hn)
I'm a big user of Codex and Claude Code in the terminal. However, after a big brainstorming and planning session I was finding myself with lots of comments and questions about different places in the plan file.
My inbox was filling up with spam and I kept putting off going through it for too long. So I vibe coded a small workflow that handles most of it for me.