Building an Unverified Compiler with Agents (www.basis.ai via hn)
Show HN: Marky – A lightweight Markdown viewer for agentic coding (github.com via hn)
Ask HN: Does ChatGPT's data export feature work? (news.ycombinator.com via hn)
Gemini can now create personalized AI images by digging around in Google Photos (arstechnica.com via hn)
Coding Agents Degrade Sandboxes to Security Theater (guardbase.io via hn) https://guardbase.io/blog/coding-agents-degrade-sandboxes-to-security-theater/
Show HN: Stack – the control plane for AI agents (getstack.run via hn)
Claude Code is a black box. Here is how to trace its tool calls and LLM requests (www.arthur.ai via hn)
Digital Tap AI: OSS agents to detect and stop idle cloud clusters (github.com via hn)
- 13 items
thread
Sonnet 4.6
Sonnet 4.6, a new release noted for its "unhinged" behavior, has sparked discussions among users about unexpected changes in software performance and cost management strategies involving Cursor and Claude APIs.
Android Auto users say Gemini won't stop talking, and it's not even right (www.androidauthority.com via hn)
My experience with Claude and Codex on a system architecture bug (swaranga.dev via hn)
Show HN: A tool to calculate LLM model API costs when coding (the-designengineer.com via hn)
We Built an MCP with 229 Tools (Without Writing a Single Tool Definition) (www.apideck.com via hn) How Apideck auto-generated a 229-tool MCP server from an OpenAPI spec using Speakeasy, deployed on Vercel with dynamic tool discovery at 1,300 tokens. A walkthrough of the stack, the hosting tradeoffs, and the hard-won lessons from shippin…
Show HN: Health billing agent denies claims in 1.2s, offices should know why (news.ycombinator.com via hn)
Me the moment the 5k lines of code provided by Claude finally work 😭 (www.reddit.com via reddit) https://i.redd.it/ufwjmy91djvg1.jpeg
DeepSeek Updated their repo DeepGEMM testing Mega MoE (www.reddit.com via reddit) https://github.com/deepseek-ai/DeepGEMM/pull/304 https://preview.redd.it/vcmqwmvzijvg1.png?width=1014&format=png&auto=webp&s=76b1739925f0699b0763aa7814614dd40329c41e https://github.com/deepseek-ai/DeepGEMM/commit/a050d09461e86eb6bba35a8c74…
Claude is doing 80% of my thinking now and honestly I'm not sure how I feel about it (www.reddit.com via reddit) started using claude for basically everything brainstorming, writing, debugging, even planning my week lol. its gotten to the point where my actual workflow is claude for the thinking layer, cursor for code, and runable when i need agents…
LLM risk spreading misinformation to humans who are least able to identify it (arxiv.org via hn) While state-of-the-art large language models (LLMs) have shown impressive performance on many tasks, there has been extensive research on undesirable model behavior such as hallucinations and bias. In this work, we investigate how the qual…
Built a free Claude skill that adds /share, turns HTML outputs into public URLs instantly (www.reddit.com via reddit) Our team at BotsCrew uses Claude constantly: dashboards, briefs, competitive analyses, prototypes, and internal reports. Claude builds genuinely good stuff.
A practitioner's framework for engineering trust from unreliable agents (michael.roth.rocks via hn)
Ask HN: Why no insurance is fully transparent about how they handle each case? (news.ycombinator.com via hn) I was thinking maybe it's possible to make an insurance on the Blockchain where an LLM is the oracle and people can see how cases are handled
Mneme – project memory injection for LLM workflows (github.com via hn)
- 57 items
thread
Qwen 3.5
Qwen3.5-9B is a post-trained model with 9 billion parameters that integrates multimodal learning and an efficient hybrid architecture for enhanced performance. Community highlights include speculative decoding on Apple Silicon boosting Qwen3.5-9B's throughput by 4.1x, and the model outperforming others in coding tasks while addressing overthinking issues through tool usage.
I asked Claude to "Create a parody of one of these hard-to-fill out online job applications." (claude.ai via reddit) Absurd Job Application Form Parody | Satirical HR Portal: https://claude.ai/public/artifacts/e18435e2-70eb-48aa-9eaa-2aad2338172d
I rebuilt a full event platform in 5 weeks using Claude Code (www.gpthacks.com via hn) I replaced a 6-Month, 5-person rebuild with 5 weeks of Claude Code Hey everyone, I know it’s been a while. Since founding Eventship, a platform for building in-person communities with events, I haven’t had much time to write.
Kelvin Claw: A secure, modular agent harness with supply-chain validated plugins (agentichighway.ai via hn) By the Agentic Highway Team. An agent runtime designed for zero-trust environments from the ground up. Building secure agent systems at scale is a different problem th…
Buddy – Anthropic killed /buddy. We made it permanent, cross-platform, and alive (github.com via hn) Buddy: The /buddy Rescue Mission for Your AI Terminal. The open-source /buddy rescue mission for AI terminals. Persistent memory, XP, species, and context-aware feedback for Claude Code CLI, Codex CLI, Gemini CLI, Copilot CLI, Cursor CLI, an…
A new transformer variant has been created to facilitate more efficient model training in distributed settings. 128x compression with no significant loss in convergence rates, increases in memory, or compute overhead (www.reddit.com via reddit) Macrocosmos has released a paper on ResBM (Residual Bottleneck Models), a new transformer-based architecture designed for low-bandwidth pipeline-parallel training. https://arxiv.org/abs/2604.11947 ResBM introduces a residual encoder-decode…
OpenAI continues to lose market share in GenAI website traffic, while Gemini, and Claude are gaining: (www.reddit.com via reddit)
- ChatGPT 56.72% vs 77.43% 12 months ago
- Gemini 25.46% vs 6% 12 months ago
- Claude 6.02% vs 1.4% 12 months ago
At this point in the race it's all about distribution & the cost of serving these models.
Show HN: AgentPulse: Real-Time Observability Dashboard for Claude Code and Codex (blog.jaystuart.dev via hn) AgentPulse: A Real-Time Dashboard for Claude Code and Codex Sessions. If you work with AI coding agents long enough, you run into the same problem: the agents are productive, but the workflow around them gets chaotic. One Claude Code sessio…