Show HN: Sigma Guard – deterministic contradiction checks for graph memory (news.ycombinator.com)
I built a small open-source verifier for graph-backed AI memory and GraphRAG-style systems. The basic problem: graph databases can validate schema, but they usually do not know whether two accepted facts contradict each other.
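To make the problem concrete, here is a minimal sketch of one kind of deterministic contradiction check. It is not Sigma Guard's actual method. It assumes facts are stored as (subject, predicate, object) triples and that some predicates are "functional", meaning they allow only one object per subject. The predicate names and the `find_contradictions` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical: predicates that may hold only one object per subject.
FUNCTIONAL_PREDICATES = {"born_in", "capital_of"}


def find_contradictions(triples, functional=FUNCTIONAL_PREDICATES):
    """Return (subject, predicate, objects) wherever a functional
    predicate has more than one distinct object for the same subject."""
    seen = defaultdict(set)
    for subj, pred, obj in triples:
        if pred in functional:
            seen[(subj, pred)].add(obj)
    # Sort so the result is deterministic regardless of input order.
    return sorted(
        (subj, pred, sorted(objs))
        for (subj, pred), objs in seen.items()
        if len(objs) > 1
    )


facts = [
    ("alice", "born_in", "Paris"),
    ("alice", "born_in", "Berlin"),  # conflicts with the fact above
    ("alice", "likes", "tea"),
    ("alice", "likes", "coffee"),    # fine: "likes" is not functional
]
print(find_contradictions(facts))
```

Each triple passes a schema check on its own. Only the pairwise comparison flags the two `born_in` facts as conflicting.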
Opus's thoughts on Marc Andreessen's system prompt (www.reddit.com)
https://claude.ai/share/12659fcf-c1c8-4bbb-bc45-b41b26cd8b69
RL Benchmark "Ant" in Hardware (github.com via hn)
Physical Ant: video of a learned behaviour running directly on hardware.
"ClaudeBleed" allows any Chrome extension to control Anthropic's AI assistant (cyberinsider.com via hn)
A critical flaw in Anthropic’s “Claude in Chrome” browser extension allows any Chrome extension, even one with zero permissions, to hijack Claude’s AI capabilities and perform sensitive actions on behalf of users. The issue, discovered by…
Opus 4.6 (model roundup, 90 items)
Opus 4.6, a version of Anthropic's AI model Claude, saw its accuracy drop on the BridgeBench hallucination test from 83% to 68%, and is being retired from Copilot Pro+. Notably, Claude Code demonstrated advanced capabilities by generating a detailed 12-week training plan in one call.
MiniMax 2.7 (model roundup, 3 items)
MiniMax-M2.7 is a large language model from MiniMaxAI capable of complex agent tasks and self-evolution, achieving a 66.6% medal rate on MLE Bench Lite, second only to Opus-4.6 and GPT-5.4. Community members have shared success in running the model with up to 100k context tokens and noted its potential for real-world applications like software engineering.
Show HN: ChonkLM – Tiny language models running offline in the browser (chonklm.com via hn)
I had been looking to try <500M parameter language models but you wouldn't find an API to try them anywhere, so I built this cloudflare hosted static website that hosts weights and built an inference runtime for these models that uses WebG…
Show HN: Vibe-coding video games with Claude (Day 26: Primetime) (gamevibe.us via hn)
I'm making a new video game every day as a hobby project, but I'm vibe coding it and writing nearly zero lines of code myself (even though I could, I'm a senior SWE). Today it's an original math game, Primetime where you click the non-prim…
- Show HN: Vibe-coding video games with Claude (Day 24: Fishies) (gamevibe.us via hn)
- Show HN: Vibe-coding video games with Claude (Day 21: Blackjack) (gamevibe.us via hn)
- Show HN: Vibe-coding video games with Claude (Day 15: Mahjong) (gamevibe.us via hn)
- Show HN: Vibe-coding video games with Claude (Day 14: Tetris) (gamevibe.us via hn)
- Vibe-coding video games with Claude (gamevibe.us via hn)
- Show HN: Vibe-coding video games with Claude (gamevibe.us via hn)
Show HN: CLI to budget Claude Code session costs (github.com via hn)
I code on Claude's pay as you go API and started budgeting my token usage with my own CLI wrapper. Basically I set a budget for a task that I'm working on in my project: Task: "Fix mobile responsiveness" Budget: $3.00 and the budget update…
general doubt on claude (www.reddit.com)
Claude runs out of its usage limit when I ask it to do a lengthier task in a single prompt. It also runs out if I split the same task across several prompts.
Sonnet 4.5 (model roundup, 6 items)
On May 4, 2026, multiple automated status updates reported elevated errors for Claude Opus 4.5 and Sonnet 4.5 around the same time, with Anthropic introducing a feature called E-STEER that applies emotion intervention to these models.
- 41m the Claude App just said that Sonnet 4.5 is going to become unavailable for chat May 16th… I thought it wasn't close to depreciation?
- 15h Sonnet 4.5 is being retired.
- 17h Adiós Sonnet 4.5
- 1d how i can improve inference speed
- 2d Claude's answer has nothing to do with my question and the whole conversation at all? First time this happened. Using Sonnet 4.5 thinking.
Gemma 4 (model roundup, 162 items)
Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.
Why do so many AI agent projects never reach production? (www.reddit.com)
I’m trying to understand a recurring problem in the AI agent space. A lot of people are interested in agents.
Mirage · Unified Virtual Filesystem for AI Agents – Strukto (www.strukto.ai via hn)
A Unified Virtual Filesystem Workspace. A simulated environment where AI agents reach all their data through one filesystem and bash. Install with npm install @struktoai/mirage-node (Node, servers, CLIs) or npm install @struktoai/mirage-browser #…
Claude Code Sandboxing (code.claude.com via hn)
Overview: Claude Code features native sandboxing to provide a more…
- Setting up Claude code (www.reddit.com)
- Telemetry for Claude Code (latitude.so via hn)
- How to start with Claude Code (www.reddit.com)
- Claude Code Manager (www.reddit.com)
- Where should I start with Claude Code? (www.reddit.com)
- What Happened to Claude Code (man-labs.com via hn)
- Claude Code Routines (code.claude.com via hn)
- Claude code (www.reddit.com)
- Errors in Claude Code (www.reddit.com)
- Claude Code App? (www.reddit.com)
- Multiplayer Claude Code (www.reddit.com)
- A Harness for Claude Code (euleptos.com via hn)
- How Claude Code and Codex approach sandboxing (instavm.io via hn)
- Does Claude Code Hate UI's? (www.reddit.com)
- Claude Code + Obsidian? (www.reddit.com)
- Routines in Claude Code (claude.com via hn)
- Qwen 3.6 for Claude Code in 1L (www.reddit.com)
- How I feel when I Claude Code.. (www.reddit.com)
- Claude Changes My Code (alexcbecker.net via hn)
- Claude Code Hackathon! (www.reddit.com)
- Aider and Claude Code (www.reddit.com)
- HOW TO USE CLAUDE CODE (www.reddit.com)
The Gemini Protocol in 2026 (kevinboone.me via hn)
Growing, but still not setting the Internet aflame. Note: this article is about Gemini, the HTTP-like Internet protocol for document browsing, and not the large language model or the cryptocurrency of the same na…
Anthropic Mythos (event, 181 items)
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
- 1h Not a good day for team "Claude Mythos is Just Marketing Hype"
- 20h METR evaluated an early version of Claude Mythos
- 1d Could Mozilla Security Hot Air Fill Mythos Sails?
- 1d Mythos set off a cybersecurity 'hysteria.' Experts say threat was already here
- 1d Mythos Fallout, U.S. Government Weighs AI Model Regulation
Qwen 3.5 (model roundup, 140 items)
Qwen3.5-9B is a post-trained model with 9 billion parameters that integrates multimodal learning and efficient hybrid architecture for enhanced performance. Community highlights include speculative decoding on Apple Silicon boosting Qwen3.5-9B's throughput by 4.1x, and the model outperforming others in coding tasks while addressing overthinking issues through tool usage.
- 1h The Quantization Method Apple Silicon Actually Rewards | by Alexandru Vasile | Mar, 2026
- 4h Does llama-swap actually work with mlx_lm.server / MLX models on macOS?
- 17h DeepSeek-TUI
- 18h Just got a 8x 32gb v100 server... now what
- 1d I wanted to know small local LLM code and made a personal projects.
I’m looking for something that can: - chat with me (Telegram/WhatsApp/etc) - manage tasks/workflows - maybe access tools like Gmail, Calendar, Docs, GitHub - possibly code/automate things But without the huge security risks of setups like…
I've been having issues lately with Claude completely ignoring certain instructions in CLAUDE.md. I did some digging and found something interesting with the claude cli harness and I'm curious if anyone else has come across this.
Usage limits technique (www.reddit.com)
Is Claude changing your daily habits with the dumb "5 hours from start" stuff? Today I found myself waking up, asking Haiku something basic (what day it was) so the clock starts.
Best way to use remaining tokens from ollama cloud (www.reddit.com)
Hey bros, I have around 80% of my tokens left for the week. If anyone needs them, or can suggest what I could do with them, that would be helpful.
DeepMind (event, 24 items)
Google DeepMind has released "Deep Research Max," advancing autonomous research agents, while also facing challenges and competition from other AI companies like Anthropic and Ineffable Intelligence. Meanwhile, DeepMind workers in the UK have voted to unionize, and former DeepMind architect Demis Hassabis is at the center of legal drama involving Elon Musk.
- 1h DeepMind Employee calls out private AI labs: go public, let regular people invest, or admit you're just enriching billionaires
- 3h EVE Online dev establishes "research partnership" with Google DeepMind
- 1d [Google DeepMind] the AI co-mathematician also achieves state of the art results on hard problemsolving benchmarks, including scoring 48% on FrontierMath Tier 4, a new high score among all AI systems evaluated.
- 2d Subquadratic claims to break LLM scaling limits! 1000x less costs
- 3d Google DeepMind takes a minority stake in the maker of EVE Online
If you use AI agents or know people who do, AgentVet might be worth checking out. It is a community-driven site where users rate and review AI agents. The idea is to help people cut through the noise and find the right tool for their actua…
Show HN: Mlx-code – I built a "backyard shed" AI coding agent for Mac (github.com via hn)
mlx-code: a lightweight coding agent for Mac, built on Apple's MLX framework. Fast local inference, built-in prompt caching, robust tool-calling.
LOOM - TTY based editor purpose built for cloud agent coding (www.reddit.com)
Hey all, just wanted to turn you on to an open source TTY-based IDE with a single-line installer called LOOM. I built this editor (well, me and Claude) to be purpose-built for coding alongside an agent in cloud VM-based environments.
Musk, Altman Management Styles Under Fire at OpenAI Trial (www.bloomberg.com via hn)
OpenAI Trial Highlights Criticism of Musk, Altman Management Styles
Wind patterns, Koppen classification, anthropology, map design, and effects of geography on people, it can do it all with proper research on wind and climate patterns.
Please help!! 5+ hrs with claude design completely gone (www.reddit.com)
Hi guys, I've been using claude design and I was finally able to get the beautiful design I wanted after 5 hours of going back and forth. Then I started a new chat within the same project and it completely overwrote the previous desig…
I have been building Dunetrace, an open-source real-time monitoring tool for your production agents. The latest update adds cross-agent pattern analysis.