ChatGPT serves ads. Here's the full attribution loop (www.buchodi.com via hn)
OpenAI's ad platform has two halves. On the ChatGPT side, the backend injects structured single_advertiser_ad_unit objects into the conversation SSE stream while the model is responding.
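The article does not publish the stream schema, so here is a minimal sketch of how a client could separate `single_advertiser_ad_unit` objects from ordinary model output arriving in the same SSE stream. It assumes LF-delimited events whose `data:` lines carry JSON objects with a `type` key. The `type` key, the `advertiser` field, and the `[DONE]` sentinel are hypothetical.

```python
import json
from typing import Iterator

# Object type named in the article; the surrounding schema (a "type" key on
# each JSON payload) is an assumption for illustration.
AD_UNIT_TYPE = "single_advertiser_ad_unit"


def parse_sse(stream: str) -> Iterator[dict]:
    """Yield the JSON payload of each Server-Sent Event in `stream`.

    Assumes LF line endings: events are separated by a blank line, and each
    "data:" line contributes one line of the event's data.
    """
    for block in stream.split("\n\n"):
        data_lines = []
        for line in block.splitlines():
            if line.startswith("data:"):
                value = line[len("data:"):]
                # The SSE format drops a single leading space after the colon
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if not data_lines:
            continue
        payload = "\n".join(data_lines)
        if payload == "[DONE]":  # hypothetical end-of-stream sentinel
            continue
        yield json.loads(payload)


def split_ads(stream: str) -> tuple[list[dict], list[dict]]:
    """Separate ad-unit objects from ordinary model-output events."""
    ads, other = [], []
    for event in parse_sse(stream):
        (ads if event.get("type") == AD_UNIT_TYPE else other).append(event)
    return ads, other


# Usage with a made-up stream: one text delta and one injected ad unit
sample = (
    'data: {"type": "delta", "text": "Hello"}\n\n'
    'data: {"type": "single_advertiser_ad_unit", "advertiser": "Example Co"}\n\n'
    "data: [DONE]\n\n"
)
ads, other = split_ads(sample)
```

Because the ad unit is a distinct structured object rather than model-generated text, a client can route it by type without inspecting the response text.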
great work, Gemma (www.reddit.com)
another day with pi + gemma 26B
Tencent used Anthropic's Claude to fine-tune its new Hy3 AI model (www.reuters.com via hn)
paywalled
We decreased our LLM costs with Opus (www.mendral.com via hn)
Last week we wrote about feeding terabytes of CI logs to an LLM. Most of the questions on Hacker News weren't about the logs.
Agent, Know Thyself (and bid accordingly) (www.strangeloopcanon.com via hn)
Why we need to train models to learn their own capabilities, and how this will help them bid for work.
Why don’t LLMs use explicit vector-based reasoning instead of language-based chain-of-thought? What would happen if they did?
Millennium Mixtape - Built with Claude (www.reddit.com)
Been using Claude for a while, mostly at work, but finally decided to upgrade my personal account and spend more time learning about Claude Code. Had it up and running in a few hours.
Many of us use agents to summarize tech blogs to stay updated. One day, I came across a previous Anthropic blog published on April 8th that had never been mentioned in my daily brief!
Proxies, Sandboxes and Agent Security (www.gouthamve.dev via hn)
After my last post, I wanted to see how far I could take things. I have a home lab running in my office, where I have a bunch of different machines, and I run a combination of k3s and Ansible-provision…
-
79 items
model roundup
Opus 4.6
Opus 4.6, a version of Anthropic's AI model Claude, saw its accuracy on the BridgeBench hallucination test drop from 83% to 68%, and is being retired from Copilot Pro+. Separately, Claude Code was shown generating a detailed 12-week training plan in a single call.
129 items
model roundup
Gemma 4
Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters, with both dense and MoE architectures. Community highlights include the 31B model's success in production tests, some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.
- 21m llama.cpp's Preliminary SM120 Native NVFP4 MMQ Is Merged
- 6h Gemma-4 MLX reasoning?
- 7h Gemma4-31B-3bit-mlx · Hugging Face: 3 & 5 mixed quant for RAM-poor Mac users.
- 11h Is long re-processing of output as input a common "feature" or not?
- 12h I ran Gemma 4 E2B with llama.cpp on a lot of different iPhones, here's the setup report
Something shifted for me a few months ago. I stopped treating Claude like a search engine and started treating it like a collaborator.
your agent is smart, but your team of agents isn't. (www.reddit.com)
cursor solves one agent really well. one human + several agents in one repo, great loop.
Amazon to offer OpenAI models on AWS after Microsoft exclusivity ends (www.aboutamazon.com via hn)
AWS and OpenAI are bringing the latest OpenAI models to Amazon Bedrock, launching Codex on Amazon Bedrock, and launching Amazon Bedrock Managed Agents, powered by OpenAI (all in limited preview), giving enterprises the frontier intelligenc…
vLLM-Compile: Bringing Compiler Optimizations to LLM Inference (docs.google.com via hn)
Luka Govedič, vLLM Committer, Senior Machine Learning Engineer, Red Hat
the AI OS has a missing layer (www.reddit.com)
been seeing a lot of "AI OS for companies". agent runtimes, MCP, the YC RFS, half the new yc batch.
- The Missing Layer In AI (www.reddit.com)
- What am I missing (www.reddit.com)
Mesa: A Versioned Filesystem for Agents (www.mesa.dev via hn)
Imagine you’re building a legal-tech agent that can help with real-estate transactions. The v1 was a simple chat-with-the-docs app implemented with a RAG pipeline and some LLM calls chained together using some framework like LangChain or A…
Compiler Testing, Part 1: Coverage-Guided Fuzzing with Grammars and LLMs
Compiler fuzzing for small languages is a specific problem: few optimization passes, tiny corpora, thin docs. This post covers how coverage-guided fuzzing and LLM-ass…
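The post's own code is not reproduced here. As a rough sketch of the grammar-based half of such a fuzzer, the following generates random programs from a toy expression grammar. The grammar, its symbols, and the `generate` helper are hypothetical. The coverage feedback loop, which keeps inputs that reach new compiler code, is omitted.

```python
import random

# Toy grammar for a hypothetical expression language: each nonterminal maps
# to a list of alternative productions, and each production is a list of symbols.
# The first alternative of every nonterminal is the non-recursive one.
GRAMMAR = {
    "<expr>": [["<term>"], ["<term>", " + ", "<expr>"], ["<term>", " * ", "<expr>"]],
    "<term>": [["<num>"], ["(", "<expr>", ")"]],
    "<num>": [["0"], ["1"], ["2"], ["42"]],
}


def generate(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 8) -> str:
    """Expand `symbol` by picking random productions until only terminals remain."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth budget, force the non-recursive production so expansion ends
        alternatives = alternatives[:1]
    production = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in production)


# Usage: a seeded generator makes a fuzzing run reproducible
rng = random.Random(0)
programs = [generate("<expr>", rng) for _ in range(5)]
```

Every generated string is syntactically valid by construction. That is the main reason to fuzz from a grammar rather than from random bytes, which a parser would mostly reject.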
OpenAI Really Wants Codex to Shut Up About Goblins (www.wired.com via reddit)
OpenAI has a goblin problem. Instructions designed to guide the behavior of the company’s latest model as it writes code have been revealed to include a line, repeated several times, that specifically forbids it from randomly mentioning an…
Your Claude Code project dashboard is now on the Mac App Store (apps.apple.com via reddit)
Follow up to my .story/ post last week. The Mac companion is now live on the Mac App Store, free.
-
222 items
model roundup
Opus 4.7
Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.
- 53m Opus 4.7 is just 4.6 with a stick up its butt. Give me my tokens back!
- 59m Claude Status Update : Elevated errors on Claude Opus 4.7 on 2026-04-29T00:00:29.000Z
- 2h Running Opus 4.7 for ops work: how do you keep per-task cost predictable?
- 4h Two new behaviors in Opus 4.7
- 8h Opus 4.7's New Tokenizer: What It Costs
THE "OBSERVER" INVARIANT AND CONTENT AUTOMATION (www.reddit.com)
Mnemostroma has reached version 1.11.0. We are moving away from the "chat history" model toward a professional-grade memory layer.
From spaghetti to main bus: refactoring an AI agent orchestrator with Elm (blog.mariohayashi.com via hn)
The Factory Must Grow (Part II): From Spaghetti AI Agent Orchestrator to a Main Bus tl;dr: In Part I, I built the factory: an orchestration system that runs AI agents like workers on a production line. Part II tears the original system dow…
To start, I have zero experience in coding. I know literally nothing but for the past 2 months I’ve been building a music recommendation app.
An open-source platform to auto-update agent skills and discover fresh sources (www.loooop.dev via hn)
GitHub obra/superpowers: An agentic skills framework & software development methodology that works. Loop autonomously monitors, evaluates, and updates your a…
claude-multiprofile: Run multiple Claude accounts side by side on macOS. Personal and work, multiple clients, separate test accounts.
Ask HN: Can we just call them "Harness Gloves"?..and an App Store model? (news.ycombinator.com)
Mill Valley compound for sale. The price? Your Anthropic shares (sfstandard.com via hn)
For most people, owning a home is among life’s greatest milestones — especially in the Bay Area. For Storm Duncan, though, it is leverage to get in on the AI arms race.
pdf building tips! (www.reddit.com)
so i’m a casual user on the pro plan and mainly use it for writing, content ideas, and similar stuff so most weeks i don’t even hit my weekly limit. i’ve recently been working on a 50 page pdf workbook that people can print or use on their…
turboquant: on-device search and recommendation (www.reddit.com)
https://h3manth.com/ai/cinematch/
TurboQuant is Google Research’s new breakthrough quantization algorithm that applies random rotation to high-dimensional vectors to eliminate outliers, enabling extreme low-bit compression with near-zero a…
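The snippet names the core idea but not TurboQuant's actual algorithm. The sketch below illustrates only the generic pattern it describes: a random rotation followed by low-bit scalar quantization. The rotation here is a randomized Hadamard transform. That choice, the min-max uniform quantizer, and every function name are assumptions, not Google Research's method.

```python
import math
import random


def fwht(x: list[float]) -> list[float]:
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1 / math.sqrt(n)  # makes the transform orthonormal (and self-inverse)
    return [v * scale for v in y]


def random_signs(n: int, seed: int) -> list[float]:
    """Random +/-1 diagonal D; together with H it forms the rotation R = H·D."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rotate(x: list[float], signs: list[float]) -> list[float]:
    """Apply R = H·D: flip signs, then the orthonormal Hadamard transform."""
    return fwht([s * v for s, v in zip(signs, x)])


def unrotate(y: list[float], signs: list[float]) -> list[float]:
    """Apply R^-1 = D·H (H is symmetric and orthogonal; D is its own inverse)."""
    return [s * v for s, v in zip(signs, fwht(y))]


def quantize(v: list[float], bits: int) -> tuple[list[int], float, float]:
    """Uniform min-max scalar quantization to 2**bits levels."""
    lo, hi = min(v), max(v)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / step) for x in v]
    return codes, lo, step


def dequantize(codes: list[int], lo: float, step: float) -> list[float]:
    return [lo + c * step for c in codes]


# Usage: a 16-dim vector dominated by one outlier coordinate
signs = random_signs(16, seed=0)
x = [10.0] + [0.0] * 15
y = rotate(x, signs)
codes, lo, step = quantize(y, bits=3)
x_hat = unrotate(dequantize(codes, lo, step), signs)
```

Because the rotation is orthogonal, it preserves the vector's norm while spreading the single large coordinate across all dimensions. In this example that shrinks the largest magnitude the quantizer has to cover from 10.0 to 2.5. It also means the reconstruction error in the original space equals the rounding error in the rotated space.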