1. OpenAI's ad platform has two halves. On the ChatGPT side, the backend injects structured single_advertiser_ad_unit objects into the conversation SSE stream while the model is responding.

  2. another day with pi + gemma 26B

  3. paywalled

  4. Last week we wrote about feeding terabytes of CI logs to an LLM. Most of the questions on Hacker News weren't about the logs.

  5. Agent, Know Thyself! (and bid accordingly): why we need to train models to learn their own capabilities, and how this will help them bid for work!

  6. Why don’t LLMs use explicit vector-based reasoning instead of language-based chain-of-thought? What would happen if they did?

  7. Been using Claude for a while, mostly at work, but finally decided to upgrade my personal account and spend more time learning about Claude Code. Had it up and running in a few hours.

  8. Many of us use agents to summarize tech blogs to stay updated. One day, I came across an earlier Anthropic blog post, published on April 8th, that had never been mentioned in my daily brief!

  9. Proxies, Sandboxes and Agent Security. After my last post, I wanted to see how far I could take things. I have a home lab running in my office, where I have a bunch of different machines, and I run a combination of k3s and Ansible-provision…

  10. model roundup

    Opus 4.6
    79 items

    Opus 4.6, a version of Anthropic's AI model Claude, saw its accuracy drop on the BridgeBench hallucination test from 83% to 68%, and is being retired from Copilot Pro+. Notably, Claude Code demonstrated advanced capabilities by generating a detailed 12-week training plan in one call.

    model roundup

    Gemma 4
    129 items

    Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.

  11. Something shifted for me a few months ago. I stopped treating Claude like a search engine and started treating it like a collaborator.

  12. cursor solves one agent really well. one human + several agents in one repo, great loop.

  13. AWS and OpenAI are bringing the latest OpenAI models to Amazon Bedrock, launching Codex on Amazon Bedrock, and launching Amazon Bedrock Managed Agents, powered by OpenAI (all in limited preview), giving enterprises the frontier intelligenc…

  14. vLLM-compile: Bringing Compiler Optimizations to LLM Inference. Luka Govedič, vLLM Committer, Senior Machine Learning Engineer, Red Hat

  15. been seeing a lot of "AI OS for companies". agent runtimes, MCP, the YC RFS, half the new yc batch.

  16. Imagine you’re building a legal-tech agent that can help with real-estate transactions. The v1 was a simple chat-with-the-docs app implemented with a RAG pipeline and some LLM calls chained together using some framework like LangChain or A…

  17. Compiler Testing — Part 1: Coverage-Guided Fuzzing with Grammars and LLMs. Compiler fuzzing for small languages is a specific problem: few optimization passes, tiny corpora, thin docs. This post covers how coverage-guided fuzzing and LLM-ass…

  18. OpenAI has a goblin problem. Instructions designed to guide the behavior of the company’s latest model as it writes code have been revealed to include a line, repeated several times, that specifically forbids it from randomly mentioning an…

  19. Follow up to my .story/ post last week. The Mac companion is now live on the Mac App Store, free.

  20. model roundup

    Opus 4.7
    222 items

    Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.

  21. Mnemostroma has reached version 1.11.0. We are moving away from the "chat history" model toward a professional-grade memory layer.

  22. The Factory Must Grow (Part II): From Spaghetti AI Agent Orchestrator to a Main Bus. tl;dr: In Part I, I built the factory: an orchestration system that runs AI agents like workers on a production line. Part II tears the original system dow…

  23. To start, I have zero experience in coding. I know literally nothing but for the past 2 months I’ve been building a music recommendation app.

  24. obra/superpowers: An agentic skills framework & software development methodology that works. Loop autonomously monitors, evaluates, and updates your a…

  25. claude-multiprofile: Run multiple Claude accounts side by side on macOS. Personal and work, multiple clients, separate test accounts.

  26. For most people, owning a home is among life’s greatest milestones — especially in the Bay Area. For Storm Duncan, though, it is leverage to get in on the AI arms race.

  27. so i’m a casual user on the pro plan and mainly use it for writing, content ideas, and similar stuff so most weeks i don’t even hit my weekly limit. i’ve recently been working on a 50 page pdf workbook that people can print or use on their…

  28. https://h3manth.com/ai/cinematch/ TurboQuant is Google Research’s new breakthrough quantization algorithm that applies random rotation to high-dimensional vectors to eliminate outliers, enabling extreme low-bit compression with near-zero a…
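
The last item describes TurboQuant in a single sentence: rotate a high-dimensional vector at random so that outliers are spread out, then quantize to very few bits. The sketch below illustrates only that general rotate-then-quantize idea. It is not the TurboQuant algorithm. The function names, the plain 4-bit uniform quantizer, and the toy 64-dimensional vector with one large outlier are all assumptions made for illustration.

```python
import math
import random


def random_rotation(dim, rng):
    """Build a random orthogonal matrix (orthonormal rows) with Gram-Schmidt."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # subtract the component along each row already chosen
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip a (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def quantize_roundtrip(x, bits):
    """Snap x to 2**bits uniform levels over [-max|x|, +max|x|], then map back."""
    scale = max(abs(a) for a in x)
    if scale == 0.0:
        return list(x)
    step = 2.0 * scale / (2 ** bits - 1)
    return [round((a + scale) / step) * step - scale for a in x]


def l2_error(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def compare(dim=64, bits=4, seed=0):
    """Return (error without rotation, error with rotation) on a toy vector."""
    rng = random.Random(seed)
    # Hypothetical input: small values plus a single large outlier.
    x = [rng.uniform(-3.0, 3.0) for _ in range(dim - 1)] + [50.0]
    plain = quantize_roundtrip(x, bits)
    q = random_rotation(dim, rng)
    rotated = quantize_roundtrip(matvec(q, x), bits)
    restored = matvec(transpose(q), rotated)  # undo the rotation
    return l2_error(x, plain), l2_error(x, restored)


if __name__ == "__main__":
    err_plain, err_rotated = compare()
    print(f"L2 error without rotation: {err_plain:.3f}")
    print(f"L2 error with rotation:    {err_rotated:.3f}")
```

In this toy setup the single outlier forces a wide quantization range, so the small entries land on coarse levels. After the rotation the outlier's energy is shared across all coordinates, the range shrinks, and the same number of bits gives a finer step. Because the rotation is orthogonal, undoing it does not change the size of the quantization error.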