Om Malik – What Microsoft's 10-Q Says About OpenAI (om.co via hn)
Om Malik is a San Francisco-based writer, photographer, and investor.
- OpenAI and Microsoft (openai.com)
Why is RAG evaluation so hard in the real world? (www.reddit.com)
Evaluating RAG feels easy in theory, but production is a different challenge. We’ve been looking into why RAG benchmarking is such a moving target.
AI agencies scam? (www.reddit.com)
The phrase "AI agents" is everywhere. Every company should use them.
Hi folks, I've been working on something for a good few months. Using GPT Researcher, I compiled a list of people's complaints from across this subreddit.
Ai Doomsday Toolbox v0.938 (www.reddit.com)
Hello! It’s me again, the developer of ADT.
Show HN: Cloudwatch Insights tool for CLI users and agents (skagedal.tech via hn)
On building small CLI tools for myself – and now for my agents too. Walks through a recent one for querying CloudWatch Insights, and how I use Claude to analyze the logs it pulls down.
Mosaic: Local MCP server for agent memory (github.com via hn)
Mosaic: Local MCP server for structured agent memory, with hex lattice, hybrid retrieval, governed writes, and budgeted context, backed by HexxlaDB. Why Mosaic: Mosaic keeps agent memory on infrastructure you operate: MCP on localhost, optional…
I just interviewed Michael Maximilien, former CTO at IBM and Chairperson of NodeJS Foundation, who spent a year shipping production RAG to multiple customers. His lesson was uncomfortable.
5 items
model roundup
GPT 5.3
OpenAI has reportedly announced GPT-5.3-Codex-Spark, though the exact release date remains uncertain; meanwhile, users of chat API models like GPT-5.3-chat have noticed discontinuation with newer versions from OpenAI.
148 items
event
Anthropic Mythos
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
- 32m After dissing Anthropic for limiting Mythos, OpenAI restricts access to Cyber
- 10h Rare W
- 11h Our evaluation of OpenAI's GPT-5.5 cyber capabilities
- 15h Anthropic: World is not ready for Mythos. Systems will break, cybersecurity will be compromised. It's too dangerous to release. OpenAI:
- 17h GPT-5.5 slightly outperformed Mythos on a multi-step cyber-attack simulation. One challenge that took a human expert 12 hrs took GPT-5.5 only 11 min at a $1.73 cost
Agentic Manifesto (apaydin.bearblog.dev via hn)
When Karl Marx analyzed capitalism, one of his central ideas was surplus value. Profit comes from extracting more value from labor than workers receive in wages.
- What is agentic AI (www.reddit.com)
Most agent systems have prompts, tools, and memory, but no operating model. I just open-sourced a small kit built around a different assumption: treat the agent like a micro AI company.
Blog: AI evals are becoming the new compute bottleneck (evalevalai.com via reddit)
Hi! I wanted to share my new blog on the costs of running AI Evals.
I've always had the urge to have my two MacBooks communicate. Having one idle while working on the other felt like an underutilization of resources.
Giving Codex access to my MacBook/macOS (www.reddit.com)
Good idea or not really?
GitHub database & Claude Pro (www.reddit.com)
Hello everyone! I'm newly addicted to AI, and I really like Claude.
Extracting Signal from the Noise: What We Learned Auto-Triaging Agent PRs (huggingface.co via hn)
Research Article Template: A modern, interactive template for scientific writing that brings papers to life. Interactive diagrams, math, citations, dark mode, PDF export, all with minimal setup.
Hey r/AI_Agents, sharing something I am actively building right now. **The problem:** Businesses receive thousands of complaints daily.
141 items
model roundup
Gemma 4
Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.
- 39m Model stuck in some thinking zone where it keeps saying a similar thing again and again
- 4h Using a Radeon 9060 XT 16 GB, the gemma4 24b a4b iq4 nl model achieves 25.9 t/s
- 7h nvidia/Gemma-4-26B-A4B-NVFP4
- 9h What is best code editor for local LLM deployment (LM Studio, llama.cpp) as of May 2026?
- 9h Qwen 3.6 27B vs Gemma 4 31B - making a Pac-Man game!
124 items
model roundup
Qwen 3.5
Qwen3.5-9B is a post-trained model with 9 billion parameters that integrates multimodal learning and efficient hybrid architecture for enhanced performance. Community highlights include speculative decoding on Apple Silicon boosting Qwen3.5-9B's throughput by 4.1x, and the model outperforming others in coding tasks while addressing overthinking issues through tool usage.
- 1h Running Qwen 35BA3B on a 16GB M3 Macbook Air at 8.9TPS!
- 1d Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models
- 1d Has anyone already made a "doomsday" or "off-grid" knowledge base? (Powered by an LLM, of course.)
- 2d Which large models support tool use in opencode etc?
- 2d Poolside Laguna XS.2
Our agent found a bug with WireGuard in Google Kubernetes Engine (lovable.dev via hn)
How Lovable's infrastructure team tracked down sporadic networking errors in Kubernetes, from crashing anetd pods to MTU mismatches, using AI-assisted debugging and deep packet inspection.
Product Feedback: A "Docs" Tab for Claude Desktop (www.reddit.com)
TL;DR Claude Desktop's Code tab is excellent for developers, but the same underlying capability — Claude as a stateful, file-aware agent over a git-backed workspace — would unlock a much larger market if reframed for knowledge workers. A n…
Finally Claude Code has started respecting CLAUDE.md (www.reddit.com)
For the past 15 days I have noticed that Claude Code follows the instructions in CLAUDE.md exactly for any action specified in the file. This is a huge improvement, and while some people would disagree, I would rather u…
- Grok 4.3 is out in the API (www.reddit.com)
- Grok hallucinations (www.reddit.com)
- Grok (www.reddit.com)
- Where is Grok-2 Mini and Grok-3 (mini)? (www.reddit.com)
- Grok 4.3 Beta (grok.com via hn)
I lead marketing at a B2B integrations SaaS. We've been running a multi-agent setup for our content function for a few months now, including research, writer, fact-checker, critic, publisher, the usual chain.
96.8% of MCP tool descriptions don't warn the agent about destructive behaviour (policylayer.com via hn)
The State of MCP Security: What 1,787 MCP servers can actually do to your systems. We classified every tool on every Model Context Protocol server we could enumerate from the public registries: 25,329 tools across 1,787 working servers.
RexIDE now has minimal "integration" with Codex App [video] (www.youtube.com via hn)
I have been building products in this space for 3-4 months now, but I do not see any traction for them. I am curious about the problems people are actually facing in this space that are not solved to a satisfactory level by a competitor in t…
Persistent AI memory is often reduced to a retrieval problem: store prior interactions as text, embed them, and ask the model to recover relevant context later. This design is useful for thematic recall, but it is mismatched to the kinds o…
Advanced Quantization Algorithm for LLMs (github.com via hn)
What is AutoRound? AutoRound is an advanced quantization toolkit designed for Large Language Models (LLMs) and Vision-Language Models (VLMs).