1. IBM just released Granite 4.1, a family of open-source language models built specifically for enterprise use. Three sizes, Apache 2.0 licensed, and trained on 15 trillion tokens, with a level of pipeline obsession that's worth understanding.

  2. For example: YouTube tutorials, newsletters, blogs, and top voices on social media. For context: I’m planning to spend the long weekend playing around with Claude, figuring out how to get the most out of the $20 subscription. Disclaimer: I’…

  3. A few weeks ago, I decided to stop overthinking and just start building. I ended up building 3 apps all using Anthropic’s Claude to speed up development, structure ideas, and iterate faster.

  4. I'm looking for a way to define a todo list for my agents, mostly coding agents, so that they follow the list and do the job. Have you heard of such an approach?

  5. Hi HN! We’re a team of ML validation specialists and we’ve been building /Spec27, a tool for testing whether AI agents still do their job safely and reliably as models, prompts, tools, and surrounding systems change.

  6. The Solution Aakash Is Looking For Already Exists. By the Endo Team | February 2026. Aakash Japi at Tachyon published a piece this week with the headline “Sandboxes Won’t Save You From OpenClaw.” He’s right. And his diagnosis of why deserves…

  7. 32 items

    Claude Opus 4.6, Anthropic's flagship model, saw its accuracy drop on the BridgeBench hallucination test from 83% to 68%, highlighting a significant regression in handling certain tasks. Meanwhile, biologists are revisiting cases of mushroom-induced hallucinations in China, suggesting ongoing research into natural causes of similar phenomena.

    model roundup

    Gemma 4
    135 items

    Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.

  8. Firefox for Web Developers: "Chrome looks set to ship an LL…" - Mastodon

  9. Mozaik. Mozaik is a TypeScript framework for building AI agents that share an agentic environment instead of being orchestrated through rigid pipelines. In Mozaik, humans, agents, observers, and tools are all Participants of the same Agenti…

  10. Curious how people see AI agents evolving beyond simple automation into real decision-making support. Will they mostly augment workflows or start replacing parts of knowledge work entirely?

  11. I’m from Tetr College, so pretty much everyone around me is building something. And somehow… they also keep stealing my API keys 😭 Jokes aside, I was looking at my invoices today and realized most of my spend is basically: 1/ Claude enterp…

  12. I've been going down a rabbit hole thinking about what actually happens after you ship an LLM-powered app, and I'd love to hear how others here handle it… A few things I keep getting stuck on: Continuous optimization. Once your app is in…

  13. Hi, I’m having an issue with dates in a spreadsheet where they sometimes get mixed up between day and month. Claude excel thinks the problem is that when dates are added by the add-in, the system stores them in a way that doesn’t always ke…

  14. model roundup

    Qwen 3.5
    122 items

    Qwen3.5-9B is a post-trained model with 9 billion parameters that integrates multimodal learning and efficient hybrid architecture for enhanced performance. Community highlights include speculative decoding on Apple Silicon boosting Qwen3.5-9B's throughput by 4.1x, and the model outperforming others in coding tasks while addressing overthinking issues through tool usage.

    140 items

    Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.

  15. How much “Brain Damage” can an LLM Tolerate? Resistive Memory or Resistive RAM (RRAM), a type of random access memory based on memristors, is an area of research that is experiencing ever increasing interest because of its unique combinati…

  16. Using LLMs to find Python C-extension bugs

  17. I have (free) access to a SLURM cluster with 8x NVIDIA A100 80GB GPUs (=640 GB VRAM) on a single task, and I want to run an open-weight model locally with llama.cpp for data generation, not coding. My use case is generating teacher data fo…

  18. We often focus on tokens/second and dollars/token, but rarely on the environmental cost per token. This work presents a step-by-step analysis of the energy and water footprint of LLM inference.

  19. GM announced earlier this week that it will upgrade 4 million vehicles with Gemini, Google's family of generative AI models. The rollout will occur over several months and include GM's four brands -- Chevrolet, GMC, Buick and Cadillac -- w…

  20. Some context up front, I've been using Claude to journal over the past few months and work through my thoughts. It's been really helpful and has led to meaningful insight about myself, my business, life in general.

  21. event

    Security
    104 items

    OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.

    event

    Cowork
    129 items

    Issues with Claude Cowork have been reported, including errors and disruptions for some users on April 16, 2026. Additionally, Google has developed its own desktop Agent to compete with Cowork, while users continue to explore alternatives and troubleshoot bugs in the platform.

  22. Agentic Software Development Lifecycle. For 50 years, software development has been a Craft: dependent on individual artisans, manual tooling, and implicit knowledge. We believe the next era of software engineering is Industrial.

  23. Hey HN -- I built Brifly because I was tired of reexplaining the same architecture to Claude Code/AI Agents every time. You can have a memory layer where you store everything about your company or project.

  24. Most AI agent examples I see are still centered around completing a task: call an API, write a report, summarize a doc, schedule something, update a database. That makes sense, but I keep wondering if we’re missing another kind of agent be…

  25. We’ve been working on a retrieval system for teams building AI agents in finance (mainly around workflows that need to do in-depth web research).

  26. I found this article today: “The MCP Era Feels Like Déjà Vu”. The authors basically argue that Anthropic will soon discover that MCPs are basically programming libraries repackaged. They explain what a tool is through huggingface's smolag…

  27. I have been using ChatGPT for a lot of deep research. It does a tremendous job of actually going deep into a topic instead of giving a tl;dr version, but the sources are often dated.