Granite 4.1: IBM's 8B Model Matching 32B MoE (firethering.com via hn)
IBM just released Granite 4.1, a family of open source language models built specifically for enterprise use. Three sizes, Apache 2.0 licensed and trained on 15 trillion tokens with a level of pipeline obsession that's worth understanding.
What’s the best free resource to learn about Claude (from scratch)? (www.reddit.com)
For example: YouTube Tutorials, Newsletters, Blogs, Top Voices on social media. For context: I’m planning to spend the long weekend playing around with Claude figuring out how to get the most out of the $20 subscription. Disclaimer: I’…
A few weeks ago, I decided to stop overthinking and just start building. I ended up building 3 apps all using Anthropic’s Claude to speed up development, structure ideas, and iterate faster.
Any Todo list for agents? (www.reddit.com)
I'm looking for a way to define a todo list for my agents, mostly coding agents, so they will follow the list and do the job. Have you heard of such an approach?
Show HN: Spec27 – Spec-driven validation for AI agents (www.spec27.ai via hn)
Hi HN! We’re a team of ML validation specialists and we’ve been building /Spec27, a tool for testing whether AI agents still do their job safely and reliably as models, prompts, tools, and surrounding systems change.
Response: Sandboxes Won't Save You from OpenClaw (endojs.org via hn)
The Solution Aakash Is Looking For Already Exists. By the Endo Team | February 2026. Aakash Japi at Tachyon published a piece this week with the headline “Sandboxes Won’t Save You From OpenClaw.” He’s right. And his diagnosis of why deserves…
Hallucination (event, 32 items)
Claude Opus 4.6, Anthropic's flagship model, saw its accuracy on the BridgeBench hallucination test drop from 83% to 68%, a 15-point regression. Meanwhile, biologists are revisiting cases of mushroom-induced hallucinations in China, suggesting ongoing research into natural causes of similar phenomena.
Gemma 4 (model roundup, 135 items)
Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.
- 21m Notes on what actually breaks when you run a coding agent on small local models
- 3h Comparing SVG Generation for the top open models
- 5h Based on what should I choose Gemma 4 models/quantizations?
- 9h Larger Gemma-4/Qwen3.6
- 12h If you could do anything with the local models in your corporate workflows, what would it be?
Chrome looks set to ship an LLM Prompt API to the web. We oppose this API (mastodon.social via hn)
Firefox for Web Developers: "Chrome looks set to ship an LL…" - Mastodon
TypeScript framework for building non-blocking AI agents (github.com via hn)
Mozaik is a TypeScript framework for building AI agents that share an agentic environment instead of being orchestrated through rigid pipelines. In Mozaik, humans, agents, observers, and tools are all Participants of the same Agenti…
How might AI agents transform knowledge work in the next decade? (www.reddit.com)
Curious how people see AI agents evolving beyond simple automation into real decision-making support. Will they mostly augment workflows or start replacing parts of knowledge work entirely?
what is your biggest startup expense? (www.reddit.com)
I’m from Tetr College, so pretty much everyone around me is building something. And somehow… they also keep stealing my API keys 😭 Jokes aside, I was looking at my invoices today and realized most of my spend is basically: 1/ Claude enterp…
I've been going down a rabbit hole tinkering about what actually happens after you ship an LLM-powered app, and I'd love to hear how others here handle it… A few things I keep getting stuck on: Continuous optimization. Once your app is in…
Claude Excel keeps messing up my dates (www.reddit.com)
Hi, I’m having an issue with dates in a spreadsheet where they sometimes get mixed up between day and month. Claude Excel thinks the problem is that when dates are added by the add-in, the system stores them in a way that doesn’t always ke…
Qwen 3.5 (model roundup, 122 items)
Qwen3.5-9B is a post-trained model with 9 billion parameters that integrates multimodal learning and efficient hybrid architecture for enhanced performance. Community highlights include speculative decoding on Apple Silicon boosting Qwen3.5-9B's throughput by 4.1x, and the model outperforming others in coding tasks while addressing overthinking issues through tool usage.
- 31m Reasoning Guard: Stopping LLM Thinking Loops at the Proxy Layer
- 1h Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models
- 2h Did anyone of you already make the "doomsday" or "offgrid" knowledge based? (ofc powered with LLM)
- 7h Qwen3.6-27B 4.256bpw in full VRAM on a 5070 Ti with 50000 q4_0 context - not turbo!
- 21h Sorry if it's not the best place to ask this, of the models in the image, which is the best for (problem solving)/Coding and the best one for studying (ask LLM concepts) ? My PC build is RX 9060 XT 16GB + I3 12100F + 16 GB DDR4 + llama.cpp with Vulkan backend + Linux Mint.
Anthropic Mythos (event, 140 items)
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
- 1h White House Opposes Anthropic's Plan to Expand Access to Mythos Model
- 13h what is claude mythos doing in my azure model catalog 😭
- 15h Trump officials draft plan to bring Anthropic back amid Pentagon fight
- 21h Claude Mythos Has Found 271 Zero-Days in Firefox
- 1d What Anthropic's Mythos means for the future of cybersecurity
How much "Brain Damage" can an LLM Tolerate? (2024) (hawaii.ziti.uni-heidelberg.de via hn)
Resistive Memory or Resistive RAM (RRAM), a type of random access memory based on memristors, is an area of research that is experiencing ever increasing interest because of its unique combinati…
Using LLMs to find Python C-extension bugs (lwn.net via hn)
I have (free) access to a SLURM cluster with 8x NVIDIA A100 80GB GPUs (=640 GB VRAM) on a single task, and I want to run an open-weight model locally with llama.cpp for data generation, not coding. My use case is generating teacher data fo…
Show HN: Token Thermodynamics (mybinder.org via hn)
We often focus on tokens/second and dollars/token, but rarely on the environmental cost per token. This work presents a step-by-step analysis of the energy and water footprint of LLM inference.
GM Adds Google Gemini for Drivers to Rev Up with AI Assistant (www.cnet.com via hn)
GM announced earlier this week that it will upgrade 4 million vehicles with Gemini, Google's family of generative AI models. The rollout will occur over several months and include GM's four brands -- Chevrolet, GMC, Buick and Cadillac -- w…
Claude changing subject rather than going deeper (www.reddit.com)
Some context up front, I've been using Claude to journal over the past few months and work through my thoughts. It's been really helpful and has led to meaningful insight about myself, my business, life in general.
Security (event, 104 items)
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 1h Estimating Black-Box LLM Parameter Counts via Factual Capacity
- 2h I audited LangChain’s core library and found 10+ Prompt Injection vulnerabilities. Here is the technical breakdown.
- 10h InfoSec To Integrate Claude Enterprise for Org
- 15h Probes trace an emergent jailbreak in OLMo 2 to mislabeled training data
- 16h Try to break my prompt injection detector — I’ll respond to every bypass attempt
Cowork (event, 129 items)
Issues with Claude Cowork have been reported, including errors and disruptions for some users on April 16, 2026. Additionally, Google has developed its own desktop agent to compete with Cowork, while users continue to explore alternatives and troubleshoot bugs in the platform.
The Agentic Software Development Life Cycle Framework (asdlc.io via hn)
Agentic Software Development Lifecycle. For 50 years, software development has been a Craft: dependent on individual artisans, manual tooling, and implicit knowledge. We believe the next era of software engineering is Industrial.
Show HN: Brifly – stop re-explaining your codebase to Claude Code every week (www.getbrifly.com via hn)
Hey HN -- I built Brifly because I was tired of re-explaining the same architecture to Claude Code/AI Agents every time. You can have a memory layer where you store everything about your company or project.
Most AI agent examples I see are still centered around completing a task: call an API, write a report, summarize a doc, schedule something, update a database. That makes sense, but I keep wondering if we’re missing another kind of agent be…
We’ve been working on a retrieval system for teams building AI agents in finance (mainly around workflows that need to do in-depth web research).
Anthropic is discovering that MCP is basically libraries repackaged (www.reddit.com)
I found this article today: "The MCP Era Feels Like Déjà Vu". The authors basically argue that Anthropic will soon discover that MCPs are basically programming libraries repackaged. They explain what a tool is through huggingface's smolag…
I have been using ChatGPT for a lot of deep research. It does tremendous work of actually going deep into a topic instead of giving a tl;dr version, but often the sources are dated.