Terms of Service Ban AI Agents from Using Stack Overflow for Agents (meta.stackoverflow.com via hn)
Anthropic Walks Back Policy That Could Have 'Sabotaged' Researchers Using Claude (www.wired.com via hn)
My favorite use-case for Fable (www.reddit.com via reddit)
There's a clever way to use Fable (or Opus!) for debugging AI agent behavior. It only applies if you're building automated LLM pipelines and agentic workflows.
See what your AI coding agent is doing with Datadog Lapdog (chrisebert.net via hn)
Datadog Lapdog is a free tool that gives you real-time visibility into what your AI coding agents are doing. Here's how to install it, pair it with Claude Code, and drill into a re…
China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (decrypt.co via hn)
Anthropic being a good citizen or pushing ideology? (www.anthropic.com via hn)
Anthropic's Advanced AI Framework: We are publishing our proposal for how governments should address catastrophic risks from the most powerful AI models, including granting them the legal authority to…
Tiny Scale Is All I Can Spare To Play With Transformer (doi.org via reddit)
Hi! I am a student from India, and this is the first paper I have published.
Contrary to what many people suspect, Anthropic does intend to include Fable permanently in monthly plans. They said it will depend on available compute, but once they have enough compute, that is what they plan to do.
Sestriere: Native MeshCore LoRa Mesh Client for Haiku OS (github.com via hn)
Agent Harness Benchmarking (www.reddit.com via reddit)
For me, programming is the iterative process of discovering what a problem is actually asking for. Success occurs when the mechanisms of the solution, the state of the system, the observed outputs, and the intended outcome all align under…
- Code as Agent Harness (arxiv.org via hn)
- Code as Agent Harness (code-as-harness.github.io via hn)
- Agent Harness Engineering (twitter.com via hn)
- Agent Harness Engineering (www.oreilly.com via hn)
- Agent Harness Engineering (addyosmani.com via hn)
- What should the benchmark for a harness agent be? (www.reddit.com)
- The Anatomy of an Agent Harness (www.langchain.com via hn)
- Shared Agent Harness (github.com via hn)
OpenAI considers drastic price cuts, anticipating war for users with Anthropic (www.reuters.com via hn)
paywalled
- OpenAI Considers Drastic Price Cuts, Anticipating War for Users with Anthropic (www.wsj.com via hn)
Cursor granted me a calculator while hallucinated (www.reddit.com via reddit)
https://preview.redd.it/ns2tq8c0s40h1.png?width=2878&format=png&auto=webp&s=29f60dbf1519bd6df06a5163ac2ece188f2b575b Sometimes LLM hallucinations are really funny. One time I worked in Cursor on IDK what kind of task.
If LLMs are all persona, whose persona are they? (persona.earthpilot.ai via hn)
Copilot (event, 391 items)
Microsoft is keeping its Copilot tool for Windows 11 but renaming it, while issues with rate limits and a security proxy have sparked concerns among users of GitHub Copilot. Meanwhile, Anthropic released a report on agentic coding trends, highlighting that developers use AI in about 60% of their work.
Opus 4.8 (model roundup, 102 items)
Claude AI has released Opus 4.8, an upgrade to their Opus class of models available in version 2.1.154 of their software on March 16, 2023, which includes enhanced coding and professional task capabilities along with improved judgment and honesty. Users are reporting usage resets following the update.
- 1h Tell HN: Anthropic's Fable model is too expensive
- 5h Thanks Fable-5, I'm Flattered
- 8h It blocked us at 'hello!' Anthropic Fable 5 refusing innocuous prompts
- 8h Critique my prompt
- 8h Fable 5 and the 8 July privacy update landed the same week. Is the model launch pulling attention off the data changes, or am I overthinking it?
Application of Claude in government? (www.reddit.com via reddit)
Why are there an increasing number of outright unhinged high karma users on HN? (news.ycombinator.com)
I’ve been noticing this disturbing trend for quite a while. By high karma I mean well over 1000 karma.
We present SemantiClean, a modular framework for extracting structured semantic signals from e-commerce session data and driving pluggable inference targets including purchase intent, customer segmentation, and product affinity through a s…
Can AI Agents Synthesize Scientific Conclusions? (arxiv.org)
Scientific AI agents increasingly retrieve evidence, reason across sources, and synthesize conclusions used in consequential decisions. Yet, their ability to do so in high-stakes domains such as health remains unclear.
In hierarchical reasoning, failures often originate at intermediate decision points where the agent commits to a wrong branch without recognizing that it lacks critical information. Rather than treating clarification as an external uncerta…
Pre-mediation, the preparatory phase preceding direct human negotiation, plays a critical role in achieving mutually beneficial agreements, yet is often omitted due to cost, time, and limited access to trained mediators. We introduce an au…
Existing multi-agent LLM orchestration methods, ranging from brute-force ensembles to learned routers, select models and topologies based on task and model features. However, these methods do not consider the runtime state of the serving i…
Autoresearch agents now propose, evaluate, and select scientific candidates against a metric, and that metric is usually an aggregate reduced over a heterogeneous space of regions, slices, or cohorts. We show that when scientific validity…
Financial and tabular question answering requires more than fluent reasoning: answers must be grounded in the exact facts, formulas, units, signs, and scales that support them. A single misread cell or incorrect operation can silently prod…
Agent Skills augment large language model (LLM) agents with procedural knowledge at inference time, but current benchmarks rarely distinguish what a Skill says from how it is organized. We study this distinction through Progressive Disclos…
Reinforcement learning typically improves multi-turn agent capabilities through the terminal outcome of the trajectories, which makes it difficult to determine credit assignment for each intermediate turn. Recent on-policy self-distillat…
The rapid progress of reasoning and agentic large language models (LLMs) has increased the demand for long-context inference, but self-attention (SA) scales quadratically with context length. To address this, we study SWARR (Sliding-Window…
Diagnosing pulmonary diseases requires integrating heterogeneous evidence amid phenotypic variability and cross-disease overlap. Although large language models (LLMs) have shown progress on pulmonary knowledge question answering (QA) and i…