  2. There's a clever way to use Fable (or Opus!) for debugging AI agent behavior. It only applies if you're building automated LLM pipelines and agentic workflows.

  3. See what your AI coding agent is doing with Datadog Lapdog. Datadog Lapdog is a free tool that gives you real-time visibility into what your AI coding agents are doing. Here's how to install it, pair it with Claude Code, and drill into a re…

  4. Anthropic's Advanced AI Framework. We are publishing our proposal for how governments should address catastrophic risks from the most powerful AI models, including granting them the legal authority to…

  5. Hi! I am a student from India, and this is the first paper I have published.

  6. Contrary to what many people suspect, Anthropic does intend to include Fable permanently in monthly plans. They said it will depend on available compute, but once they have enough, that is what they plan to do.

  7. For me, programming is the iterative process of discovering what a problem is actually asking for. Success occurs when the mechanisms of the solution, the state of the system, the observed outputs, and the intended outcome all align under…

  8. paywalled

  9. https://preview.redd.it/ns2tq8c0s40h1.png?width=2878&format=png&auto=webp&s=29f60dbf1519bd6df06a5163ac2ece188f2b575b Sometimes LLM hallucinations are really funny. One time I was working in Cursor on IDK what kind of task.

  10. event

    Copilot
    391 items

    Microsoft is keeping its Copilot tool for Windows 11 but renaming it, while issues with rate limits and a security proxy have sparked concerns among users of GitHub Copilot. Meanwhile, Anthropic released a report on agentic coding trends, highlighting that developers use AI in about 60% of their work.

    model roundup

    Opus 4.8
    102 items

    Claude AI has released Opus 4.8, an upgrade to its Opus class of models, available in version 2.1.154 of its software on March 16, 2023. The release includes enhanced coding and professional-task capabilities along with improved judgment and honesty. Users are reporting usage resets following the update.

  11. I’ve been noticing this disturbing trend for quite a while. By high karma I mean well over 1000 karma.

  12. We present SemantiClean, a modular framework for extracting structured semantic signals from e-commerce session data and driving pluggable inference targets including purchase intent, customer segmentation, and product affinity through a s…

  13. Scientific AI agents increasingly retrieve evidence, reason across sources, and synthesize conclusions used in consequential decisions. Yet, their ability to do so in high-stakes domains such as health remains unclear.

  14. In hierarchical reasoning, failures often originate at intermediate decision points where the agent commits to a wrong branch without recognizing that it lacks critical information. Rather than treating clarification as an external uncerta…

  15. Pre-mediation, the preparatory phase preceding direct human negotiation, plays a critical role in achieving mutually beneficial agreements, yet is often omitted due to cost, time, and limited access to trained mediators. We introduce an au…

  16. Existing multi-agent LLM orchestration methods, ranging from brute-force ensembles to learned routers, select models and topologies based on task and model features. However, these methods do not consider the runtime state of the serving i…

  17. Autoresearch agents now propose, evaluate, and select scientific candidates against a metric, and that metric is usually an aggregate reduced over a heterogeneous space of regions, slices, or cohorts. We show that when scientific validity…

  18. Financial and tabular question answering requires more than fluent reasoning: answers must be grounded in the exact facts, formulas, units, signs, and scales that support them. A single misread cell or incorrect operation can silently prod…

  19. Agent Skills augment large language model (LLM) agents with procedural knowledge at inference time, but current benchmarks rarely distinguish what a Skill says from how it is organized. We study this distinction through Progressive Disclos…

  20. Reinforcement learning typically improves multi-turn agent capabilities through the terminal outcome of the trajectories, which makes it difficult to determine credit assignment for each intermediate turn. Recent on-policy self-distillat…

  21. The rapid progress of reasoning and agentic large language models (LLMs) has increased the demand for long-context inference, but self-attention (SA) scales quadratically with context length. To address this, we study SWARR (Sliding-Window…

  22. Diagnosing pulmonary diseases requires integrating heterogeneous evidence amid phenotypic variability and cross-disease overlap. Although large language models (LLMs) have shown progress on pulmonary knowledge question answering (QA) and i…