Hallucinated

Skill-MAS: Evolving Meta-Skill for Automatic Multi-Agent Systems (arxiv.org)

19h

Large Language Model (LLM)-based automatic Multi-Agent Systems (MAS) generation has become a crucial frontier for tackling complex tasks. However, existing methods face a dilemma between model capability and experience retention.
Structured Inference with Large Language Gibbs (arxiv.org)

19h

The knowledge encoded in large language models (LLMs) can serve as a substrate for structured reasoning over variables describing a complex world, but accessing this knowledge in a probabilistically coherent manner poses a difficult infere…
Structured Representation Learning with Locally Linear Embeddings and Adaptive Feature Fusion (arxiv.org)

19h

Neuroscientific research has revealed that the brain encodes complex behaviors by leveraging structured, low-dimensional manifolds and dynamically fusing multiple sources of information through adaptive gating mechanisms. Inspired by these…
TIGER: Inverting Transformer Gradients via Embedding-Subspace Distance Optimization (arxiv.org)

19h

Federated learning allows multiple clients to jointly train a shared model by sending gradient updates to a central server while keeping raw inputs local. However, prior gradient inversion attacks show that these updates can reveal enough…
TRIDENT: Breaking the Hybrid-Safety-Physics Coupling for Provably Safe Multi-Agent Reinforcement Learning (arxiv.org)

19h

Safe coordination in networked cyber-physical systems forces learning algorithms to simultaneously handle hybrid discrete-continuous actions, hard training-time safety constraints, and physics-governed dynamics. We show that these three fe…
The Illusion of Improvement: Reject Inference Strategies in Credit Scoring (arxiv.org)

19h

Reject inference methods are widely used to mitigate survival bias in credit scoring, yet their effectiveness remains poorly understood. We systematically evaluate several such methods and uncover a structural failure mode: in a natural re…
The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs (arxiv.org)

19h

When an AI assistant remembers that Sarah is a single mother working two jobs, does it interpret her stress differently than if she were a wealthy executive? As personalized AI systems increasingly incorporate long-term user memory, unders…
The Wrong Kind of Right: Quantifying and Localizing Misfired Alignment in LLMs (arxiv.org)

19h

Warning: This paper studies stereotypes and biases, and contains potentially disturbing examples, used for illustration purposes only. Our findings should not be interpreted as an argument against alignment.
ThousandWorlds: A benchmark for climate emulation of potentially habitable exoplanets (arxiv.org)

19h

The search for life beyond Earth will depend on detecting faint signatures in the atmospheres of potentially habitable exoplanets. Interpreting those signatures requires understanding the host planet's climate: the same molecule may signal…
TopBench: A Benchmark for Implicit Predictive Reasoning in Tabular Question Answering (arxiv.org)

19h

Large Language Models (LLMs) have advanced Table Question Answering, where most queries can be answered by extracting information or simple aggregation. However, a common class of real-world queries is implicitly predictive, requiring the…
Towards Scalable Customization and Deployment of Multi-Agent Systems for Enterprise Applications (arxiv.org)

19h

Large language model (LLM)-based multi-agent systems demonstrate strong performance on complex reasoning and task execution, enabling broad enterprise applications. However, production deployment remains challenging due to domain-specific…
Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA (arxiv.org)

19h

The development of large language models (LLMs) has led to an increased focus on their adaptation to specialized domains and languages, yet the effectiveness of domain adaptation strategies remains unclear. We present a study of medical do…
Transformer Geometry Observatory TGO-I: Spectral Geometry Observatory (arxiv.org)

19h

Despite the widespread adoption of Vision Transformers (ViTs) and their success across numerous computer vision applications, the fundamental understanding of their dimensional and representational geometry remains relatively underexplored…
TransitNet: A Compact Attention-Augmented Deep Learning Framework for Low-SNR Transit Blind Searches (arxiv.org)

19h

Motivated by the observational incompleteness of intermediate-to-long-period Earth-size planets, we present TransitNet, a compact attention-augmented deep-learning framework for low-SNR transit blind searches. To enable realistic method de…
TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology (arxiv.org)

19h

Artificial intelligence (AI) agents promise to accelerate drug discovery by compressing interpretation and decision-making loops, but practical deployment requires trusted evaluation on realistic program decisions. We introduce Therapeutic…
VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset (arxiv.org)

19h

Human head detection, keypoint estimation, and 3D head model fitting are essential tasks with many applications. However, traditional real-world datasets often suffer from bias, privacy, and ethical concerns, and they have been recorded in…
VISUALSKILL: Multimodal Skills for Computer-Use Agents (arxiv.org)

19h

Computer-use agents (CUAs) approach human-level performance on standardised benchmarks but still struggle on long-horizon tasks and unseen software. Existing skill libraries address this with reusable skills, but represent the skill artifa…
WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks (arxiv.org)

19h

Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g.,…
Zero-Shot Active Feature Acquisition via LLM-Elicitation (arxiv.org)

19h

Active feature acquisition (AFA) sequentially selects which features to observe to reach a classification or ranking decision. Its central limitation is reliance on large amounts of labeled data to fit probabilistic models guiding acquisiti…
scGTN: Deep Siamese Graph Transformer Network for Single-cell RNA Sequencing Clustering (arxiv.org)

19h

Single-cell RNA sequencing (scRNA-seq) plays a pivotal role in characterizing gene expression at the cellular level, enabling the identification of cell types and advancing the understanding of cellular heterogeneity. Despite the signific…
Agentic AI-based Framework for Mitigating Premature Diagnostic Handoff and Silent Hallucination in Healthcare Applications (arxiv.org)

1d hallucination agentic

Recent advances in Large Language Models (LLMs) and multi-agent systems have driven the rise of Agentic AI, showing promise for medical reasoning. However, open-ended conversational agents remain prone to two critical failure modes: premat…
ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents (arxiv.org)

1d model-context-protocol mcp

Tool-using LLM agents increasingly use the Model Context Protocol (MCP) to answer from heterogeneous evidence sources, including search, APIs, databases, clinical records, and formulary tools. Standard factuality metrics usually test wheth…
SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents (arxiv.org)

1d prompt-injection security
A Framework for Evaluating Agentic Skills at Scale (arxiv.org)

1d agentic
A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction (arxiv.org)

1d fine-tuning

The high cost of fine-tuning LLMs poses a significant economic barrier; pre-hoc performance prediction offers a critical solution to substantially reduce this expense. However, the theoretical limits of pre-hoc performance prediction remai…
A T-API-Compliant ReAct Agentic Loop for Optical Networks: Generic vs. Domain-Specific Tool Abstractions (arxiv.org)

1d agentic
AIMER: Calibration-Free Task-Agnostic MoE Expert Pruning (arxiv.org)

1d moe
Agentic Discovery of Non-Canonical Antimicrobial Peptides with AMPGAN v3 (arxiv.org)

1d agentic

Antimicrobial resistance causes over a million deaths annually. Antimicrobial peptides (AMPs) are a promising solution, but generative AMP models are not yet ready to design peptides with non-natural amino acids and/or chemical modifica…
Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search (arxiv.org)

1d agentic

Test-time scaling for agentic search typically increases depth (i.e., more turns and tokens per trajectory) or breadth (i.e., more parallel rollouts). Here we focus on breadth scaling, showing that standard parallel sampling yields diminis…
CMIP-Forge: An Agentic System that Retrieves, Computes, and Self-Reviews Climate Science (arxiv.org)

1d agentic

The Coupled Model Intercomparison Project Phase 6 (CMIP6) has generated thousands of peer-reviewed publications documenting model configurations, evaluation procedures, emergent constraints, and projection uncertainties. As the community t…