Hallucinated — page 5

Conformalized Quantum DeepONet Ensembles for Scalable Operator Learning with Distribution-Free Uncertainty (arxiv.org)

1d operator
EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning (arxiv.org)

1d agentic
Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue (arxiv.org)

1d fine-tuning
Generalization Guarantees for Multi-Input Neural Operator Learning in Sobolev Spaces (arxiv.org)

1d operator
Geometry-Aware Post-Hoc Uncertainty Quantification in Operator Learning (arxiv.org)

1d operator

Neural operators provide fast surrogates for PDEs but their deterministic predictions limit their use in tasks requiring uncertainty quantification (UQ), especially under geometric variability. Existing approaches primarily model uncertain…
Gradual Fine-Tuning for Flow Matching Models (arxiv.org)

1d fine-tuning
How Inference Compute Shapes Frontier LLM Evaluation (arxiv.org)

1d tool-use

AI evaluations are shifting toward harder tasks that benefit from longer trajectories involving tool use and iterative problem solving. As a result, performance is increasingly sensitive to the amount and allocation of compute available at…
Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation (arxiv.org)

1d fine-tuning
Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety (arxiv.org)

1d agentic
Learning Cardiac Electrophysiology Digital Twins Through Agentic Discovery of Hybrid Structure (arxiv.org)

1d agentic

Building personalized cardiac electrophysiology (EP) digital twins requires identifying the appropriate model structure for each patient, not merely fitting parameters. Traditional methods rely on experts to manually prescribe hybrid physi…
Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget (arxiv.org)

1d minimax
LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI (arxiv.org)

1d hallucination

AI systems deployed in legal workflows hallucinate at rates that aggregate metrics report at ~52%, but this average conceals where errors concentrate and in which direction they run, leaving compliance officers without an actionable signal…
MODE-RAG: Manifold Outlier Diagnosis and Energy-based Retrieval-Augmented Generation Evaluation (arxiv.org)

1d rag

While Multimodal Retrieval-Augmented Generation (M-RAG) enhances Large Vision-Language Models, it remains highly susceptible to cross-modal hallucinations, causal fabrications, and sycophancy. Furthermore, existing mitigation pipelines oft…
MODE: Modality-Decomposed Expert-Level Mixed-Precision Quantization for MoE Multimodal LLMs (arxiv.org)

1d moe

Mixture-of-Experts Multimodal Large Language Models (MoE-MLLMs) offer remarkable performance but incur prohibitive GPU memory costs, making compression essential. Among PTQ methods, expert-level mixed-precision quantization has proven effe…
MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation (arxiv.org)

1d agentic
Model Validation of Agentic AI Systems: A POMDP-Based Framework for Belief-State, Forecast, and Policy Validation (arxiv.org)

1d agentic

Agentic artificial intelligence systems introduce a new class of model risk. Unlike traditional predictive models, autonomous agents continuously acquire information, form beliefs regarding latent states of the environment, generate foreca…
Operator Boosting Produces Pareto-Efficient PDE Surrogates (arxiv.org)

1d operator
PACE-RAG: Patient-Aware Contextual and Evidence-Constrained RAG for Clinical Drug Recommendation (arxiv.org)

1d rag
PIVOT: Bridging Black-Scholes Implied-Volatility and Price Objectives via Differentiable Jäckel Operator (arxiv.org)

1d operator

Modern option-learning systems operate in two coordinates: price space, where markets quote and no-arbitrage constraints are most naturally enforced, and implied volatility (IV) space, where volatility surfaces are smoothed, regularized, a…
Perron–Frobenius Operator Matching for Generative Modeling (arxiv.org)

1d operator
Provably Efficient Regularized Online RLHF with Generalized Bilinear Preferences (arxiv.org)

1d rlhf
PseudoBench: Measuring How Agentic Auto-Research Fuels Pseudoscience (arxiv.org)

1d agentic

As Large Language Model based agents enter autonomous scientific research, their ability to resist pseudoscience becomes increasingly important. Otherwise, such systems may rapidly generate plausible yet misleading studies that contaminate…
Pulling The REINS: Training-Free Safety Alignment of Video Diffusion Models via Representation Steering (arxiv.org)

1d fine-tuning

Open-weight video diffusion models can generate photorealistic unsafe content, from violence to misinformation, yet existing defenses either require expensive safety fine-tuning that degrades general capability, or apply external filters t…
Qwen-RobotManip Technical Report: Alignment Unlocks Scale for Robotic Manipulation Foundation Models (arxiv.org)

1d qwen
S4oP: Operator-level Pruning of Structured State Space Models for Resource-Constrained Devices (arxiv.org)

1d operator
The Discrete-Log Clock: How a Transformer Learns Modular Multiplication (arxiv.org)

1d grok

When small transformers grok modular multiplication, prior work reports that the learned embedding has a "dense" Fourier spectrum requiring all frequencies. This contrasts with modular addition, where only a sparse set of key frequencies s…
TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins (arxiv.org)

1d fine-tuning

Fine-tuning large language models (LLMs) is compute-intensive and error-prone: model performance depends sensitively on data quality and hyperparameter choices, and naïve runs can even degrade model performance. This raises a practical que…
WEQA: Wearable hEalth Question Answering with Query-Adaptive Agentic Reasoning (arxiv.org)

1d agentic

Language models are remarkably capable at medical question answering, in some cases surpassing the accuracy of general physicians. However, answering questions about wearable health data remains challenging and understudied, as these ubiqu…
X-REFINE: XAI-based RElevance input-Filtering and archItecture fiNe-tuning for channel Estimation (arxiv.org)

1d fine-tuning
Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models (arxiv.org)

1d agentic

AI agents are moving from advisors to actors, booking travel, planning menus, and running procurement on behalf of users. Existing benchmarks for AI and animal welfare evaluate model text responses to question-answer prompts, leaving open…