Hallucinated

Learning to Decide with AI Assistance under Human-Alignment (arxiv.org)

1d
Learning to Refine Hidden States for Reliable LLM Reasoning (arxiv.org)

1d
Like a Hammer, It Can Build, It Can Break: Large Language Model Uses, Perceptions, and Adoption in Cybersecurity Operations on Reddit (arxiv.org)

1d
Loss Landscape Poisoning: Targeted Extraction of Unseen Training Data from LLMs (arxiv.org)

1d
MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization (arxiv.org)

1d
MapSatisfyBench: Benchmarking Satisfaction-Aware Map Agents through Behavior-Grounded Implicit Decision Factors (arxiv.org)

1d

Large language model agents are increasingly integrated into map services. Since map services are embedded in everyday-life scenarios rather than professional task settings, users often express their needs informally, resulting in underspe…
MedicalAgentsBench for Complex Medical Reasoning: Comparing Internalized Reasoning Models versus Externalized Agent-based Frameworks (arxiv.org)

1d
MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision (arxiv.org)

1d
Membership Inference Attacks against Large Audio Language Models (arxiv.org)

1d
Memory as a Wasting Asset: Pricing Flash Endurance for Embodied Agents, and the Limits of Doing So (arxiv.org)

1d

A robot's flash endurance is a non-renewable stock: every persisted write spends one of a few thousand program/erase cycles and never refills, yet no fielded robot memory system prices which memories are worth an erase cycle. We treat embo…
Multi-Adapter PPO: A Cross-Attention Enhanced Wavelength Selection Framework for LIBS Quantitative Analysis (arxiv.org)

1d
NarrativeWorldBench: A Frontier-Saturated Benchmark and a Latent World Model for Long-Horizon Co-Creative Audio Drama (arxiv.org)

1d

Long-form serialized audio drama, with arcs that run for 200 to 800 episodes, is a major creative medium and a setting where frontier large language models (LLMs) fail. We benchmark 21 models, spanning classical, fine-tuned, open-frontier,…
NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment (arxiv.org)

1d
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation (arxiv.org)

1d
OmniDrive: An LLM-Choreographed Multi-Agent World Model with Unified Latent Co-Compression for Multi-View Driving Video Generation (arxiv.org)

1d

Generative world models for autonomous driving face two unresolved tensions: heterogeneous control injection, where free-form language, HD-maps, trajectories, and camera poses reside in incompatible representational spaces, and post-hoc cr…
On the Memorization Behavior of LLMs in Generative Recommendation: Observations, Implications, and Training Strategies (arxiv.org)

1d
Online LLM Selection via Constrained Bandits with Time-Varying Demand (arxiv.org)

1d

Large Language Models (LLMs) are increasingly deployed in edge-cloud inference systems to handle diverse user tasks with heterogeneous accuracy, latency, and cost profiles. Selecting the appropriate LLM for each incoming task is critical f…
Optimism Stabilizes Thompson Sampling for Adaptive Inference (arxiv.org)

1d
PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents (arxiv.org)

1d
ParkingTransformer: LLM-Enhanced End-to-End Trajectory Planning for Autonomous Parking (arxiv.org)

1d

End-to-end autonomous parking has emerged as a critical task within the realm of autonomous driving. However, existing methods suffer from black-box characteristics, lacking high-level semantic understanding and interpretability, which imp…
Physics-Informed Attention Mechanism and Generalization Capability of Deep Learning-Based Grain Growth Evolution Prediction (arxiv.org)

1d

Machine Learning (ML) models for grain growth prediction are typically trained on idealized synthetic data, yet practical applications require generalization to conditions outside the training distribution. This study evaluated the Out-Of-…
Plug-and-Adapt: Multimodal Coreference Resolution at First Sight with a Pretrained Alignment Model (arxiv.org)

1d
Position: Modular Memory is the Key to Continual Learning Agents (arxiv.org)

1d
PreAct: Computer-Using Agents that Get Faster on Repeated Tasks (arxiv.org)

1d

Computer-using agents drive real software through the screen -- clicking and typing -- but they solve every task from scratch: asked to repeat a task, an agent re-reads the screen, re-reasons every tap, and pays the full cost again. We pre…
Prefill/Decode-Aware Evaluation of LLM Inference on Emerging AI Accelerators (arxiv.org)

1d

As large language models (LLMs) are increasingly deployed in latency- and cost-sensitive settings, inference efficiency has become a central systems challenge. While GPUs dominate current deployments, a growing number of AI accelerators cl…
Prompt Perturbation for Reliable LLM Evaluation over Comparison Graphs (arxiv.org)

1d
Prototype-Based Semantic Consistency Alignment for Domain Adaptive Retrieval (arxiv.org)

1d
Quantifying Consistency in LLM Logical Reasoning via Structural Uncertainty (arxiv.org)

1d

Large language models can arrive at the same answer through reasoning paths that are unstable, contradictory, or difficult to rank consistently -- a failure mode especially prevalent in multi-step deductive reasoning. Existing methods asse…
R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model? (arxiv.org)

1d
Reconfigurable Computing Challenge: Transformer for Jet Tagging on Versal AI Engines (arxiv.org)

1d