Hallucinated — page 7

Disentangling Perception and Reasoning in Multimodal LLMs via Reward Design (arxiv.org)

1d
Dissecting model behavior through agent trajectories (arxiv.org)

1d

AI agent performance is not just a modeling problem; it is fundamentally a systems problem. The advanced capabilities of models are realized through agent harnesses.
Distributed General-Purpose Agent Networks: Architecture, Key Mechanisms, and Prototypes (arxiv.org)

1d

Large language models have accelerated the transition from passive conversational assistants to autonomous agents that can understand goals, plan actions, invoke tools, and execute multi-step tasks. Yet the capability of a single agent rem…
Divide, Deliberate, Decide: A Multi-Agent Framework for Fine-Grained Egocentric Action Recognition (arxiv.org)

1d

Fine-grained action recognition in egocentric video is challenging for Vision-Language Models (VLMs): actions often differ only in small visual cues, and a single model tends to be biased toward a subset of these cues. We propose Divide, D…
Do We Still Need Humans in the Loop? Comparing Human and LLM Annotation in Active Learning for Hostility Detection (arxiv.org)

1d
EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent (arxiv.org)

1d

As LLM-based shopping agents enter production, existing benchmarks fail to capture how a shopper's requirements arrive: stated implicitly in the query, recorded in a profile, or revealed only when the right question is asked. Benchmarks th…
ED3R: Energy-Aware Distributed Disaster Detection Enabled by Cooperative Robotic Agents (arxiv.org)

1d
EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning (arxiv.org)

1d
Environment-Grounded Automated Prompt Optimization for LLM Game Agents (arxiv.org)

1d
Evaluating Open-Source LLMs for Multi-Label ATT&CK Technique Classification on CTI Reports (arxiv.org)

1d
Evaluating Second-Order Bias of LLMs Through Epistemic Entitlement (arxiv.org)

1d
Extracting Semantics: LLM-Guided Automatic Population of Robot Ontology from URDF (arxiv.org)

1d

While commonsense knowledge may suffice for virtual agents, embodied robots interacting with humans require grounded and semantically rich representations of both their environment and their own physical embodiment. In cognitive robotics,…
FacProcessTwin: An LLM-Based System for Process Twin Development (arxiv.org)

1d

Process twins provide real-time representations of entire production processes. By capturing how process steps interact, rather than monitoring a single machine in isolation as an asset-based digital twin does, they have the potential to d…
FeedEval: Pedagogically Aligned Evaluation of LLM-Generated Essay Feedback (arxiv.org)

1d
From Brewing to Resolution: Tracing the Internal Lifecycle of Code Reasoning in LLMs (arxiv.org)

1d

Standard accuracy metrics cannot explain why LLMs handle variable tracking but fail on semantically equivalent loops. We study an internal lifecycle of code reasoning in which models first brew the answer, making it linearly recoverable ma…
From Drift to Coherence: Stabilizing Beliefs in LLMs (arxiv.org)

1d
From Parasocial Scripts to Dyadic Persistence in Autonomous AI-Agent Communities (arxiv.org)

1d
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning (arxiv.org)

1d
Functional Equivalence in Attention: A Comprehensive Study with Applications to Linear Mode Connectivity (arxiv.org)

1d
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? (arxiv.org)

1d
IUU+DB: Tracking Illegal, Unreported, and Unregulated Fishing, Seafood Fraud, and Labor Abuse through LLM-driven Information Extraction (arxiv.org)

1d
Incumbent Advantage: Brand Bias and Cognitive Manipulation Dynamics in LLM Recommendation Systems (arxiv.org)

1d

Large language models (LLMs) are becoming a major way for consumers to find products, but we do not yet understand how brands compete in this new channel. We study brand dynamics in LLM recommendations using skincare products -- a category…
Instrumental and Proximal Causal Inference with Gaussian Processes (arxiv.org)

1d
Kernel-Based Functional Balancing for Causal Inference with Compositional Treatments (arxiv.org)

1d
LLM Consumer Behavior Theory: Foundations of a Novel Research Field (arxiv.org)

1d

Large language models (LLMs) are increasingly deployed as autonomous agents that make consumption decisions on behalf of users. This shift raises fundamental questions for consumer theory, which has traditionally modeled humans as the prim…
LLM Features Can Hurt GNNs: Concatenation Interference on Homophilous Graph Benchmarks (arxiv.org)

1d

Adding LLM-generated node features to graph neural networks (GNNs) is widely reported to improve accuracy on standard benchmarks. We document a contrasting observation: when LLM features are introduced through pure input concatenation (rat…
LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management (arxiv.org)

1d
LLM-as-Judge in Education: A Curriculum-Grounded Marking Pipeline (arxiv.org)

1d

Generative AI and large language models (LLMs) are increasingly applied to question generation and automated assessment. However, deploying LLMs in preparation for high-stakes exams requires more than prompt engineering; it demands softwar…
LLMs Infer Cultural Context but Fail to Apply It When Responding (arxiv.org)

1d
Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents (arxiv.org)

1d