Hallucinated — page 10

Unlocking LLM Code Correction with Iterative Feedback Loops (arxiv.org)

1d

Large Language Models have shown remarkable capabilities in code generation. However, most existing evaluations focus only on single-attempt accuracy and overlook the iterative refinement process that is central to real-world programming.
Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems (arxiv.org)

1d
Visored: A Controlled-Natural-Language Prover for LLM-Generated Mathematics (arxiv.org)

1d

We present a dependent-type-based prover designed around the way LLMs (and humans) tend to write mathematics, complementing existing systems such as Lean and Rocq. Its core design choices are a surface that imitates mathematical natural la…
Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement (arxiv.org)

1d
Visuals Lie, Consistency Speaks: Disentangling Spatial Attention from Reliability in Vision-Language Models (arxiv.org)

1d

Multimodal Foundation Models are increasingly used as reasoning agents, which makes reliability (knowing when a model may hallucinate) critical. A common intuition, which we call the Attention-Confidence Assumption, holds that reliability follo…
Vulcan: Instance-specialized, Verifiable Systems Heuristics Through LLM-driven Search (arxiv.org)

1d
When LLMs Analyze Scars: From Images to Clinically-Meaningful Features (arxiv.org)

1d
When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval (arxiv.org)

1d

Legal case retrieval remains challenging due to the complexity of legal language and the need for precise lexical alignment between queries and relevant cases. Although dense retrieval models have achieved notable progress, empirical studi…
Would a Large Language Model Pay Extra for a View? Inferring Willingness to Pay from Subjective Choices (arxiv.org)

1d
m2sv: A Scalable Benchmark for Map-to-Street-View Spatial Reasoning (arxiv.org)

1d
MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks (arxiv.org)

2d prompt-injection security agentic
AutoDojo: Adaptive Attacks Expose Superficial Defenses and User-Underspecification Limits in LLM Agents (arxiv.org)

2d prompt-injection security

Indirect prompt injection (IPI) is a major security threat to LLM-powered agents. Thus, a growing body of work has proposed a variety of defensive approaches against IPI.
Automated jailbreak attack targeting multiple defense strategies (arxiv.org)

2d jailbreak security
Data-Centric Benchmarking of Exploit Generation in LLMs: Understanding the Impact of Fine-Tuning (arxiv.org)

2d fine-tuning security
Defending against Adaptive Prompt Injection Attacks via Reasoning-enabled Task Alignment (arxiv.org)

2d prompt-injection security

Indirect prompt injection attacks hijack LLM-based agents by embedding malicious instructions in third-party data that the agent retrieves during task execution. Existing defenses report near-zero attack success rate on static benchmarks,…
DoubtProbe: Black-Box Jailbreak Defense via Structural Verification and Semantic Auditing (arxiv.org)

2d jailbreak security
Haiku to Opus in Just 10 bits: LLMs Unlock Large Compression Gains (arxiv.org)

2d haiku opus
LLM-as-Code Agentic Programming for Agent Harness (arxiv.org)

2d hallucination agentic

Every major LLM agent framework gives the LLM the role of orchestrator; the model decides what to do next, when to call tools, and when to stop. We argue that token explosion, control-flow hallucination, and unreliable completion are not i…
- Code as Agent Harness (code-as-harness.github.io via hn)
- Code as Agent Harness (arxiv.org via hn)
MAGE-RAG: Multigranular Adaptive Graph Evidence for Agentic Multimodal RAG in Long-Document QA (arxiv.org)

2d rag agentic
OpenClaw-Skill: Collective Skill Tree Search for Agentic Large Language Models (arxiv.org)

2d openclaw agentic

Equipping Large Language Model (LLM) agents with effective skills is crucial for solving complex tasks in real-world systems like OpenClaw. In this work, we aim to develop a framework that automatically constructs such reusable skills to e…
TechRAG: Evidence-Gated Multimodal Agentic RAG for Technical Literature Reasoning (arxiv.org)

2d rag agentic
A Formal Framework for Declarative Agentic AI in Business Process Analysis (arxiv.org)

2d agentic

Agentic AI opens new opportunities for automating Business Processes (BP), enabling autonomous decision-making and dynamic adaptation. However, realising this potential requires BP entities and their interactions to be defined with formal pr…
A Security Analysis of Long-Horizon Agentic AI Systems: Threats, Evaluation, and Framework Development (arxiv.org)

2d agentic

This paper presents a structured analysis of security challenges in long-horizon agentic AI systems. The study reviews existing threats, evaluation approaches, attack propagation mechanisms, and security frameworks.
A Spatio-Temporal Expert Prefetching Framework for Efficient MoE-based LLM Inference (arxiv.org)

2d moe
A Survey on Agentic Security: Applications, Threats and Defenses (arxiv.org)

2d agentic
A Unified Definition of Hallucination: It's The World Model, Stupid! (arxiv.org)

2d hallucination
ANCHOR: Error-Controlled Adaptive Numerical Correction for Neural Operator Time Marching (arxiv.org)

2d operator
ARB4WM: An Adversarial Robustness Benchmark for World Models in Continuous Control (arxiv.org)

2d agentic

World models are widely used in robotic and agentic engineering control systems due to their ability to learn latent dynamics for planning and decision-making. As these systems are increasingly deployed in safety-critical settings, underst…
Agentic Framework for Deep Learning workload migration via In-Context Learning (arxiv.org)

2d agentic

Translating deep learning models from PyTorch's flexible, object-oriented design to JAX's functional, stateless setup is usually a manual and error-prone task. Automated migration is challenging because Large Language Models (LLMs) struggl…
Agentic Reinforcement Learning for Search Misaligns Instruction-Tuning (arxiv.org)

2d agentic