Hallucinated AI & agentic coding news. Some of it is real.
  1. GhazalBench: Evaluating LLM Understanding and Canonical Surface-Form Access in Persian Ghazals (arxiv.org)

    1h

  2. Beyond Memorization: Distinguishing Between Pattern-Based and Epistemic Reasoning in LLMs Using Epistemic Puzzles (arxiv.org)

    1h

  3. Who Wrote the Book? Detecting and Attributing LLM Ghostwriters (arxiv.org)

    1h

  4. On Cost-Effective LLM-as-a-Judge Improvement Techniques (arxiv.org)

    1h

  5. Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing (arxiv.org)

    1h rag

  6. From Confident Closing to Silent Failure: Characterizing False Success in LLM Agents (arxiv.org)

    1h

  7. LLM-as-a-Discriminator: When Synthetic Tables Still Look Real (arxiv.org)

    1h

  8. Disjoint or Overlapping? Inference Windowing for Reconstruction-Based Time Series Anomaly Detection (arxiv.org)

    1h

  9. Calibrating Overconfidence Without Sacrificing Confidence: Probe-Conditioned Head Intervention for LLMs (arxiv.org)

    1h

  10. Operator Fusion for LLM Inference on the Tensix Architecture (arxiv.org)

    1h operator

  11. Hasse Diagrams for Attention: A Partial Order Framework for Designing Transformer Masks (arxiv.org)

    1h

  12. Alignment Defends LLMs from Property Inference Attacks (arxiv.org)

    1h

  13. Spatiotemporal Graph Transformer for 3D Neighborhood Interaction and Quality Prediction in Metal Additive Manufacturing (arxiv.org)

    1h

  14. When Design Rules Break: Benchmark Composition Determines Whether Label Informativeness Predicts GNN Aggregator Choice (arxiv.org)

    1h

  15. A Comprehensive Inference-Time Augmentation Framework in Physiological Signals: Application to PPG-Based AF Detection (arxiv.org)

    1h

  16. GRAFT: Gain-Recalibrated Adapters for Transformer-Based Neural Population Activity Modeling (arxiv.org)

    1h

  17. Overcoming Rank Collapse in Feedback Alignment (arxiv.org)

    1h

  18. OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib (arxiv.org)

    1h

  19. Algorithmic and Minimax Complexities in Kernel Bandits (arxiv.org)

    1h minimax

  20. WHU-Infra3D: A Full-stack Multi-modal Dataset and Benchmark for 3D Roadside Infrastructure Inventory (arxiv.org)

    1h

  21. Multi-task LLMs for Bug Classification: Efficient Inference with Auxiliary Decoding Heads (arxiv.org)

    1h

  22. Spiking Neural Network inference on FPGAs with hls4ml (arxiv.org)

    1h

  23. Learning the Universe: Posterior Reliability of Neural Generative Models in High-Dimensional Field-Level Inference of Cosmic Initial Conditions (arxiv.org)

    1h

  24. POPSICLE: Benchmark Datasets for Segmentation and Localization in CryoET (arxiv.org)

    1h

  25. ClusBench: The Clustering Benchmark Data Resource You've All Been Waiting For (?) (arxiv.org)

    1h

  26. MemVenom: Triggered Poisoning of Multimodal Memories in Web Agents (arxiv.org)

    1h

  27. DMT: Demographic Conditioning, Morphology-Enhanced Transformer for Cuffless Blood Pressure Estimation from PPG Signals (arxiv.org)

    1h

  28. AdaGC: Enhancing LLM Pretraining Stability via Adaptive Gradient Clipping (arxiv.org)

    1h

  29. DAH-Net: A Dual-Attention Hybrid Network for Interpretable and Robust EEG-Based Emotion Recognition (arxiv.org)

    1h

  30. AnomaMind: Agentic Time Series Anomaly Detection with Tool-Augmented Reasoning (arxiv.org)

    1h agentic

built with hx. last updated 2026-06-10 05:00 UTC. some of this is real.