#rag

488 items

Please don't spam people looking for employment. It's just cruel (news.ycombinator.com) +393100 3w

Earlier I posted in a “Who wants to be hired?” thread, looking for a place where I could apply my experience in hospitality, food tech and automation. A couple hours later I received an email: “Hi Ilia, I saw your comment on the June Who’s…

rag
Vibe Coding vs. Production reality (www.reddit.com) +11115 7w

The image is from X, been thinking about it since I saw it. Vibe coding is real.

rag
Taught my 60-year-old dad (zero coding exp) Claude and Git in Feb. Today he built a RAG solution. I finally get "vibe coding." (www.reddit.com) +10424 8w

My father teaches geology and has literally zero coding expertise. Back in February, I introduced him to Claude and taught him the absolute basics of how Git works.

rag claude-code
Hot take: the biggest bottleneck in AI agents right now isn't models, frameworks, or even cost. It's that nobody knows how to properly evaluate if their agent is actually working (www.reddit.com) +8429 9w

tool-calling rag
How do you usually get around when starting big projects in Claude Code? (www.reddit.com) +4820 6w

This question will probably make more sense when I explain my current situation: lately I’ve been doing some small projects here and there for some small businesses in my town and they have been working fine, but that is about to change. I ma…

rag claude-code
Bible as RAG Database (www.crosscanon.com via hn) +4516 1d

Cross Canon

rag
Title: I’m tired of the "Agent Hype"—Most AI agents right now are just expensive loops. Change my mind (www.reddit.com) +3839 9w

We’ve all seen the flashy demos, but after spending the last few months trying to build [or use] actual multi-agent workflows, I’ve hit a wall. The "Loop of Death": Agents still get stuck in reasoning loops that burn tokens without solving…

rag chatgpt
Qwen 3.6: worse adherence? (www.reddit.com) +3727 10w

Just swapped Qwen 3.5 for the 3.6 variant (FP8, RTX 6000 Pro) using the same recommended generation settings. My stack is vLLM (v0.19.0) + Open WebUI (v0.8.12) in a RAG setup where the model has access to several document retrieval tools.

↯ Qwen 3.6 vllm rag qwen
Gemini API File Search is now multimodal (blog.google via hn) +282 6w

Gemini API File Search is now multimodal: build efficient, verifiable RAG. Today, we are expanding the Gemini API’s File Search tool. You can now build retrieval-augmented generation (RAG) systems with multimodal data and custom metadata.

rag gemini
What's your favorite local MCP server? (www.reddit.com) +2744 4w

I've seen so many "RAG this, memory that" projects. What projects are people actually using day to day for agentic workloads?

rag mcp agentic
Show HN: YourMemory, agentic memory is a pruning problem, not a hoarding problem (yourmemoryai.vercel.app via hn) +19 2w

This is a project that I have been building for a while now, YourMemory is a solution to agentic memory which focuses on pruning of noise rather than hoarding of data. In the current state of agentic memory most of the context is stored in…

rag agentic
Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA (www.reddit.com) +19 4w

I benchmarked vision-capable LLMs (the "just attach the PDF and let the model read it" pattern) against OCR-based pipelines on 30 long, image-heavy PDFs from MMLongBench-Doc (https://github.com/mayubo2333/MMLongBench-Doc). There were 171 q…

↯ Sonnet 4.5 rag sonnet agentic
Haystack: Open-Source AI Framework for Production Ready Agents, RAG (haystack.deepset.ai via hn) +144 2d

The Open Source AI Framework for Production Ready Agents, RAG & Context Engineering Haystack Sets the Standard for Agentic AI Across Industries Why Teams Choose Haystack for their AI Workflows Build Transparent, Context Engineered AI Syste…

rag agentic
PSA: llama-swap released a new grouping feature, matrix, allowing you to fine tune which models can run together (www.reddit.com) +143 8w

Previously a model could only be present in a single group. Now you can create whatever groups you want: one for big models that should run on their own, a group for STT + bigger model, a group for RAG usages, etc.

rag llama
governance wall in agentic workflows. why are we stuck past rag? (www.reddit.com) +1415 9w

keep seeing the same pattern across agent projects. we're good at building agents that find information, but the moment we ask them to actually do something (update a crm, trigger a payment, touch a production database), things grind to a…

rag agentic
Very detailed guide to building AI Agents? (www.reddit.com) +1315 9w

rag
(Rant ;)) Make your benchmarks realistic (www.reddit.com) +112 6w

Everybody here is posting their optimizations for running different models - that's good, but make these benchmarks realistic, as speed is not the only factor in running an LLM effectively. Context size is key - with agentic/coding/RAG work you need to ha…

rag agentic
Curated a list of 550+ free or cheap AI tools for vibe coding (LLM APIs, IDEs, local models, RAG, agents) (www.reddit.com) +117 10w

Been vibe coding a lot recently and kept running into the same problem finding actually usable tools without paying for 10 different subscriptions or donating my bank balance to Claude. So I put together a curated list focused on free or l…

ollama rag qwen+3
Is Granite-4.1-30b Overshadowed by Qwen3.6 & Gemma4 models? (www.reddit.com) +107 4w

I don't see any threads on this model. Is it because it's dense and/or without-reasoning?

↯ Qwen 3.6 retrieval-augmented rag
how do you guys handle the conversation with skeptical clients when selling agents? (www.reddit.com) +1011 5w

struggling with a bit of a reality check lately and wanted to see if anyone else is running into this. been pitching agentic workflows for a while, and I've realized that leading with the tech - the orchestration the RAG, the "intelligence…

rag agentic
Ran the same models across Strix Halo, RTX 3090, and RTX 5070 because I wanted my own numbers (www.reddit.com) +106 5w

I kept seeing inference-speed claims for these models and wanting an apples-to-apples comparison on the hardware I actually have. So I built a harness and a public page that dumps every run as YAML.

↯ Qwen 3.5 vllm moe gemma+1
Show HN: XTrace – Encrypted vector DB (search embeddings without exposing them) (github.com via hn) +102 9w

Hey everyone! This is XTrace.

vector-database rag
Evaluated a RAG chatbot and the most expensive model was the worst performer. Notes on what actually moved the needle. (www.reddit.com) +916 6w

We had a customer support RAG bot. Standard setup: ChromaDB, system prompt, an LLM doing generation.

rag
Anyone actually using a local LLM as their daily knowledge base? Not for coding, for life stuff. What's your setup? (www.reddit.com) +98 6w

So I've been going down a rabbit hole lately and I can't find many people actually talking about this specific use case. everyone here runs local LLMs for coding, chat, maybe some creative writing.

rag
OpenAI has announced they will be winding down fine tuning. (www.reddit.com) +92 7w

Got an email today about the announcement. > OpenAI is winding down the fine-tuning API and platform.

↯ Fine Tuning fine-tuning rag openai
Choosing a Mac Mini for local LLMs — what would YOU actually buy? (www.reddit.com) +930 9w

ollama openclaw rag+1
RAG on Snapdragon X2 Laptop, 200K documents. (www.reddit.com) +8 5w

Qualcomm recently released the new Snapdragon X2 laptop chipset. I immediately ordered one: ASUS Zenbook A16 16" 3K OLED Touchscreen Laptop, Snapdragon X2 Elite Extreme (2026). A few things I really like about this machine: Extremely light.

rag
How are you actually using AI agents in real workflows right now? (www.reddit.com) +817 10w

I’m building some infrastructure around AI agents and I’m trying to understand how people are actually using them in real workflows, not demos. Specifically curious about: - What your agent actually does day-to-day (not hypotheticals) - Wh…

rag
Show HN: GlycemicGPT – Open-source AI-powered diabetes management (github.com via hn) +71 6w

I'm a Type 1 diabetic and software engineer. Last year I went months between endocrinologists with no clinician reviewing my data.

rag
Sanity check: using git to make LLM-assisted work accumulate over time (www.reddit.com) +714 9w

rag
Show HN: AI support chatbot with RAG and citations – one back end file, no infra (github.com via hn) +7 10w

Upload markdown docs, get a support chatbot that answers with citations. The entire backend is one JS file — storage, search, and conversation history are handled by the runtime.

rag
Struggling to balance high-volume orchestration (www.reddit.com) +79 10w

Working on a multi-agent system for a large outbound pipeline. We're running 100+ LinkedIn and email accounts, and simple linear automation (step A then step B) breaks down fast because real conversations don't move in a straight line.

rag
Training SID-1 to beat GPT-5 at search with 1k+ QPS RL (turbopuffer.com via hn) +6 5w

SID-1 is an agentic search model that is 24x faster than GPT-5.1-high, 374x cheaper than Sonnet 4.5, and achieves 1.9x higher recall than traditional RAG pipelines. Here's how we trained it using large-scale RL on turbopuffer.

↯ Sonnet 4.5 gpt-5 rag sonnet+1
How are you maintaining your AI apps post-launch? Model bugs vs engineering bugs, and what's your debugging stack? (www.reddit.com) +63 8w

I've been going down a rabbit hole tinkering about what actually happens after you ship an LLM-powered app, and I'd love to hear how others here handle it… A few things I keep getting stuck on: Continuous optimization. Once your app is in…

rag
Reducing LLM context from ~80K tokens to ~2K without embeddings or vector DBs (www.reddit.com) +612 9w

rag
RAG/Retrieval as a solution (www.reddit.com) +66 10w

hi folks, I am new to the community and I have gone through the rules and I hope I am not breaking any of them with this post and will try to maintain 1/10 ratio. For building RAG, there are many tools out there each solving a piece of t…

rag
Any reason to run dense over MOE for RAGs? (www.reddit.com) +512 4w

I tend to use Claude for a lot of research and I also increasingly worry about things like misinformation or things in the model I can't audit. So, I'm building my own all in one RAG with big datasets like all of Wiki, research papers, all…

↯ Qwen 3.6 moe rag
Every week this we see some version of "how do I evaluate my LLM app?" and the answer almost always stops at RAGAS or DeepEval. Here is the part of the evaluation stack most tutorials skip in 2026. (www.reddit.com) +51 7w

The same question lands on this sub a few times a week, and the standard answers (RAGAS, DeepEval) are correct but stop one layer short of what you actually need once your app leaves a notebook. Wanted to lay out the full picture for anyon…

rag agentic
LLMSearchIndex- an Open Source Local Web Search Library with over 200 million indexed Web Pages for RAG applications (github.com via reddit) +56 7w

I've been pretty unsatisfied with web search options for local LLM/RAG systems. Most setups either rely on paid APIs like Brave, or meta search scrapers like SearXNG.

rag
Persistent memory system for LLMs that actually learns mid-conversation (www.reddit.com) +54 7w

Every LLM conversation starts from zero. RAG helps, but it can't learn from what's happening right now.

ollama rag mcp+2
I have built something using claude what I was doing on excel from last 13 years (www.reddit.com) +54 8w

I have been doing financial modeling for startups and feasibility reports for new companies for more than a decade now. I started playing with Lovable 6 months ago, then somebody introduced me to VS Code with Claude; it’s like a superp…

rag
Are we overengineering RAG when the real problem is structure? (www.reddit.com) +58 8w

Lately I’ve been working on a few enterprise AI use cases, and one thing keeps coming up. We spend a lot of time trying to improve retrieval.

rag
Turning RAG pipelines into enterprise-grade Data Subscriptions (halcyon.io via hn) +5 10w

Back in September, we at Halcyon shared our plans to build five data subscriptions in the coming months. If you are reading here, you’ve probably been along the journey with us: from gas power plants, to large load tariffs, to utility rate…

rag
Total idiot needs some build advice (www.reddit.com) +512 10w

Looking for some advice here because I made a hasty purchase. "Cut your losses and move on" is totally a reasonable answer, but I figured I'd look for some additional help. So, I just started working on a local RAG pipeline with about 15,0…

rag
Show HN: Built a public demo to explore SpaceX's IPO filing using multimodal RAG (www.calypso.so via hn) +41 2w

Ask the SpaceX IPO filing like an analyst. Grounded across 84 indexed sources, including prospectus summaries, risk factors, MD&A, launch vehicle pages, Starlink materials, xAI/X references, charts, and image exhibits.

rag
RAG demo for New Zealand residential tenancy law (tenancy.localrun.ai via hn) +4 3w

This tool searches real NZ Tenancy Tribunal decisions published by the Ministry of Justice. Decision links point to NZLII.

rag
We reduced RAG retrieval cost 10× with a hippocampus-inspired memory substrate (www.bricbybric.ae via hn) +4 4w

We Built a Memory Engine. The Brain Told Us How.

rag
Show HN: Harbor v0.4.19 – harbor launch --backend vLLM --web codex (github.com via hn) +4 4w

https://github.com/user-attachments/assets/e4897391-c5a8-4391-93c3-9f8b76155f11 Setup your local LLM stack effortlessly. Starts fully configured Open WebUI and Ollama harbor up Now, Open WebUI can do Web RAG and TTS/STT harbor up searxng s…

vllm ollama rag+1
Your RAG is hallucinating because of garbage retrieval — here's the 3-line fix (with real scores) (www.reddit.com) +47 4w

My RAG agent hallucinated. Not because the LLM was bad — because the retrieval was feeding it noise.

rag
Show HN: I built a RAG and knowledge graph agent that runs locally (news.ycombinator.com) +41 4w

Claw-Coder is an AI agent that runs locally on your laptop and has access to powerful tools. Instead of configuring Claude or Codex to use a local model, just use claw-coder. Why was claw-coder created?

rag codex cursor
I Kept a Diary for Seven Years. An LLM Finally Read It. (www.reddit.com) +49 5w

I've kept a personal diary since 2019. Last week I fed 200+ entries to an LLM and asked it how I've changed over 7 years.

↯ Fine Tuning fine-tuning rag
how would you set up a local llm server for a business of 7 people? (www.reddit.com) +419 5w

Okay so i've been stalking this sub for some time and i run the occasional small 2-8b model on my laptop (not the best) for fun but say my role at a company is to set up a local LLM since we obviously don't want confidential data going to…

↯ Qwen 3.6 gemma rag qwen
Show HN: An agent that tunes its own cache (news.ycombinator.com) +4 7w

Last weekend I built chat.betterdb.com as a RAG over Valkey/Redis/Dragonfly docs. The goal was to eat our own dog food and publicly test our caching libraries.

rag openai
I made tiny AST tool for agent code exploration - No RAG, no index, no cache (www.reddit.com) +42 7w

A small tool I made for myself (ast-outline), sharing in case it's useful... still experimenting with it.

rag cursor claude-code
An Open Benchmark for Testing RAG on Realistic Company-Internal Data (www.reddit.com) +4 7w

We built a corpus of 500,000 documents simulating a real company, and then let RAG systems compete to find out which one is the best. Introducing EnterpriseRAG-Bench, a benchmark for testing how well RAG systems work on messy, enterprise-s…

rag
I'm looking for an AI Automation Engineer role or gig (news.ycombinator.com) +4 7w

Hi all, I'm an AI automation engineer who builds systems that replace manual work, scale outreach, and turn workflows into revenue. I have sent out working systems for managing leads to CRM, finding real estate deals, sorting emails with A…

rag agentic
Learn, run and test Agentic AI on your browser for free! (Built with Claude Opus 4.7 in 2 days) (www.reddit.com) +48 8w

Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…

↯ Fine Tuning ↯ Function Calling ↯ Opus 4.7 function-calling fine-tuning rag+4
Is evaluating RAG retrieval using UI only useless? (www.reddit.com) +41 8w

Suppose that for now you only had access to the frontend of a RAG system and you don't know how the backend works, but you need to improve confidence in the retrieved results. How do you design this process to be able to improve it?

rag
building a Multi-Agent AI App for automated Bill of Quantities. Need architecture/framework any advice! (www.reddit.com) +47 9w

rag
GPU strategy for local LLM + mixed workloads (70-person company) — NVIDIA vs AMD? (www.reddit.com) +43 10w

Hey all, we’re a mid-sized company (~70 people) and currently planning to bring a lot of our workloads on-prem instead of relying on cloud APIs. The goal for the moment is to run small to mid-sized models in the range of 30B like Qwen3.6 o…

↯ Qwen 3.6 rag agentic
Plugging Claude into Obsidian for a RAG like system. (www.reddit.com) +43 10w

Hey so I am just going to make a post to see what almighty reddit has to say but I am trying to get claude to connect to an Obsidian vault so it can help me reference lecture notes, textbook theory, past claude convos, and projects and sof…

rag
Show HN: SynapseKit – Async-native Python framework for LLM pipelines and agents (github.com via hn) +42 10w

Build production LLM apps with 2 dependencies. Async-native RAG, Agents, and Graph workflows, with no magic, no SaaS, no bloat.

rag
I open sourced a local-first LLM wiki for research and durable memory (www.reddit.com) +4 10w

I’ve been building a small tool called oamc around a workflow I wanted for personal research and long-running project memory. The basic idea is: instead of repeatedly querying raw notes/documents, sources get ingested into a maintained mar…

rag
Applied AI Implementation Engineer Freelance (news.ycombinator.com) +3 17h

Open to Work I build production AI systems that add intelligence to processes. My work includes Closed-Loop AI-native systems, RAG, AI agents, agentic evaluations, guardrails, and enterprise integrations using Python, TypeScript, React, No…

rag gemini agentic+1
Find the questions your RAG pipeline will fail on, before your users do (github.com via hn) +3 2d

RAGProbe Find the questions your RAG pipeline will fail on — before your users do. RAGProbe analyzes your chunk corpus topology (the graph of how chunks relate to each other in embedding space) and generates adversarial questions targeting…

rag
Show HN: Vedana – open-source RAG over a knowledge graph (github.com via hn) +3 3d

Vedana Open-source multi-agent RAG over a knowledge graph. Instead of guessing answers from text similarity, Vedana agents navigate a typed graph step by step — issuing Cypher queries, running vector search, verifying sources, and assembli…

rag
A PostgreSQL Database for Every Agent: In-Database RAG, Graph, and Multitenancy (www.yugabyte.com via hn) +3 7d

Discover newly released YugabyteDB 2026.1 and YugabyteDB AMP (Agentic Multitenant PostgreSQL): a true serverless, scale-to-zero PostgreSQL where every agent gets its own real, isolated database starting at a fraction of the cost of a core.…

rag agentic
Show HN: Coding agent with algebraic memory (VSA) instead of RAG (github.com via hn) +3 11d

Raidho ᚱ A coding agent that plans with one model, executes with another, and remembers what it learns. Most coding agents are one model in a tool loop.

rag
Show HN: Local RAG memory system that AI can write directly to (github.com via hn) +31 12d

For me and my family, it gets really annoying having to reshare information each time you create a new LLM chat. Therefore, I decided to create local-memory-mcp, a local MCP that allows LLMs to read and write to a RAG.

rag mcp
Building a Personal RAG Chatbot in a Few Days (e-mahmoudi.me via hn) +31 2w

Building a Personal RAG Chatbot in a Few Days: Learning by Engineering. How I built a small personal RAG chatbot using FastAPI, PostgreSQL, and Docker as a practical engineering exercise.

rag
Show HN: A 150M model that extracts verbatim evidence spans for RAG, no LLM call (huggingface.co via hn) +3 2w

Verbatim-RAG Extractor Chill, I Ground! 🌶️ Model Name: verbatim-rag-modern-bert-v2 Organization: KRLabsOrg Github: https://github.com/KRLabsOrg/verbatim-rag Overview The Verbatim-RAG Extractor is a query-conditioned token classifier that h…

rag
Composition Hallucinations: Not all RAG hallucinations are retrieval failures (zenodo.org via hn) +3 4w

Composition Hallucination in Retrieval-Augmented Generation: A Failure Mode and Benchmark Protocol. Retrieval-Augmented Generation (RAG) is commonly motivated by the idea that language models answer more faithfully when relevant…

↯ Hallucination hallucination rag
I compared 8 open-source AI agent frameworks so you don't have to — here's the full breakdown (www.reddit.com) +37 4w

We did a deep-dive comparison of the 8 major open-source AI agent frameworks as of mid-2026: 🔹 LangGraph — Best for complex state machines & DAG workflows 🔹 CrewAI — Best for multi-agent role-playing teams 🔹 AutoGen — Now in maintenance mo…

rag openai
How I do use the recent llama.cpp native tools to do web rag a.k.a. web_fetch (or anything else for the matter) directly from inside the llama-server's webui (www.reddit.com) +35 4w

Like some other fellow LLMers, I discovered a few days ago that the amazing llama.cpp project has just added native tool functionality to the server. After enabling the relevant options in llama-server and playing a bit with the…

rag llama
Building Agentic GraphRAG Systems: From knowledge graphs and ontologies to a unified memory as an MCP server for your AI agent. (www.reddit.com) +32 6w

I gave this talk twice in one month: at O’Reilly’s Context Engineering Event and at Abi Aryan’s Maven course on LLM inference at scale. After being blasted with questions, I realized something: GraphRAG isn’t a retrieval algorithm, it’s a…

rag mcp agentic
How are you protecting your AI agents' memory from poisoning attacks? (www.reddit.com) +34 7w

As AI agents become more autonomous and persist memory across sessions (RAG indexes, conversation history, vector stores), there's a growing attack surface that most people aren't thinking about: memory poisoning. An attacker can plant mali…

↯ Security prompt-injection rag security+1
MSA 100M tokens (www.reddit.com) +35 7w

https://arxiv.org/abs/2603.23516 https://github.com/EverMind-AI/MSA If verified, RAG is no longer needed.

rag
RAG retrieves the refutation and still gets it wrong (reyes.id.au via hn) +3 7w

Anchor catching the failure mode where RAG retrieves the refutation and still gets it wrong Ask vanilla RAG over Duval, Goeckner, Klivans, and Martin's 2015 paper "A non-partitionable Cohen-Macaulay simplicial complex" this question: What…

rag
is multi-agent architecture worth the 15x token cost? (www.reddit.com) +33 7w

moving my current research workflow from a single generalist agent to a multi-agent setup (MAS), and the projected token usage is terrifying. some benchmarks suggest it can be up to 15x more expensive than a standard chat exchange.

rag
Do you guys use AI / Agents for direct profit or do you apply it to be more effective - Could use some guidance and motivation I'm 20 (www.reddit.com) +31 7w

I'm kinda tired of doing rocket science just to have a local agent: trying to figure out why it's outputting garbage, then getting its output to stream through my UX layer properly, then getting it to call tools properly.

rag
Ask HN: Anyone using AI agents for active learning sprints? Here's my setup (news.ycombinator.com) +31 8w

Hi HN, I'm a big fan of AI's ability to provide personalized tutoring. So, lately, I have been using my Antigravity IDE (you can use any agentic harness) for personal learning.

rag mcp agentic
What tools are you using to give your LLM a persistent second brain / long-term memory? (www.reddit.com) +316 8w

I've been going down a rabbit hole trying to solve LLM memory. the problem where every session starts blank and your agent has no idea what it learned last week.

rag mcp
Open-source CLI that turns a folder of docs into a queryable wiki — no vector DB, no chunking (www.reddit.com) +32 8w

Been looking for a self-hostable way to maintain a personal knowledge base from research docs without the complexity of setting up a vector database, writing chunking logic, and babysitting embeddings. Ran into OpenKB this week and it's cl…

vector-database rag
Why many RAG projects are still hallucinating (www.reddit.com) +32 8w

I’ve been auditing quite a few RAG codebases lately, and it’s surprising how often the hallucinations creep in even when the setup looks decent on paper. A lot of the trouble starts with chunking.

rag
Mastermind – agentic SDLC workflow for VS Code (news.ycombinator.com) +3 8w

Prototype of an agentic SDLC workflow running inside VS Code + Copilot. Simple loop: task → reasoning → audit → memory → RAG refresh.

↯ Copilot copilot rag agentic
Which local models are actually good at staying in character? Notes from shipping Qwen3.5 4B + 9B as game NPCs (www.reddit.com) +319 9w

I'm building a small text-based game where the gameplay loop is "talk an NPC into revealing a secret." It's basically a 20+ turn roleplay stress test: the model needs to stay in character, remember what the player said earlier, and refuse…

↯ Tool Use ↯ Qwen 3.5 tool-use rag llama
How are you handling citation/traceability in AI-driven research workflows? (www.reddit.com) +3 9w

been spending ages lately trying to tighten up citation + traceability in RAG-based research workflows, and I’m starting to feel like “retrieval” and “verifiability” are still pretty loosely coupled in most stacks. Typical setup (vector sea…

↯ Tool Use tool-use rag
Project Knowledge indexing never completes on large .md files — permanent spinner, RAG as silent fallback (Max plan, reproducible) (www.reddit.com) +34 9w

I've been using Claude Max for a few months now, and Projects have been central to my workflow. I use two Markdown files in a long-term project that I update regularly — they're essentially living documents that grow over time as I add not…

rag
Building a Production-Grade RAG Chatbot for a Complex Banking Site, Tech Stack Advice Needed? (www.reddit.com) +32 9w

Hey everyone, I’m currently working on turning a fairly large and structured financial website into an AI-powered knowledge assistant (RAG-based). The site itself isn’t trivial, it has multiple product categories (cards, loans, accounts),…

rag
Show HN: 5-translation RAG matrix fixing LLM religious hallucinations (github.com via hn) +3 9w

rag
Show HN: How context engineering works, a runnable reference (github.com via hn) +3 9w

I've been presenting at local meetups about Context Engineering, RAG, Skills, etc. I even have a vbrownbag coming up on LinkedIn about this topic, so I figured I would make a basic example that uses Bedrock so I can use it in my talks or v…

rag
I Tried the LLM Wiki and RAG on Todays News from BBC, CNN, Euronews (99helpers.com via hn) +3 9w

Israel-Lebanon Ceasefire Agreement. DEEP DIVE: In-depth analysis of the 10-day, US-brokered ceasefire agreement established between Israel and Lebanon. A pivotal 10-day ceasefire agreement between Israel and Lebanon officially went into effec…

rag
Building a fully local Android manual assistant (LiteRT-LM + RAG) what architecture would you use? (www.reddit.com) +31 10w

Hello everyone, I’m building an offline RAG system for my company. We are trying to run an app that retrieves information from two manuals on an Android tablet, with the idea of an AI to provide precise answe…

gemma rag
Zuver – Build your enterprise Agents with just 10MB RAM (news.ycombinator.com) +3 10w

I built Zuver, a generic agentic AI framework for scalable, reliable, even on-edge AI applications and agents. It's written entirely in Go, which lowers the RAM usage to around 6MB, compared to other agent frameworks, which are usually arou…

rag agentic
Show HN: Entity Resolution on Your Desktop (tilores.io via hn) +2 23h

Tilores is cloud-based entity resolution software used for fraud-detection, AML, RAG - but being cloud based makes it difficult for people to test as they are reluctant to upload sensitive data. To fix that we built a downloadable version…

rag
LightOn: Production RAG without the 9-month build (lighton.ai via hn) +2 2d

SOTA on public retrieval and OCR benchmarks. Three endpoints, one API key.

rag
I made a free MCP server so your Claude can read Claude/Anthropic news and RAG (claudenews.online via hn) +2 3d

Launch HN: Adam (YC W25) – Open-Source AI CAD Hey HN! I'm Zach from Adam ( https://adam.new/ ).

rag mcp anthropic
Show HN: Save, an API that turns any URL into clean Markdown for LLMs (www.savemarkdown.co via hn) +2 5d

One API call turns any web page into LLM-ready Markdown. Built for AI agents, RAG pipelines and scrapers.

rag
Embeddings as Encodings (hash.dev via hn) +2 5d

Correctly conceptualizing and handling vectorization in knowledge graphs. January 26th, 2026. Embeddings are now a default building block in modern data services, powering semantic search, retrieval-augmented generative AI (RAG), clustering,…

rag
Extend Claude limits by offloading AI tasks to Neo (heyneo.com via hn) +21 6d

Install neo-mcp, register NEO with Claude Code, and delegate RAG audits, fine-tunes, evals, and pipeline debugging without leaving the terminal.

rag mcp claude-code
Move over Claude, 99.9% AR, 77.2% Beam – No RAG, No Embeddings, No Tricks (github.com via hn) +2 8d

CEM888.AI Enterprise-grade localized AI infrastructure and sovereign computing environments. CEM888.AI is advancing the future of private, high-performance artificial intelligence systems.

rag
Show HN: Phlox – Open-source self-hosted agentic web chat (github.com via hn) +21 10d

Phlox A feature-rich, ChatGPT-style, self-hostable AI assistant. Phlox is a self-hostable chat application with an agentic harness, document RAG, code execution, and MCP integration — running over any model provider: AWS Bedrock or any Ope…

rag chatgpt mcp+1
From Local to Global: A Graph RAG Approach to Query-Focused Summarization (arxiv.org via hn) +2 13d

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However…

rag
Ask HN: The next evolutionary step in LLM usage? (news.ycombinator.com) +2 2w

I'll keep this post short and sweet, we have seen several steps in the evolution of LLM (large language model) usage. 1.

rag mcp agentic
How to Build an Agentic RAG with RubyLLM and Rails (www.panasiti.me via hn) +2 2w

I run a RAG application for Italian pension and tax consultants. Users ask questions about INPS, professional pension funds, laws and regulations, and the app answers using a knowledge bas…

rag agentic
Lessons We Learned Building a RAG Assistant Without a Separate Vector Database (blog.devgenius.io via hn) +2 2w

How we used StarRocks, Gemini, and tool-based retrieval to power grounded Q&A in a developer community Slack. Author: Billy Chang, Software Engineer at Phoenix AI. StarRo…

vector-database rag gemini
Show HN: Terraform RAG - index modules, distill conventions, compose via MCP (terraform-rag.io via hn) +2 3w

AI-powered knowledge base for your Terraform modules. Index, search, compose, and audit - all from one place.

rag mcp
ContextWall – Context firewall for AI agents and RAG pipelines (contextwall.io via hn) +2 3w

Your AI agent reads untrusted content. Every web result, document, and API response your agent retrieves goes straight into the model's context window - unscreened.

rag
Show HN: ContextBridge – Local-first AI reading sidebar using Ollama (chromewebstore.google.com via hn) +2 3w

Store, search, and chat with web page content locally. AI chat (BYOK), full-text search, markdown export, and optional RAG endpoint.

ollama rag
Stop AI agents from being weaponized through their own memory (OWASP) (www.helpnetsecurity.com via hn) +2 3w

OWASP Agent Memory Guard: Stop AI agents from being weaponized through their own memory AI agents keep memory across sessions. Conversation history, vector stores, scratchpads, and RAG indexes persist between runs, and anything written int…

rag
I built an enforcement layer for AI coding agents using a local knowledge graph and hybrid RAG (www.reddit.com) +21 4w

I know this sub is focused on local models but the architecture behind this applies to any LLM-powered coding agent, not just Claude Code. The problem: when you give a coding agent a large set of rules and standards, two things break.

rag claude-code
The Self-Healing Vector Database (www.reddit.com) +24 4w

A pattern I keep seeing in agentic RAG systems: The agent is smarter than the retrieval layer. It can notice that context is stale.

vector-database rag agentic
Show HN: Search Router – retrieval-ready web search for AI agents (github.com via hn) +2 4w

Search Router is a web search API built for AI agents and RAG systems. We built it internally at first, when working on AI tools.

rag
The only way to avoid prompt injection is to never give AI agents API keys, credentials, etc. (www.reddit.com) +210 4w

The whole point of AI Agents is that they can *do* things. For this, they use API keys, GitHub tokens, database passwords, OAuth tokens, etc.

↯ Security prompt-injection rag security
Where does your agent memory live? (www.reddit.com) +23 4w

How do you decide where context persists across sessions? Options: a Markdown or SQLite file on the local filesystem; a relational DB like Postgres; a document-based DB like Mongo; a vector DB with a RAG pipeline. Assuming you're not using a 3rd party memory layer like…

rag
Tool-schema compression enables agentic RAG under constrained context budgets (arxiv.org via hn) +21 4w

Agentic RAG systems that equip language models with dozens to hundreds of tool definitions face a critical resource conflict: tool schemas consume the same context window needed for retrieval-augmented generation. We present the first syst…

rag agentic
Are local LLM users testing prompt injection before connecting models to tools? (www.reddit.com) +214 4w

I want to know how people here are handling security once local models move beyond chat. Running a model locally feels safer because the data does not leave your machine or your infra. That is a real advantage. But once the local model…

↯ Security prompt-injection rag security
Maybe the problem with non-coding agents is that they have no repo (www.reddit.com) +24 4w

I’ve been trying to understand why coding agents seem to work better than most non-coding agents. Maybe the thing coding agents have that most other agents don’t is the repo itself.

rag
numind/NuExtract3 · Hugging Face (huggingface.co via reddit) +2 4w

NuExtract3 is a unified 4B vision-language reasoning model for document understanding. It combines strong structured information extraction with high-quality image-to-Markdown conversion, making it suitable for extraction pipelines, OCR, a…

vllm rag
Is there any reason for an uncensored model if you have no interest in roleplaying? (www.reddit.com) +28 4w

My rag I've been building is much in response to having a LLM that I feel more confident in knowing where the knowledge base is coming from especially after the Open AI deal with the Pentagon. So, when I saw "uncensored" heretic models, I…

↯ Qwen 3.6 rag
Agent builders: are GPT/Claude/Gemini API costs killing your margins? (www.reddit.com) +24 5w

Hey everyone, For people building agents with LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Claude MCP/SDK, Google ADK, or LlamaIndex — how are you managing LLM API costs? Agent workflows can get expensive fast because of: tool calls retr…

deepseek rag qwen+5
PDF and non-text local file reading with AnythingLLM? (www.reddit.com) +22 5w

So far, AnythingLLM works well for me when i copy files over to docker folder (so originals can't be erased/modified), and i have LLM do a text search. RAG I tested but with number of files and specificity, just searching for file names an…

rag
Embedding models are coordinate systems. What silently breaks in production RAG (internals.laxmena.com via hn) +2 5w

Your embedding model doesn’t understand your data INTERNALS.md #3 · It never did. Here’s what it actually does, and why that matters for every RAG system you’ll ever build.

rag
Context is shared. Commitment is not. (www.reddit.com) +23 5w

Everyone is talking about context management. RAG pipelines, memory systems, knowledge graphs, long-context windows.

rag
Show HN: Nano-RAG – Agentic multi-hop retrieval without graph database (news.ycombinator.com) +2 5w

https://nanorag.nb1t.sh/ Important: Please choose correct namespace from top-right dropdown. Available docs/namespaces: Cloudflare, Nextjs, and Dodo-payments (default).

rag agentic
RAG vs. Fine-Tuning – The Question Every AI Builder Gets Wrong (thingswithai.org via hn) +2 5w

AI models don't know your private data.

↯ Fine Tuning fine-tuning rag
Are AI agents creating a new runtime supply-chain attack surface? (www.reddit.com) +27 5w

I’ve been thinking about AI agent security less as a prompt-injection-only problem and more as a runtime supply-chain problem. In many deployed agents, the model is no longer just generating text.

rag mcp
Agent memory is not just RAG over user facts (www.reddit.com) +25 5w

I keep seeing agent memory implemented as: (1) extract facts/preferences from conversation, (2) store them, (3) retrieve top-k before each response, (4) inject them into the prompt. This works for demos, but it breaks in production because memory becomes poli…

↯ Security prompt-injection rag security
What do you charge for production-ready invoice/document automation? Sanity check on a €20k quote (www.reddit.com) +29 6w

I am currently looking to get into automation for German Mittelstand and I am now talking to an SME, which got an offer from a consulting firm for document processing automations and trying to figure out if the pricing is normal or inflate…

rag
RAG Eval Comparing Vertex/Bedrock/Azure/OpenAI (github.com via hn) +21 6w

RetrievalCI Stage: bench-v0 early preview. The methodology, scorecard format, and 9 system adapters are stable.

rag openai
I just have a question about Langchain and Langgraph (www.reddit.com) +23 6w

I want to know whether learning these fundamentals is enough to land a job, or whether there is something else I have to learn along with them. Right now I am learning about GenAI through CampusX and building RAG projects.

rag
Token, Harness, OpenClaw, RAG, MCP, Agent – What's the Difference? (medium.com via hn) +21 6w

Apr 23, 2026. You know these terms alone. Together?

openclaw rag mcp
Argus – RAG based vulnerability scanner (github.com via hn) +2 6w

argus A RAG-based (Retrieval-Augmented Generation) vulnerability scanner for Go, Python, Rust, npm/Node.js, Maven/Java, NuGet/.NET, and Ruby projects — powered by local Ollama models or any OpenAI-compatible API. No cloud lock-in.

↯ Security ollama rag security+1
Some notes and lessons on Agents, RAG and memory (www.reddit.com) +23 6w

I put together some notes on building agents. I have built agents at scale for a while now and for a few clients, so I thought i would start putting all the knowledge into lessons that might help other people as well.

rag
On "harness engineering": Are people actually building things or just giving impressive labels to "tweaking?" (www.reddit.com) +2 6w

I see a lot of posts and videos talking about harness engineering, or it could be context engineering, RAG, etc. The thing is, most of them talk about the concepts.

rag codex openai+1
Open Sourcing Our Platform - GuideAnts Notebooks (www.reddit.com) +22 6w

This is yet another agent harness and UI and I hope you will have a look and consider contributing. Elumenotion/GuideAnts: GuideAnts Notebooks.

rag agentic
We built an agentic AI for support triage. 47% deflection in 90 days. Full retro. (www.reddit.com) +23 7w

Setup: mid-size SaaS, ~3,000 tickets/month, 6 agents drowning. 70% of volume was tier-1 (passwords, billing, where's-my-feature).

rag sonnet agentic
Claude architecture mock test.. (www.reddit.com) +21 7w

Built a new update for Claude Playground 🚀 Added Mock Tests for learners preparing for the Claude Architecture exam — users can now validate their understanding and test their learning directly on the platform. The goal of Claude Playgroun…

rag anthropic
Project knowledge file indexing reliability seems to be getting worse? (should I just use cowork instead?) (www.reddit.com) +22 7w

I haven't used Cowork yet - Would it solve my troubles with Project Knowledge files not indexing consistently? I see Projects can now be imported to Cowork, then I'd have my knowledge files hosted on my hard drive?

↯ Cowork cowork rag
Ask HN: Are you optimizing content for AI Search (GEO) vs. traditional (news.ycombinator.com) +2 7w

With the rise of SearchGPT, Perplexity, and Gemini, the goal of content is shifting from "ranking on page 1" to "being cited in the answer block." I’ve been working on a tool (https://aibg-intelliagent.com/) that uses a private RAG (Retrie…

rag gemini
RAG vs. Fine-Tuning: Which AI Strategy Saves Your Team Time and Budget (lightrains.com via hn) +2 7w

Two weeks before a Fortune 500 product launch, we told a client to scrap their fine-tuned model and rebuild with RAG instead. They lost eight weeks and $180K.

↯ Fine Tuning fine-tuning rag
Egg meet face. (www.reddit.com) +22 7w

rag
NodeMind – binary document index, 48× smaller than float32 RAG, no GPU required (github.com via hn) +2 7w

NodeMind — Binary Document Intelligence 48× smaller online · 32× smaller offline · up to 100× on images. 75× faster search.

rag
How are you feeding documentation into agents/RAG without HTML noise? (www.reddit.com) +22 7w

I’m testing a workflow where docs sites get converted into: a concise llms.txt index, a full Markdown bundle, cleaned page chunks, and a manifest JSON. For people building agents or local RAG systems: do you prefer one giant Markdown file, per-page Mark…

rag
Built a free migration wizard for moving ChatGPT history into Claude Projects — learned a few things about how Projects actually work (www.reddit.com) +21 8w

Been using Claude for a few months and hit the same wall everyone hits: years of context stuck in ChatGPT with no real path to bring it over. Claude's built-in memory import is surface-level — name, preferences, tone.

rag chatgpt
I built an AI that tries to answer life’s hardest questions using the Bhagavad Gita. (www.reddit.com) +22 8w

Over the last few weeks, I’ve been building GitaGPT Mentor. It’s not just another chatbot.

rag
W2A: an open protocol for agent sensors — giving local agents real-time perception (www.reddit.com) +23 8w

Sharing a project that just went public: World2Agent (W2A) — an open protocol for the perception side of the loop. Entirely self-hostable, no SaaS, no telemetry, TS SDK, Apache 2.0.

rag mcp
How should AI agents handle continuity across long-running conversations? (www.reddit.com) +23 8w

Hi everyone, I’ve been working on a continuity layer for OpenClaw agents, and I’d like to get feedback from people building or running AI agents. The problem I’m trying to solve is that many agents can respond well within a single turn, bu…

openclaw rag
Poisoning RAG document corpora: 32 vectors tested, 19 succeeded (corrupted.io via hn) +2 8w

RAG Poisoning: When Your “Safe” AI Eats Bad Documents So you built a RAG pipeline. Congrats.

rag
FerresDB is now open-source – A high-performance vector database (github.com via hn) +2 8w

FerresDB Core High-performance vector search engine written in Rust, designed for semantic search, RAG (Retrieval-Augmented Generation) and recommendation systems. Overview FerresDB Core is a Rust vector search engine for semantic search,…

vector-database rag
Show HN: Agent MCP Studio – build multi-agent MCP systems in a browser tab (www.agentmcp.studio via hn) +2 8w

I built a browser-only studio for designing and orchestrating MCP agent systems for development and experimental purposes. The whole stack — tool authoring, multi-agent orchestration, RAG, code execution — runs from a single static HTML fi…

rag mcp
RAG isn’t for conversation transcripts (www.reddit.com) +27 8w

Documents are authored, bounded, and self-contained. They carry their own semantic links and can be represented as a wiki or cleanly split into overlapping chunks.

vector-database rag
How Claude Projects actually loads files into context? Want to optimize token burn; can't get a straight answer (www.reddit.com) +211 8w

I've built a fairly involved system inside a Claude Project: project instructions plus 10 project files that function as a routing system. Trigger words in the instructions point Claude to specific files (instructions, templates, reference…

rag anthropic
Feedback on VectorLess RAG? (www.reddit.com) +22 9w

From a year working in space of developing based pipeline and applications. Have worked enough building data on vector DB + chunking + embedding etc.; now there is a new trend of using vectorless RAG.

rag
How do you decide on chunking strategy and top-k in Agentic RAG? Looking for practical advice (www.reddit.com) +21 9w

Hey, I'm building an Agentic RAG pipeline and struggling with two decisions: Chunking strategy — fixed-size, semantic, or hierarchical? In an agentic setting where the agent can re-query iteratively, does it make more sense to use smaller…

rag agentic
Looking for FREE resources to master RAG + LLM Agents + MCP (and build real projects for freelancing/jobs) (www.reddit.com) +26 9w

↯ Model Context Protocol model-context-protocol rag mcp
RAG as Similarity Engine (necromant2005.github.io via hn) +23 9w

rag
Is anyone else using Cursor to build local VRAM/RAG architectures instead of just wrapper apps? Here is my 8-month deep dive. (www.reddit.com) +228 9w

vector-database rag cursor
I'm completely lost in the Agentic Maze. What level to learn. how to organize study (www.reddit.com) +212 9w

↯ Opus 4.7 vector-database rag gemini+2
Stop using naive RAG – adding relationships to AI context (news.ycombinator.com) +2 9w

I’ve been working a lot with RAG systems recently, and kept running into the same issue: they retrieve relevant chunks, but lose the relationships between them. This becomes a problem pretty quickly when dealing with real systems (docs, AP…

rag cursor mcp+1
Show HN: AI agents should browse your site, not call your API (www.rtrvr.ai via hn) +2 9w

We compared four architectures for putting AI agents on websites — RAG bots, API-tool agents(WebMCP), code-writing sandboxes (Cloudflare Agent Lee), and DOM-native execution. Three of them force you to maintain a parallel engineering surfa…

rag
TF-IDF over code signatures hits 80% hit@5 retrieval — no vectors, no embeddings. Tested on 18 repos. (www.reddit.com) +22 10w

Been experimenting with context compression for local models. Wanted to test how far pure heuristic retrieval can go before you actually need vectors.

rag
I built an MCP server that turns Claude into an emergency medicine assistant — what I learned building AI for high-stakes domains (www.reddit.com) +21 10w

If you work in healthcare or just want to see how Claude handles high-stakes clinical reasoning — I built an MCP server for this and wanted to share what made it harder than a typical AI project. EMSy is built on top of Claude and connects…

rag mcp
Open source research agent with RAG, streaming, and web search - one file backend (www.reddit.com) +2 10w

Built two open source agents: 1. Research agent - searches the web, streams answers with sources (like Perplexity) 2.

rag
It's tax time... agent-built RAG app end-to-end with Claude Code + an SDK skill (www.reddit.com) +26 10w

It's tax time, so I whipped up a tax doc assistant with our new Ragie skill. Concrete example of agent-assisted development that goes further than toy demos.

rag claude-code
Cursor AI not using sub-agents (www.reddit.com) +23 10w

Hi everyone, I work for a German agency building a RAG chatbot for a law firm. I use Opus 4.6 but it eats up tokens.

↯ Claude 4.6 rag sonnet cursor+1
Show HN: NRC nuclear licensing RAG pipeline and regulatory embeddings dataset (huggingface.co via hn) +2 10w

I've been building an AI system to automate parts of the NRC Combined Operational License process: gap analysis against the Standard Review Plan, FSAR strength scoring, and RAI prediction using vector similarity to historical NRC requests.…

rag openai
Memelang: Terse SQL for LLM Generation (memelang.net via hn) +2 10w

Memelang is an AI-optimized query language that significantly reduces token count and model size for LLM RAG. The code below is designed to be copy-and-pasted into your LLM.

rag
Show HN: BitVanes – A zero-trust RAG pipeline engine in Rust, WASM, and Arrow (www.bitvanes.com via hn) +1 2d

Most RAG pipelines ship raw, sensitive documents over the wire to cloud services just to get them parsed, scrubbed of PII, chunked, and vectorized. BitVanes is a zero-trust, local-first ETL engine designed to solve this.

rag
Show HN: Open-Source RAG Security Kit for Zero-Trust Retrieval (blog.aetherguard.ai via hn) +1 9d

rag
Ucp-Local – Offline RAG for Claude Desktop, Cursor, and LM Studio (github.com via hn) +1 9d

UCP — Universal Context Pipeline A local-first MCP server that grounds LLMs in your own files. UCP indexes folders on your machine — notes, code, conversation exports — and exposes them to any MCP-compatible client (Claude Desktop, Cursor,…

rag cursor mcp
Bayer's PRINCE: a production agentic RAG system (martinfowler.com via hn) +1 9d

Building Reliable Agentic AI Systems A Case Study in building production-ready agentic AI systems This paper presents the Preclinical Information Center (PRINCE), a cloud-hosted platform developed by Bayer AG with Thoughtworks to address p…

rag agentic
Show HN: ArXiv Scholar – An Open-Source RAG System for AI Research Papers (github.com via hn) +1 9d

Try Search: https://ethereal-agents.space/search.html Technical Blog: https://ethereal-agents.space/blog/launching-arxiv-scholar.h... We'd love feedback on the retrieval quality, user experience, and overall approach.

rag
I indexed 936 Lex Fridman episodes into a RAG that cites its sources (github.com via hn) +1 11d

🎙️ OmniPod Chat with 936 podcast episodes. Every answer cites its source.

rag
Ask HN: What will be the next big memory management system for AI Agents? (news.ycombinator.com) +11 13d

We have all seen RAG and Graph Knowledge, but in your opinion or if you know of some cool project, what’s the next innovation that could achieve true perpetual memory and true personalization?

rag
Show HN: Kickoff the World Cup with 49k match results from 1872 to 2026 (github.com via hn) +1 2w

Free soccer RAG MCP Server. Connect it with Claude or your favorite agent and kickstart your soccer research.

rag mcp
MarkSentry – zero-trust document-to-Markdown for RAG pipelines (sunilgentyala.github.io via hn) +1 2w

Path traversal jailing, SSRF blocking, VBA macro stripping, zip-bomb detection, multi-column PDF, and PII redaction. Everything MarkItDown skips.

rag
Show HN: RAG built for Frappe using TurboVec (github.com via hn) +1 2w

Turbo Rag: Turbo-fast RAG for Frappe (v14) using TurboVec. License: MIT.

rag
Replacing RAG with a cognitive memory stack in Elixir/OTP (0xcc.re via hn) +1 2w

Skynet: Towards Synthetic Neurobiology The original idea was a joke. I was looking at LLM loops and thinking about how they map onto Elixir’s actor model — GenServers that receive messages, process them, maybe spawn new processes.

rag
Prompt Injection in RAG Agentic Systems (ulad.net via hn) +1 2w

Real risks and production mitigations. Imagine you built an AI assistant for your team. It answers questions using internal documentation: Jira tickets, Confluence pages, HR docs.

↯ Security prompt-injection rag security+1
Show HN: Incremental RAG ingestion, only changed chunks get re-embedded (github.com via hn) +1 2w

chunks-sync Incremental synchronization for RAG pipelines. Most RAG ingestion pipelines re-embed every document whenever a file changes, even if only one paragraph was edited.

rag
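
A minimal sketch of the incremental-ingestion idea described in the chunks-sync item above: hash each chunk and re-embed only chunks whose hash is not already in the index. This is not the chunks-sync API; `chunk_hash`, `split_paragraphs`, and `plan_sync` are hypothetical names, and splitting on blank lines is an assumed chunking strategy.

```python
import hashlib

def chunk_hash(text):
    """Content hash used as a stable chunk identity (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def split_paragraphs(document):
    """Assumed chunking: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def plan_sync(document, indexed_hashes):
    """Return (chunks that need embedding, hashes that are now stale).

    indexed_hashes is the set of chunk hashes already embedded.
    """
    chunks = split_paragraphs(document)
    current = {chunk_hash(c): c for c in chunks}
    to_embed = [c for h, c in current.items() if h not in indexed_hashes]
    stale = set(indexed_hashes) - set(current)
    return to_embed, stale

old_doc = "Intro.\n\nSetup steps.\n\nFAQ."
new_doc = "Intro.\n\nSetup steps (updated).\n\nFAQ."

indexed = {chunk_hash(c) for c in split_paragraphs(old_doc)}
to_embed, stale = plan_sync(new_doc, indexed)
print(to_embed)  # only the edited paragraph needs a new embedding
```

Hashing chunk content rather than tracking file modification times means an edit to one paragraph leaves the other chunks' embeddings untouched, at the cost of storing one hash per chunk.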
MemGraphRAG: Memory-Based Multi-Agent System for Graph RAG (arxiv.org via hn) +1 2w

Retrieval-Augmented Generation (RAG) has become an essential method for mitigating hallucinations in Large Language Models (LLMs) by leveraging external knowledge. Although effective for simple queries, traditional RAG struggles with large…

rag
Show HN: Ext-Infer – Native LLM Inference and Embeddings for PHP (infer.displace.tech via hn) +1 2w

ext-infer is a PHP 8.3+ extension that loads a GGUF model and runs LLM inference inside the PHP process via llama.cpp. PHP-native semantic search, RAG pipelines, and CLI / worker inference run without shelling out to Python or…

rag llama
Tool to convert technical PDFs into RAG-ready chunks and Obsidian vaults (pdf-knowledge-extractor.onrender.com via hn) +1 2w

rag
RAG Without Persona Modeling Fails Patient Clinical Relevance (www.riddhimohan.com via hn) +1 3w

HPPIE fuses persona modeling into the RAG pipeline to deliver patient-specific health content. 2nd of 300+ at a Global AI Hackathon.

rag
Show HN: Digger Solo – Local AI File Explorer (solo.digger.lol via hn) +1 3w

After a lot of work I present Digger Solo 0.5.0 - the AI file explorer that respects your privacy (everything runs locally). Demo video: https://vimeo.com/1198414414 New features: - LLM Chat with RAG (bring your own OpenAI compatible API k…

rag openai
Why Vector Search Alone Isn't Enough: Hybrid Retrieval for RAG (www.infoq.com via hn) +1 3w

In this article, author Aaditya Chauhan discusses the limitations of RAG pipelines based purely on vector search and how an internal omni-search application using Reciprocal Rank Fusion (RRF) that combines BM25 and vector results, can enha…

rag
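
For readers unfamiliar with the Reciprocal Rank Fusion (RRF) technique named in the InfoQ item above, here is a minimal sketch of how two ranked result lists can be fused. It is not the article's implementation; the function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a lexical (BM25) and a vector retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

RRF uses only rank positions, so BM25 scores and cosine similarities never need to be normalized onto a common scale.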
Show HN: Extract (YC P25) – Fast, accurate document parsing (extract.page via hn) +1 3w

Hey HN, we’re Soami, David, and Achyut, co-founders of Extract. Extract parses documents into structured data (text, tables, and figures).

rag
How We Index Images for RAG (www.kapa.ai via hn) +1 3w

Reading the screenshots, diagrams and tables in technical documentation for LLMs. By Matteo Bortoletto. Kapa builds AI assistants that answer questions from technical documentation. The knowledge bases we process…

rag
Open-source NLI ensemble matches Sonnet 4.6 on RAGTruth at 1/250x the cost (github.com via hn) +1 3w

verifiable-rag Document-grounded Q&A with sentence-level citations, NLI verification, and calibrated refusal. Status: pre-alpha · v0.5 launch sprint · interfaces are still subject to change 📚 Full documentation at firish.github.io/rag-rack…

rag sonnet
Running local RAG AI on MacBook neos (securethink.co.uk via hn) +1 3w

AI-powered document analysis that runs 100% locally on your Mac. Analyse contracts, engineering specs, and sensitive data without the cloud.

rag
Authorization Before Retrieval: Making RAG Safe by Construction (www.windley.com via hn) +1 3w

Retrieval-augmented generation makes language models far more useful by grounding them in real data, but it also raises a hard question: who is allowed to see what? This post shows how authorization can be enforced before retrieval, ensuri…

rag
VDF AI – Multi-agent AI orchestration with dynamic model routing (vdf.ai via hn) +1 4w

VDF.AI is the on-premise AI agent platform for enterprises that need governed multi-agent orchestration, private RAG, LLM routing, and full data sovereignty — without the cost or lock-in of cloud AI.

rag
How to Stress-Test LLM Judges Fairly (www.alphaxiv.org via hn) +1 4w

A Fixed-Budget, Cluster-Aware Standard for LLM-as-a-Judge Evaluation: A Multi-Hop RAG Stress Test

rag
knowledge graph for maintaining git worktrees and shared findings across projects (www.reddit.com) +11 4w

Sometimes when I scroll social media I see stuff about knowledge graphs. It crossed my mind that I do something similar.

rag
Turn any GitHub repository into an interactive code graph in seconds and use it as an MCP with your AI Assistants (www.reddit.com) +1 4w

Change https://github.com/owner/repo → https://cgc.codes/owner/repo A standard GitHub URL can be instantly transformed into a CodeGraphContext (CGC) graph URL, unlocking architecture visualization, code navigation, dependency exploration,…

↯ Copilot copilot rag gemini+2
Gnani AI - AI Prompt Engineer role (www.reddit.com) +11 4w

Anyone here working at Gnani AI or knows someone there? I got an offer for the AI Prompt Engineer role and wanted to know how the work culture is.

rag
Is a 128 GB MacBook Pro M5 Max actually too slow for large-context local LLM coding workflows? (www.reddit.com) +114 4w

People are warning me about the prompt-processing speed of a MacBook Pro M5 Max with 128 GB RAM. My main concern is prompt ingestion / prefill latency and large-context handling — not raw token generation speed (which I think is OK).

↯ Qwen 3.5 ↯ Qwen 3.5 moe rag qwen+2
Tlamatini – Local-first AI dev assistant with 68 agents and hybrid RAG (github.com via hn) +1 4w

Tlamatini A local-first AI developer assistant that goes beyond chat. Run it on your machine with Ollama.

ollama rag
Why Does Everyone Think AI Agents Are Easy? (www.reddit.com) +17 4w

Lately it feels like every problem gets the same answer: “Just build an AI agent.” I had lunch recently with people outside tech, and someone mentioned spending hours replying to customer chats at work. Immediately another person said: “Wh…

rag
Is grep all you need? Lexical VS Semantic Search for Agents (www.llamaindex.ai via hn) +1 4w

Lexical search with grep is fast and precise, but it breaks down at enterprise scale. Learn when to use grep, semantic search, or a hybrid RAG approach to build AI agents that can search any corpus, in any format, at any size.

rag
AI for internal IT support/password resets in mid-size & enterprise companies- is anyone actually seeing good adoption? (www.reddit.com) +11 4w

Anyone here from a mid-size or enterprise company using AI for internal IT support workflows like password resets, account unlocks, MFA resets, software access requests, etc.? We’re exploring AI-driven employee support internally and I’m c…

↯ Copilot copilot rag
LMIM OS – an offline AI ecosystem. Voice, RAG, WhatsApp. ++ One file. 0 setup (lmim.tech via hn) +1 4w

19+ tools — no cloud, no API key, no subscription. All in one AppImage / Installer.

rag
Who Wants to Be Hired? (May 2026) – AI Engineer (Python, RAG, Agentic Workflows) (news.ycombinator.com) +1 4w

About me: I am an AI Product Engineer specializing in building autonomous agentic workflows. Recently, I built 'Jarvis', a multimodal autonomous agent featuring near-zero latency inference using Groq SDK and complex RAG pipelines.

rag agentic
Databricks project ideas as a Data Engineer looking to transition roles (www.reddit.com) +11 4w

Hey, I'm a data engineer looking to transition into AI engineering. I'm looking to learn and build a resume with some projects.

rag
Why codex /goal fails on complex workflows: compaction amnesia and context rot (news.ycombinator.com) +1 4w

Hi HN, When OpenAI released `/goal` earlier this month, I was really excited to try it for long-horizon tasks. But after using it, it didn't blow me away, and I did some digging and found a major architectural flaw when using it for complex…

rag codex openai
Astrum Verum – A Vector Symbolic cognitive memory that beats RAG (github.com via hn) +1 4w

Astrum Verum Composition-episodic cognitive memory for AI agents — and an honest record of how it got here. Astrum Verum is a research project containing two distinct phases of memory architecture development.

rag
Every RAG-based localization pipeline has the same blind spot (lingo.dev via hn) +1 4w

If a localization pipeline uses retrieval augmented generation to inject glossary terms into the model's context window, it has a retrieval recall problem that has never been measured. The pattern is universal: embed the input text, cosine…

retrieval-augmented rag
RAG for developer docs so local llm can code using latest library? (www.reddit.com) +15 4w

I was wondering if it would make local llm better at coding if it has access to the latest documentation available through a RAG. I'm specifically interested in python.

rag
Trying to work around AI and its constraint at my workplace (www.reddit.com) +11 4w

I would rate my AI skills between beginner and intermediate. I know how to use tools like ChatGPT and GitHub Copilot to build a chatbot with a system prompt.

↯ Copilot copilot rag gemini+1
Built a production RAG chatbot with custom MCP servers as the action layer, sharing what I learned (www.reddit.com) +13 4w

I've been building agentic tooling at work and wanted to share one pattern that worked. Instead of a chatbot that only retrieves and answers, I wired custom MCP servers in as the action layer, so staff trigger live workflows (create record…

rag mcp agentic
Ask HN: Why agentic development stops from 2023 (news.ycombinator.com) +1 4w

I left this field in 2023 and came back in 2026, and I see progress only in coding agents; production solutions are still just tools, RAG, and maybe MCP, which is in general the same as a tool. I thought it will be super leap…

rag mcp agentic
The shared recipe behind search: Images, Shazam and RAG (medium.com via hn) +1 4w

rag
Enterprise AI why soo cumbersome (www.reddit.com) +16 4w

Just started in a new bigger company. Supposed to accelerate the adoption of AI.

rag
"Most RAG benchmarks lie about real-world corpora." Test data from 3 production websites. (www.reddit.com) +11 4w

Tiered + page-role-aware RAG retrieval results across 3 corpora with very different content density (Workspace: Sources / Chunks / HIGH / MEDIUM / LOW / REJECTED): Intercom: 188 / 941 / 96 / 200 / 541 / 104; HubSpot: 251 / 1705 / 40 / 508 / 1153 / 4; KPMG: 53 / 209 / 3 / 14 / 127 / 65 (…

rag
ztok — a fast multithreaded tokenizer in Zig that loads tiktoken / HF / SentencePiece and is 2–5× faster (www.reddit.com) +11 5w

I built ztok, a tokenizer library focused on being fast and format-agnostic for local pipelines. - Loads what you already have — .tiktoken, HF tokenizer.json, SentencePiece .model, TokenMonster, Mistral Tekken.

↯ Mistral mistral rag
Gemini filesearch scalability (www.reddit.com) +11 5w

I'm about to introduce gemini filesearch to my company to handle all the RAG related operations but not just internally, I'm fixing the projects VS stores logic to be able to scale this up to thousands of small clients. Has anyone used gem…

rag gemini
Most agent RAG problems I see are retrieval problems, not model problems (www.reddit.com) +13 5w

I've spent the past year building a site-search product and watched maybe 50 teams plug their docs into a vector DB, expect magic, and end up debugging why the LLM is lying. It's almost never the LLM.

rag
Open catalog of agent patterns + the frameworks that implement them (www.reddit.com) +15 5w

I have been building an open catalog of agent patterns and the frameworks that implement them. It is a pattern language in the Christopher Alexander sense, mapped onto the current agent landscape.

rag qwen openai
My agent kept forgetting who 'Karpathy' was between sessions. Here's the architecture that fixed it (www.reddit.com) +12 5w

I run a second brain on Obsidian, Readwise, NotebookLM, and Claude Code. For each topic, I build a scoped wiki structured as the LLM Knowledge Base Andrej Karpathy proposed.

rag gemini codex+3
AI agents are making tokenization platforms far more usable than I expected (www.reddit.com) +15 5w

Been working on AI-assisted workflows for tokenization platforms recently, and I’m honestly surprised by how useful agents are becoming in complex financial processes. Some areas where they’ve helped a lot: onboarding automation, document u…

rag
Glia – Local-first shared memory layer (SQLite-vec + FTS5 + Offline Knowledge Graph) (www.reddit.com) +11 5w

Hey everyone, I wanted to share a project I've been working on called Glia. It is a 100% offline, local-first RAG and memory layer designed to connect your AI web chats (Claude, ChatGPT, DeepSeek) with your local developer tools (Claude Co…

↯ Windsurf windsurf ollama deepseek+4
Built a Fetch API that returns page labels, not just markdown (www.reddit.com) +12 5w

I'm working on a Fetch API for RAG, agents, and web ingestion workflows. Think Firecrawl/Jina Reader-style URL-to-markdown or clean-text API, but with one extra signal layer: page labels for content category and page structure.

rag
We engineered RAG to be 50% faster (elevenlabs.io via hn) +1 5w

How we engineered RAG to be 50% faster. Written by Michal Korbela. RAG improves accuracy for AI agents by grounding LLM responses in large knowledge bases. Rather than sending the e…

rag
Booking.com and Weaviate (news.ycombinator.com) +1 5w

Vector search looks easy, until you hit production scale. I'm super excited to share a new episode of the Weaviate Podcast with Başak from @bookingcom on production-scale vector search, RAG, and agentic AI with @weaviate_io!

rag agentic
What Matters in Production RAG (arpitbhayani.me via hn) +1 5w

Most of us build RAG the same way: follow a tutorial that embeds a handful of PDFs, stores the vectors in a local Chroma instance, and chains everything together with LangChain (if that’s still a thing). The demo works.

rag
Project Prism |Fullstack Engineer – Abu Dhabi (Onsite) – Full-Time – Presight.ai (news.ycombinator.com) +1 5w

Presight.ai is a publicly listed company with various projects in the field of big data analysis and ML models application. Our solutions work domestically and internationally.

rag agentic
Built an agentic RAG over my Obsidian vault so Claude could read engineering books I never have time for. Then I built the eval harness to check Claude wasn't lying to me. (www.reddit.com) +11 5w

For context, I posted on Medium a while back about burning through Claude Code's weekly limit in 3 days. The token bleed problem from that post is what kicked off this project.

rag agentic claude-code
We compiled 42 of the Generative & Agentic AI interview questions (and how to actually answer them). (www.reddit.com) +15 5w

Hey Everyone, The AI engineering job market has shifted massively in the last 6 months. Interviewers are no longer just asking "how does a transformer work?" or "how do you write a good prompt?" They want to know if you can architect produ…

rag agentic
Are we all quietly rebuilding memory systems because current AI memory doesn’t actually work long-term? (www.reddit.com) +12 6w

The more I work with long-running agents, the more it feels like most “AI memory” today is just retrieval with nicer branding. Everything works in demos: vector DBs, RAG, summaries, context packing, knowledge graphs. But after enough real usage…

rag
Show HN: RAG-LCC – config-driven RAG framework for fast experimentation (github.com via hn) +1 6w

🧪 RAG‑LCC — Experimental RAG Under Constraints RAG‑LCC is an experimental Retrieval‑Augmented Generation (RAG) lab focused on understanding and controlling retrieval and context assembly under real‑world constraints: limited context window…

rag
What is the most unexpected thing you have gotten a local model to do? (www.reddit.com) +1 6w

Most local LLM use cases I see are chat, coding, and RAG. But with vision models getting better and faster on consumer hardware, I feel like there is a lot of untapped territory.

rag
Agents Can Reason. They Still Can't Search (dipkumar.dev via hn) +1 6w

Agents have a search problem across the whole stack: web search, RAG, tool discovery, skills/workflow loading, and even context compaction.

rag
Tried 12+ agentic AI workflow builders this year — these 5 actually work in production (www.reddit.com) +12 6w

Most “AI agent” tools in 2026 still feel like glorified chatbot wrappers. I spent the last few months testing different agentic AI workflow builders for real-world automation use cases (multi-agent workflows, approvals, integrations, long-…

↯ Copilot copilot rag agentic
Most RAG apps in production are confidently wrong and nobody talks about this enough (www.reddit.com) +12 6w

Been working with a few teams integrating RAG into internal tools, support bots, document Q&A, contract search, and I keep running into the same thing nobody warns you about when you're following tutorials. The basic retrieve-then-generate…

rag
There's a meaningful difference between a knowledge base your LLM searches and one it can navigate. Has anyone shipped something in the second category? (www.reddit.com) +13 6w

RAG gives you search over a corpus. Useful.

rag
Local-first LLM context dedup: 22-71% chunk overlap measured across 22M passages (2 arXiv papers). MCP server, MIT, 250KB binary, zero telemetry. (www.reddit.com) +12 6w

I'm the author of this thing, disclosure up front. Been hanging around this sub lately on cache invalidation, MoE memory tradeoffs, long-session token bloat.

moe rag mcp
22-71% of your AI coding input tokens are duplicates, we measured it across 22M passages (2 arXiv papers). Just shipped MCP support for Cursor (www.reddit.com) +1 6w

Disclosure first: I'm the author. MIT, runs locally, zero telemetry.

rag cursor mcp+1
Microsoft patched 137 bugs, but the Azure AI Foundry one is what caught my eye (www.reddit.com) +11 6w

Microsoft just patched 137 vulnerabilities across Azure, Windows, Dynamics 365, Copilot, Office, and other products. Most of it looks like the usual Patch Tuesday flood, but one detail stood out: Azure AI Foundry is listed among the high-s…

↯ Copilot copilot rag
Arkon: turning Claude from a personal chatbot into a managed organizational resource (www.reddit.com) +12 6w

Sharing a project I've been building. Not asking for anything in particular - just thought the problem and approach might be interesting to some folks here.

rag chatgpt
New guy with an RPG agent Project (www.reddit.com) +11 6w

Hi, I'm a long-time tabletop game master and a rather neophyte programmer (college diploma in programming for video games, no real work experience yet). I have done a 4-hour AWS workshop on building RAG agents during my internship with a st…

rag cursor chatgpt+1
How I buld agents (www.reddit.com) +11 6w

Everyday I see tons of AI generated posts about tricks to build AI agents. Here is one written by a human with experience and typos :) Step 1: Never directly compete with a human Is there a specific job title for sth?

rag
A 3.5 MB C++ engine for deterministic RAG deduplication hitting 30 GB/s (github.com via hn) +1 6w

Merlin Community Local-first dedup for LLM context. Lite engine, MIT integrations, papers on arXiv.

rag
Stop struggling with Agentic AI - my repo just hit 540+ stars and 60+ forks!! (www.reddit.com) +12 6w

Quick update — my AI Agent Frameworks repo just passed 540+ stars and 60+ forks on GitHub!! When I first put it together, my goal was simple: make experimenting with Agentic AI more practical and approachable.

rag mcp agentic
A Bette RAG Alternative (www.codynamicslab.com via hn) +11 6w

$ docker run --gpus all -p 8091:8091 codynamics/latch:latest [latch] runtime starting on http://0.0.0.0:8091 [latch] status=loading profile=cdlac_latch_qwen14b_locked_20260317 [latch] warmup complete status=ready $ curl -s http://127.0.0.1…

rag
We added an enforcement layer to our AI agents in production — here's what we learned about the failure modes nobody talks about (www.reddit.com) +16 6w

After shipping AI agents into real production environments, the failures that actually kept us up at night weren't hallucinations or bad outputs — they were control failures. Three things that surprised us: 1.

↯ Security prompt-injection rag security
FlowFlow, voice notes with on-device RAG in Rust for iOS (github.com via hn) +1 6w

FlowFlow Mobile voice notes app with AI chat — 100% Rust, Dioxus iOS, local-first (SQLite + LanceDB). Built with Dioxus 0.7 for iOS.

rag
I just launched my first open-source project and I want to learn how to become a better developer/maintainer. A remote vibe coding tool. (www.reddit.com) +12 6w

Hey everyone, I’ve been a developer for a while, but I’ve always been a "lurker" when it comes to open source. Recently, I finally pushed my first project to GitHub: Legax.

rag
Integrating standard operation procedures with agentic AI workflow (www.reddit.com) +12 6w

Hello guys, me and my team have been building an agentic workflow to answer customer questions (rn in langgraph). The use case goal is to answer ALL customer support questions.

rag agentic
Here is the current "Free-Tier AI Stack" for 2026 (www.reddit.com) +11 6w

1. The Frontier Giants • Gemini: Access 1.5B tokens/day on Gemini 1.5 Flash/Pro.

↯ Mistral mistral grok rag+4
Meet Tiro! Agentic assisted memory retrieval and session state memory module. (www.reddit.com) +14 6w

A year ago, when I first got into LLMs, I started by using them to play D&D. ChatGPT 4o was surprisingly good at narration, improvisation, and keeping the game moving.

rag chatgpt agentic
My agent returns HTTP 200 but gives factually wrong answers. How are you catching this? (www.reddit.com) +14 6w

Working on a support agent and hit a gap I hadn't thought about. Agent completes successfully.

rag
Show HN: Nexa-gauge – Cache/cost-aware graph-based eval for LLM and RAG (github.com via hn) +1 6w

nexa-gauge: Graph-Based Evaluation for LLM and RAG Systems. A cache-aware evaluation engine for measuring LLM and RAG output quality with repeatable metrics, cost estimates, and structured reports.

rag
RAG chatbot for internal ops docs. Anyone built something like this? (www.reddit.com) +13 6w

I run ops for a custom home builder. We have SOPs, HR policies, project checklists, and process docs...all living in Dropbox & I want to give my team a simple way to ask questions & get accurate answers without hunting through folders.

rag anthropic
Show HN: Build a custom AI in under 60 seconds (demo video) (www.youtube.com via hn) +1 6w

We've added onboarding that lets our users build a custom AI for their website, ready to deploy in under 60s total. The platform consists of end-to-end AI engineering, prompts, version control, evaluations, test cases, logs, AI Actions (custom tool…

rag
Show HN: I built a playground of interative A/B testing for RAG (rag-dr.hanhanwu.com via hn) +1 6w

To iteratively improve RAG performance, current evaluation solutions still take lots of manual work or lots of coding. And it requires close collaboration between AI engineers and domain experts (who may not know how to code).

rag
I built a WP plugin to solve the "AI Search" problem (YouTube-to-Blog and RAG) (www.indiehackers.com via hn) +1 7w

Hey IH, Like many of you, I’ve been watching traditional SEO traffic drop as Perplexity, SearchGPT, and Gemini Overviews take over. In 2026, if your content isn't being cited, it’s basically invisible.

rag gemini
PageIndex: Vectorless, Reasoning-Based RAG (github.com via hn) +1 7w

PageIndex: Vectorless, Reasoning-based RAG ◦ No Vector DB ◦ No Chunking ◦ Human-like Retrieval. 📢 Updates 🔥 Agentic Vectorless RAG — A simple…

rag mcp agentic
The RAG era is ending – a compilation-stage knowledge layer is what comes next (venturebeat.com via hn) +1 7w

The RAG era is ending for agentic AI — a new compilation-stage knowledge layer is what comes next | VentureBeat

rag agentic
Show HN: Memex, Claude memory via local RAG (MCP, offline embeddings) (memex-cli.vercel.app via hn) +1 7w

Local-first second brain with semantic search. Gives Claude persistent memory across conversations — all data stays on your machine.

rag mcp
Agentic RAG Explained in 3 Levels of Difficulty (machinelearningmastery.com via hn) +1 7w

In this article, you will learn what agentic RAG is, how it differs from traditional RAG, and when to use it. Topics we will cover include: The key limitations of traditional RAG pipelines and what agents add to address them.

rag agentic
Kvaser - Moving beyond simple agents: Building a Local-First AI Orchestrator with Qwen 3.6, Kiwix, and Wolfram (www.reddit.com) +1 7w

For the past two weeks, I’ve been spending 4–5 hours a day building a custom MCP (Model Context Protocol) orchestration server. What started as a simple experiment with Qwen 3.6 35B has evolved into a full-scale "Man-in-the-Middle" proxy t…

↯ Model Context Protocol ↯ Qwen 3.6 model-context-protocol rag qwen+1
How good is Gemini Embedding 001 for scientific retrieval? (www.reddit.com) +1 7w

How good is Gemini Embedding 001 for scientific retrieval (RAG application)? How does it compare against Text Embedding 3 Large?

rag gemini
Honestly, chunking is where most RAG systems quietly go wrong (www.reddit.com) +16 7w

Honestly, chunking is where a lot of RAG systems start lying to you while still looking fine in the demo. It works when the question is narrow and the document is basically prose, but once users ask messy real questions, the retrieval laye…

rag
Hello Guys. Quick Question On Research. (www.reddit.com) +11 7w

Looking for the people actually pushing on multi-agent architectures right now, not the N8N crowd. The progression I've been following: single chat → Claude Code → multi-file projects with context engineering → multi-agent systems → orches…

rag claude-code
Why we ended up with 4 agents and 3 protocols for agentic commerce on Shopware (www.reddit.com) +13 7w

Most agentic-commerce demos I see online are a single agent plus RAG over a product catalog. That shape works for a 200-SKU demo.

rag agentic
LangGraph and Cosmos DB: one back end for agents, memory, and RAG (devblogs.microsoft.com via hn) +1 7w

Build AI Agents and RAG Applications with the New LangChain + LangGraph Connector for Azure Cosmos DB Building AI agents and RAG applications today means stitching together half a dozen services, a vector database, a chat history store, a…

vector-database rag
EGA: Runtime Enforcement for LLM Outputs (v1.0.0) (www.reddit.com) +11 8w

I built EGA - a runtime enforcement layer for LLM outputs. The problem: eval tools score after the fact that something went wrong.

rag
Why is RAG evaluation so hard in the real world? (www.reddit.com) +11 8w

Evaluating RAG feels easy in theory, but production is a different challenge. We’ve been looking into why RAG benchmarking is such a moving target.

rag
I almost shipped OpenAI embeddings until an MTEB rank #130 model beat them by 11% (www.reddit.com) +12 8w

I just interviewed Michael Maximilien, former CTO at IBM and Chairperson of NodeJS Foundation, who spent a year shipping production RAG to multiple customers. His lesson was uncomfortable.

vector-database rag openai
Xmemory: Benchmarking Structured AI Memory Against RAG and Hybrid RAG (arxiv.org via hn) +1 8w

Persistent AI memory is often reduced to a retrieval problem: store prior interactions as text, embed them, and ask the model to recover relevant context later. This design is useful for thematic recall, but it is mismatched to the kinds o…

rag
Local query autocomplete with "classical" ML, no LLM needed (www.reddit.com) +1 8w

Hey guys! I know this is not fully LLM related (it's still local though :D), mods feel free to delete this if you think its off topic, but I just wanted to share something I experimented with, local autocomplete without the use of LLMs or f…

rag
Should I continue to create my RAG project? (www.reddit.com) +14 8w

To preface this, I work in the oil field and like to homelab as a hobby. But there are a lot of standards and policies that aren't always easy to find and look up.

openclaw rag
anyone else trying to pipe their own data into claude via mcp? (www.reddit.com) +13 8w

I'm trying to build a reliable local RAG setup for claude and it is just exhausting. I want claude to have access to my github repos and past project docs without me copy-pasting everything into the window every morning.

vector-database rag mcp
I finally sat down and did the math on my Cloud LLM bills… and I’m moving almost everything to a 4090. (www.reddit.com) +11 8w

I used to be all-in on cloud APIs. For any side project, I’d just grab an OpenAI or Anthropic key and not think twice.

rag openai anthropic
How are teams bridging the gap between company knowledge and AI agents? (news.ycombinator.com) +1 8w

AI agents are capable enough to automate real work now. But they keep failing because they don't know how a specific company actually operates.

rag
Show HN: MAItion – Open-source RAG with pluggable connectors and chat UI (github.com via hn) +1 8w

Hey HN, We wanted to share a new tool we’ve been working on. Even when documentation is well-structured, sometimes it’s hard to find what you need.

rag mcp
Run, Learn and test Agentic AI for free, on your browser! (Open AI Models are included) (www.reddit.com) +1 8w

Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…

↯ Fine Tuning ↯ Function Calling function-calling fine-tuning rag+3
Question: What are some useful content, web-scraping, web search tools, ingestion libraries, or MCPs for Karpathy's LLM Wiki? (www.reddit.com) +12 8w

Hey all, so I am currently exploring and playing around with Karpathy's LLM Wiki using Claude Code with Ollama and other routed models. I want to create some agents and provide them with tools/plugins, libraries, MCPs, or harnesses to assi…

ollama rag claude-code
Building a Full-Stack Agentic AI Platform (RAG + Orchestration + Governance) — feedback? (www.reddit.com) +12 8w

Hey folks 👋 I’ve been working on an AI agent platform called Noevex, focused on real production use—not just demos. In practice, AI systems struggle with: multi-step orchestration, connecting multiple data sources, controlling agent actions…

rag agentic
Why I’m still using RAG even with 2M context windows… (www.reddit.com) +12 8w

Look, when those 2 million-token context windows dropped earlier this year, I thought RAG was dead. I was like, “Why am I still chunking documents and building vector databases when I can just throw 50 PDFs into one prompt and be done?” So…

rag
Technical Overview of an AI RAG System with React, Python, Laravel, Redis (gist.io via hn) +1 8w

LongTerMemory: Technical Overview LongTerMemory is an AI-powered SaaS platform for exam preparation and long-term knowledge retention. It combines Retrieval-Augmented Generation (RAG) with spaced repetition scheduling to help users study s…

rag
Interactive playground to learn Agentic AI hands-on (Free) with Certification (www.reddit.com) +12 8w

Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…

↯ Fine Tuning ↯ Function Calling function-calling fine-tuning rag+3
I ran retrieval-auditor against LangChain's RAG quickstart, 5/6 flagged (github.com via hn) +1 8w

The corpus is Lilian Weng's "LLM Powered Autonomous Agents" — the blog post that the LangChain RAG tutorial uses as its canonical demo. The retriever is the LangChain default (cosine similarity over all-MiniLM-L6-v2 embeddings, top-5).

rag
After weeks of RAG setups, the bottleneck is the data pipeline, not the model (www.reddit.com) +11 8w

I spent weeks tuning retrieval models, then realized the real problem was getting sources into clean, structured, interlinked form. Scrape a webpage and you get a mess of HTML.

rag
Ask HN: How do you solve aggregation when agentic RAG breaks down? (news.ycombinator.com) +1 8w

I keep hitting the same failure mode with agentic RAG over collections of similar PDFs, like monthly electricity and gas bills from the same utility provider. It works well for retrieval: “Find my gas bill from January.” Though even there…

rag agentic
Show HN: Local RAG Pipeline with Weaviate and Ollama (www.storyblok.com via hn) +1 8w

i’ve been experimenting with building a fully local rag pipeline: weaviate for vectors + hybrid search, node.js scripts, qwen 3.5 on ollama what i found is that most of the challenges live in retrieval and chunking, not the LLM, and a good…

↯ Qwen 3.5 ollama rag qwen
Built a GraphRAG voice agent over JRCALC 2022 clinical guidelines using Gemini Live, part of a hackathon first-aid system for Meta Ray-Ban glasses (github.com via reddit) +11 8w

The voice guidance layer in our hackathon project uses a Gemini agent backed by a GraphRAG index over the JRCALC 2022 guidelines (the UK ambulance service clinical reference). When the system detects stroke signs or abnormal heart rate it…

rag gemini
Build your own voice assistant and run it locally – Whisper, Ollama, Bark (2024) (medium.com via hn) +1 8w

Mar 31, 2024. After my latest post about how to build your own RAG and run it locally, today we’re taking it a step further by not only implementing the conversational abilities of large language models but also adding listen…

ollama rag
Show HN: AI memory with biological decay (52% recall) (github.com via hn) +1 8w

Most RAG setups fail because they treat memory like a static filing cabinet. When every transient bug fix or abandoned rule is stored forever, the context window eventually chokes on noise, spiking token costs and degrading the agent's rea…

rag
Built a Legal RAG Chatbot for Indian lawyers covering BNS, BNSS, BSA and DPDP Act 2023 — Custom PageIndex + BERT + GPT-4o [Live Demo] (www.reddit.com) +11 8w

I ran a business for 12+ years. Traveling constantly.

rag agentic
Where is the boundary between a multi-agent and a monolithic AI agent structure? (www.reddit.com) +11 8w

Enterprise systems often avoid "monolithic" AI to prevent context rot and hallucinations. The standard fix is task-decoupling: splitting logic between specialized agents or deterministic code.

rag
LLM CTF challenges. Can you crack all 13? (wraith.sh via reddit) +1 8w

Wraith Academy is a free hands-on AI pentest curriculum — CTF challenges against live LLM agents covering prompt injection, tool abuse, data exfiltration, RAG poisoning, and more. Earn your WCAP certification.

↯ Security prompt-injection rag security
RAG pipelines, leaking PII into vector databases and nobody's talking about it (comply-tech.co.uk via hn) +1 9w

Your RAG Pipeline Is Leaking Customer Data Into Vector Embeddings If you're building a RAG (Retrieval Augmented Generation) system on internal documents such as customer support history, knowledge base articles, or internal comms, there's…

retrieval-augmented rag
I almost built RAG for my notes, then realized I didn't have a retrieval problem at all (www.reddit.com) +15 9w

My notes live in Obsidian. My reading and highlights live in Readwise.

vector-database rag gemini+1
5060ti + 32gb DDR4 (www.reddit.com) +16 9w

What models/quants have impressed you lately for 5060ti ? The use case is professional writing, RAG and long document summarization, not coding, so good instruction following and precision are a plus.

rag
RAG in Go: A Vulnerability Research Tool (www.ardanlabs.com via hn) +1 9w

Introduction In the previous post, you saw how you can use tools to add information to an LLM query. In this post, we’ll see another method of adding information to an LLM called RAG, or Retrieval-Augmented Generation.

↯ Security rag security
When the pronoun "they" breaks your RAG pipeline (old.reddit.com via hn) +1 9w

rag
Edster – An open-source local AI agent with swarm mode and a web UI (github.com via hn) +1 9w

👾 Nedster CLI Coding Agent An unstoppable, fully local, open-source coding agent that runs on your consumer GPU. Tags: ollama coding-agent local-ai cli rag chromadb python qwen Are you trying to use local LLMs to autonomously write code, r…

ollama rag qwen
Show HN: DataFrey – MCP server for Snowflake with text-to-SQL agent (docs.datafrey.ai via hn) +1 9w

I’m a data scientist and I find it hard to use Claude Code for SQL - it doesn’t have DB context. so I made yet another database MCP server!

rag mcp claude-code
Combine persistant global Memory- and Task- management into one uniform system (www.reddit.com) +11 9w

rag agentic
Is there any way to implement multimodal RAG using some open-source multimodal large models? (www.reddit.com) +11 9w

↯ Qwen 3.6 rag
Show HN: Infrawise Azure Cloud Optimization (infrawiseai.com via hn) +1 9w

rag
Steno – Compressed memory with RAG for AI agents (github.com via hn) +11 9w

Steno Compressed memory notation with RAG retrieval for AI agents. Steno solves the AI memory problem: agents accumulate knowledge across sessions, but loading everything into context every time is expensive, noisy, and causes drift.

rag
Show HN: Corvi Careers – privacy first job search with resume matching (corvi.careers via hn) +1 9w

It lets you search 1M+ jobs across multiple regions, refine by keyword/category/location, upload a plain text resume for better matching, filter by target companies. Searches, keywords extracted from resume and bookmarks are saved locally,…

rag
Best way to prepare for AI Engineer interviews? (www.reddit.com) +14 9w

I’m currently preparing for AI-focused roles and would love to get perspectives from people already working in the industry. For context — I have ~5 years of experience as a Full Stack Engineer with a strong focus on AI systems.

↯ Llama 3.3 rag llama agentic
Sweet RAG Evil Model (www.reddit.com) +1 10w

Scenario A: Given: A search query to reduce context is provided When: Results are pushed to the system as completion. Then: a question will respond with accurate results Scenario B: Given: Scenario A data is in a slots KV Cache When: new se…

rag
I made an 80B local model ship a 295-test RAG codebas (github.com via hn) +1 10w

rag-workshop A local-first RAG system built autonomously by a multi-agent framework. This repository is the reference implementation produced by the C.E.H.

rag
Has anyone used Claude Opus 4.7 API on Qubrid or another platform? Use case? (platform.qubrid.com via hn) +1 10w

Advanced GPU infrastructure, collaborative AI Agents, and intelligent RAG systems. Build, deploy, and scale AI solutions with comprehensive tools.

↯ Opus 4.7 rag opus
Shopping assistant chatbot (www.reddit.com) +12 10w

I need to create an ecommerce shopping assistant chatbot. Customers would reach out via chat, and the agent/chatbot would help check inventory and make product recommendations based on what customers share.

rag
the shortest path to "Claude that actually knows what I did today" is one npx command (www.reddit.com) +12 10w

every other day someone here posts about karpathy's llm wiki idea, or "how do I give my agent context about me," or "I want a personal knowledge base my AI can use." and then the comments are always the same - build RAG, write a pipeline,…

rag mcp
Good multi-agent harness with db-based long term context? (www.reddit.com) +14 10w

I'm looking for suggestions for an agent harness that uses a database (SQLite, RAG, whatever) for long-term context. I plan to use my RTX3080 & 3090 for local AI, though I expect to use APIs for some tasks.

rag
Show HN: GraphifyAI – Turn Any CSV/Excel into a Neo4j or LangChain Graph (graphify.midlantics.com via hn) +1 10w

Converting spreadsheets to graph databases (Neo4j, Neptune, etc.) usually means manually defining nodes, relationships, and writing Cypher from scratch. It's tedious.

rag
How to diagnose RAG failures from traces (www.siquick.com via hn) +1 10w

How to diagnose RAG failures from traces If a RAG system fails in production, the first question we should be asking is "what broke in this trace?". Until you can answer that, most scorers or dashboards aren't going to help you.

rag
CDRAG: RAG with LLM-guided document retrieval — outperforms standard cosine retrieval on legal QA (www.reddit.com) +16 10w

Hi all, I developed an addition on a CRAG (Clustered RAG) framework that uses LLM-guided cluster-aware retrieval. Standard RAG retrieves the top-K most similar documents from the entire corpus using cosine similarity.

rag
Free Red Team Security Audit for AI Agents & RAG Systems (limited) (www.reddit.com) +11 10w

I'm developing a specialized Red Team audit framework focused on real-world AI agent and RAG security risks (prompt injection, tool misuse, excessive agency, indirect injection through documents, memory poisoning, etc.). I’m looking for a…

↯ Security red-team prompt-injection rag+1
Reports of RAG's death have been greatly exaggerated (atomicapp.ai via hn) +1 10w

rag
Two-Stage Semantic Chunking for RAG in Python (alessandrofuda.github.io via hn) +1 10w

Fixed-size chunking splits text at arbitrary token boundaries, cutting mid-sentence and blending unrelated topics into the same chunk. Here’s how to build a two-stage pipeline with LlamaIndex, structural splitting first, semantic coherenc…

rag
Whats the SOTA embedding model for arabic Language (www.reddit.com) +1 10w

Hello! I’m working on a RAG system for Arabic documents. Any idea on the best embedding model out there?

rag
Anyone here tried the "compile instead of RAG" approach? (www.reddit.com) +17 10w

Been seeing this idea where instead of doing the usual RAG loop, you compile all your sources into a markdown wiki first, then query that directly. The interesting part is that saved answers become part of the wiki too.

rag
Mitre ATLAS technique detection for LLM security in Rust (crates.io via hn) +1 10w

atlas-detect MITRE ATLAS technique detection for LLM and AI agent security. Detects 97 attack techniques across 16 MITRE ATLAS tactics including prompt injection, jailbreaks, credential exfiltration, model extraction, RAG poisoning, revers…

↯ Security prompt-injection rag security
Beginner in Langraph with no dev experience. How to build projects from scratch (www.reddit.com) +16 10w

Recently got recruited into PwC after a master's in data science. The interview was on traditional ML, but now I must work on AI projects.

rag
Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors (arxiv.org) 9h

rag
MIRROR: Novelty-Constrained Memory-Guided MCTS Red-Teaming for Agentic RAG (arxiv.org) 9h

Multimodal agentic retrieval-augmented generation (RAG) systems expand the attack surface beyond prompt injection to include text poisoning, image injection, direct-query attacks, and orchestrator-level tool manipulation. Existing red-team…

↯ Security prompt-injection rag security+1
Temporal Validity in Retrieval Memory: Eliminating Stale-Fact Errors for AI Agents over Evolving Knowledge (arxiv.org) 9h

Retrieval-augmented generation (RAG) gives agents access to accumulated knowledge, but has no model of time. When a fact changes (e.g., a function is renamed or API restructured), RAG retrieves both the stale and current value with near-id…

rag
MKG-RAG-Bench: Benchmarking Retrieval in Multimodal Knowledge Graph-Augmented Generation (arxiv.org) 9h

Retrieval-augmented generation (RAG) over knowledge graphs has emerged as a promising approach for grounding large language models, yet existing benchmarks largely overlook the challenges of retrieval in multimodal knowledge graph RAG (MKG…

rag
Fast medical RAG API to give your local LLMs access to facts (www.reddit.com via reddit) 18h

I created a simple RAG API using medical Wikipedia articles that you can point your agent to and use freely. It may be useful in allowing your local LLMs access to medical facts they might not be able to recall from their weights.

rag mcp
Is there a standard for porting agent state across models, or are we all writing custom wrappers? (www.reddit.com via reddit) 19h

Hey everyone, I'm fairly new to the agentic workflows space. Really interested to get into it.

rag agentic
Verifiable Manifest Signing and Transparency Enforcement for Secure MCP-Based LLM Pipelines (arxiv.org) 1d

Large Language Models (LLMs) are increasingly deployed in tool-driven environments such as healthcare analytics, financial systems, retrieval-augmented generation (RAG), and multi-agent workflows. Although the Model Context Protocol (MCP)…

↯ Model Context Protocol model-context-protocol rag mcp
Epistemic Bias Injection: Manipulating LLM Opinion via Selective Context Retrieval (arxiv.org) 1d

When answering user queries, LLMs often retrieve knowledge from external sources stored in retrieval-augmented generation (RAG) databases. These are often populated from unvetted sources, e.g.

rag
CausalRAG2: Hierarchical Causal Knowledge Graph Design for RAG (arxiv.org) 1d

Retrieval augmented generation (RAG) has enhanced large language models by enabling access to external knowledge, with graph-based RAG emerging as a powerful paradigm for structured retrieval and reasoning. However, existing graph-based me…

retrieval-augmented rag
Is GraphRAG Needed? From Basic RAG to Graph-/Agentic Solutions with Context Optimization (arxiv.org) 1d

As advanced RAG variants like GraphRAG and Agentic RAG emerge, one leading question is when and how to use them. Here, we introduce a framework for different RAG scenarios evaluation and comparison on semi-structured knowledge bases, inclu…

rag agentic
Memory Makes the Difference: Evaluating How Different Memory Roles Shape Conversational Agents (arxiv.org) 1d

Prior research on memory mechanisms in RAG-based conversational systems has emphasized how memory is stored and retrieved. However, far less is known about how memories with different functional roles influence response quality.

rag
To Isolate or to Score? Model-Adaptive Assessment for Cost-Efficient Multi-Agent RAG (arxiv.org) 1d

Multi-agent document assessment for retrieval-augmented generation is computationally expensive, driving practitioners toward smaller, deployable models whose assessment mechanisms remain poorly understood. We conduct a controlled study of…

rag
Project depository attachments incosistency (www.reddit.com via reddit) 1d

So .zip is listed as an available attachment type for uploads to the Projects depository, but Claude doesn't get it. How am I supposed to upload a whole code subdirectory structure if the Project depository doesn't have any folders, etc.?

rag
How to set Claude to treat uploaded project files only as RAG, and not as context? (www.reddit.com via reddit) 2d

When uploading files to Claude projects, Claude says it automatically decides to use them as context or as RAG, depending on computes exhausted. I prefer Claude not to switch between context and RAG automatically.

rag
Privacy-Preserving RAG via Multi-Agent Semantic Rewriting: Achieving Confidentiality Without Compromising Contextual Fidelity (arxiv.org) 2d

Retrieval-Augmented Generation enhances large language models by incorporating external knowledge, but deploying it in sensitive scenarios risks privacy leakage via malicious prompts. To address this, we propose a multi-agent framework tha…

rag
MMed-Bench-IR: A Heterogeneous Benchmark for Multilingual Medical Information Retrieval (arxiv.org) 2d

Retrieval-augmented generation (RAG) in clinical settings increasingly requires multilingual retrieval against predominantly English evidence corpora. Multilingual medical retrieval demands three capabilities: cross-lingual alignment, conc…

rag
Quantifying Prior Dominance in RAG Systems (arxiv.org) 2d

Retrieval-Augmented Generation (RAG) grounds Large Language Models in external knowledge, yet current evaluations rely on discrete heuristics that suffer from "epistemic blindness" - failing to distinguish genuine contextual information…

rag
A clean breakdown of RAG vs MCP architectures for AI Agents (www.reddit.com via reddit) 2d

↯ Model Context Protocol vector-database model-context-protocol rag+1
RAVEN: Agentic RAG for Automated Vulnerability Repair (arxiv.org) 3d

↯ Security rag security agentic
Point-in-Time Financial RAG with Frozen LLMs and Market-Feedback Adaptive Retrieval (arxiv.org) 3d

rag
Look Before You Zoom: Adaptive Routing for the Resolution-Context Trade-off in Visual RAG (arxiv.org) 3d

rag
Fixed RAG Compression Collapses Measured Reader Scaling (arxiv.org) 3d

rag
Dissecting Agentic RAG: A Component Ablation for Multi-Hop QA with a Local 7B Model (arxiv.org) 3d

rag agentic
Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage (arxiv.org) 3d

rag
From RAG to Agentic RAG for Faithful Islamic Question Answering (arxiv.org) 3d

rag agentic
Tell Me: An LLM-powered Mental Well-being Assistant with RAG, Synthetic Dialogue Generation, and Agentic Planning (arxiv.org) 3d

rag agentic
When Confidence Takes the Wrong Path: Diagnosing Retrieval-State Lock-In in RAG (arxiv.org) 3d

rag
Only Ask What You Don't Know: Grounded Delta Planning for Efficient Multi-step RAG (arxiv.org) 3d

rag
$\pi$-RAG: Oblivious Retrieval via Semantic Quantization and Transcendental Addressing for Large Language Models (arxiv.org) 3d

rag
The Token Tax of Epistemic Accuracy: Comparing RAG and Long-Context Architectures for Document-Grounded Generative AI Applications (arxiv.org) 3d

Document-grounded assistants built on large language models are increasingly used in high-stakes, knowledge-intensive work. Their usefulness, however, may depend on how evidence is allocated before generation.

rag
Ghost Vectors: Soft-Deleted Embeddings Remain Reconstructible in HNSW Vector Databases (arxiv.org) 3d

Retrieval-augmented generation (RAG) allows large language models to access external and private corpora for factual, domain-specific responses. Modern RAG pipelines use hierarchical navigable small world (HNSW) vector databases for effici…

rag
Project knowledge full/exceeded on desktop/web but mobile is fine? Only able to send messages from my phone (www.reddit.com via reddit) 5d

I have a Claude project with 16, 500+ page historical documents in it. On my phone, this is not a problem and it uses the RAG project context system very effectively.

rag
A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots (arxiv.org) 7d

Prompt injection is ranked as the most critical vulnerability in large language model (LLM) deployments by the OWASP Top 10 for LLM Applications, yet existing defenses operate at isolated pipeline stages and remain incomplete. Input filter…

↯ Security prompt-injection rag security
CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges (arxiv.org) 7d

Online hate speech and misinformation frequently overlap, yet NLP research has mainly treated them in isolation. While LLMs represent a scalable solution for assisting humans in the generation of counterspeech for both threats, zero-shot m…

rag
When Does Streaming Tool Use Help? Characterizing Tool-Intent Stabilization in Streaming Retrieval-Augmented Generation (arxiv.org) 7d

Streaming Retrieval-Augmented Generation (Streaming RAG) reduces user-perceived latency by issuing tool queries in parallel with ongoing user input, before the utterance is complete. Reported gains are aggregate, yet the mechanism's benefi…

↯ Tool Use tool-use rag
CacheWeaver: Cache-Aware Evidence Ordering for Efficient Grounded RAG Inference (arxiv.org) 7d

Retrieval-Augmented Generation (RAG) improves factual grounding, but it also lengthens prompts and raises prefill cost. Prefix caching in serving engines such as vLLM reduces this cost only when requests share the same token prefix.

vllm rag
AI Economist Agent: An Agentic Framework for Model-Grounded Economic Analysis with RAG, Knowledge Graphs, and Large Language Models (arxiv.org) 7d

We propose a model-grounded RAG-based AI economist with an agentic framework for economic scenario analysis using large language models (LLMs) and knowledge graphs. While LLMs can generate fluent economic narratives, economists are often r…

rag agentic
Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why (arxiv.org) 7d

Patient contexts span hundreds of heterogeneous documents and thousands of structured data points, yet the document-level metadata that AI systems need for retrieval and triage is absent or incomplete. Standard retrieval-augmented generati…

rag agentic
Conflict-Aware Retriever Editing for Knowledge Injection Attacks on LLM-Based RAG Systems (arxiv.org) 8d

Injecting malicious knowledge into retrieval-augmented generation (RAG) systems can manipulate retrieved evidence and mislead downstream generation, posing a serious security threat for AI applications. Existing RAG injection attacks mainl…

rag
SproutRAG: Attention-Guided Tree Search with Progressive Embeddings for Long-Document RAG (arxiv.org) 8d

Retrieval-augmented generation (RAG) systems must balance retrieval granularity with contextual coherence, a challenge that existing methods address through LLM-guided chunking, single-level context expansion, or hierarchical summarization…

rag
PACE-RAG: Patient-Aware Contextual and Evidence-Constrained RAG for Clinical Drug Recommendation (arxiv.org) 9d

Drug recommendation requires a deep understanding of individual patient context, especially for complex conditions like Parkinson's disease. While LLMs possess broad medical knowledge, they fail to capture the subtle nuances of actual pres…

rag
MODE-RAG: Manifold Outlier Diagnosis and Energy-based Retrieval-Augmented Generation Evaluation (arxiv.org) 9d

While Multimodal Retrieval-Augmented Generation (M-RAG) enhances Large Vision-Language Models, it remains highly susceptible to cross-modal hallucinations, causal fabrications, and sycophancy. Furthermore, existing mitigation pipelines oft…

rag
MolE-RAG: Molecular Structure-Enhanced Retrieval-Augmented Generation for Chemistry (arxiv.org) 10d

Large language models (LLMs) have shown promise for molecular property prediction, but their ability to reason over chemical structures remains limited, as molecular representations such as SMILES differ substantially from the natural lang…

rag
Not All Retrievals are Useful: Cross-Attention for Input-Aware RAG in Time Series Forecasting (arxiv.org) 10d

Retrieval-augmented generation (RAG) enhances zero-shot time series (TS) forecasting by leveraging external knowledge bases, yet existing approaches overlook input-level relevance when fusing retrieved samples with the query. We argue that…

rag
SCAR: Semantic Continuity-Aware Retrieval for Efficient Context Expansion in RAG (arxiv.org) 10d

Fixed-length chunking in Retrieval-Augmented Generation (RAG) often leads to boundary fragmentation, where critical evidence is split across segments, degrading retrieval recall. While static windowing and parent retrieval improve recall,…

rag
SAG: SQL-Retrieval Augmented Generation with Query-Time Dynamic Hyperedges (arxiv.org) 10d

Retrieval-Augmented Generation (RAG) offers an effective approach for large language models to access external knowledge. However, existing methods rely on dense similarity retrieval and face inherent limitations in handling structured con…

retrieval-augmented rag
TechRAG: Evidence-Gated Multimodal Agentic RAG for Technical Literature Reasoning (arxiv.org) 10d

This paper presents an agentic multimodal retrieval-augmented generation (RAG) framework for domain-specific literature reasoning, instantiated on a curated corpus of several thousand papers in intelligent tires, vehicle dynamics, vehicle…

rag agentic
When RAG Hurts: Diagnosing and Mitigating Attention Distraction in Retrieval-Augmented LVLMs (arxiv.org) 10d

While Retrieval-Augmented Generation (RAG) is one of the dominant paradigms for enhancing Large Vision-Language Models (LVLMs) on knowledge-based VQA tasks, recent work attributes RAG failures to insufficient attention towards the retrieve…

rag
SPI: Query-Depth-Adaptive Indexing for Streaming RAG in Vector Databases (arxiv.org) 10d

Vector databases (VecDBs) are increasingly deployed in retrieval-augmented generation (RAG) pipelines where query processing and document ingestion occur concurrently. The index layer needs to provide low-latency search while incorporating…

rag
MAGE-RAG: Multigranular Adaptive Graph Evidence for Agentic Multimodal RAG in Long-Document QA (arxiv.org) 10d

Long-document multimodal question answering requires a system to locate sparse evidence in long PDFs and integrate clues from text, tables, images, charts, and complex layouts. Existing RAG methods mostly rely on fixed Top-k retrieval over…

rag agentic
Combining Retrieval-Augmented Text Generation with LLMs for Reading Content Recommendations (arxiv.org) 10d

This work presents the design, implementation, and evaluation of a system for generating personalized reading content using Large Language Models (LLMs) combined with Retrieval-Augmented Generation (RAG). The proposed architecture consists…

rag
CONCORD: Asynchronous Sparse Aggregation for Device-Cloud RAG under Document Isolation (arxiv.org) 10d

Retrieval-augmented generation (RAG) has emerged as a pivotal technique for improving language models by incorporating external knowledge at inference time. As device-cloud collaborative inference makes it feasible to deploy small language…

rag
What I learned writing an eval harness for my own SKILL.md files (it caught two real bugs) (www.reddit.com via reddit) 11d

I spent two months writing a Claude Code skill pack that enforces methodological rigor on RAG, agent, and MCP server work. Last week I built a test harness for it.

rag mcp claude-code
Sentinel: Decoding Context Utilization via Attention Probing for Efficient LLM Context Compression (arxiv.org) 11d

Retrieval-augmented generation (RAG) often suffers from long and noisy retrieved contexts. Existing context compression methods typically rely on heuristic relevance estimation or supervised compression models rather than on how LLMs utili…

rag
How are you preventing Claude Code from using outdated API documentation? (www.reddit.com via reddit) 12d

I've been using Claude Code more heavily for API integrations and one recurring issue is that it'll occasionally generate code against outdated documentation. The implementation itself is often good, but sometimes: endpoints have changed p…

rag mcp claude-code
Most “AI memory” is RAG with better marketing. I built one that actually forgets (www.reddit.com via reddit) 12d

Most AI “memory” tools never forget anything, and they sell that as the feature. It’s the bug.

rag claude-code
NOVA: NOise-aware Verbal Confidence CAlibration for Robust Large Language Models in RAG Systems (arxiv.org) 2w

Accurately assessing model confidence is essential for deploying large language models (LLMs) in mission-critical factual domains. While retrieval-augmented generation (RAG) is widely adopted to improve grounding, confidence calibration in…

rag
RAGPPI: RAG Benchmark for Protein-Protein Interactions in Drug Discovery (arxiv.org) 2w

Retrieving the biological impacts of protein-protein interactions (PPIs) is essential for target identification (Target ID) in drug development. Given the vast number of proteins involved, this process remains time-consuming and challengin…

rag
X-MADAM-RAG: Diagnosing and Handling Chinese-English Evidence Conflict in Retrieval-Augmented Generation (arxiv.org) 2w

Retrieval-augmented generation (RAG) systems may receive evidence that is not merely noisy but mutually contradictory. This issue becomes particularly salient in multilingual settings, where retrieved Chinese and English evidence may suppo…

rag
SafeLLM: Extraction as a Hallucination-Resistant Alternative to Rewriting in Safety-Critical Settings (arxiv.org) 2w

Large language models (LLMs) are increasingly used to access organisational documentation, including standard operating procedures (SOPs), HR policies and institutional guidelines. However, retrieval-augmented generation (RAG) systems that…

↯ Hallucination hallucination rag
How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation (arxiv.org) 2w

Evaluating retrieval-augmented generation (RAG) systems requires benchmarks that capture diverse question characteristics, yet practitioners lack empirical guidance on which dimensions to vary and at what granularity. We present HieraRAG,…

rag
When Iterative RAG Beats Ideal Evidence: A Diagnostic Study in Scientific Multi-hop Question Answering (arxiv.org) 2w

Retrieval-Augmented Generation (RAG) extends large language models (LLMs) beyond parametric knowledge, yet it is unclear when iterative retrieval-reasoning loops meaningfully outperform static RAG, particularly in scientific domains with m…

rag
Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning (arxiv.org) 2w

Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a se…

↯ Fine Tuning fine-tuning rag
SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems (arxiv.org) 2w

Retrieval-augmented generation (RAG) agents increasingly run with persistent memory that accumulates across user sessions. This creates a new attack surface: an adversary interacting only through normal channels can inject crafted memories…

rag
Uncertainty-Aware Hybrid Retrieval for Long-Document RAG (arxiv.org) 2w

Retrieval augmented generation (RAG) depends critically on the quality and granularity of retrieved evidence. Large retrieval units preserve context but often introduce irrelevant content, which can dilute answer-bearing evidence and worse…

retrieval-augmented rag
Rethinking RAG in Long Videos: What to Retrieve and How to Use It? (arxiv.org) 2w

Retrieval-augmented generation is moving beyond text into long, egocentric video, where systems must select query-relevant chunks across multiple modalities and temporal granularities. Yet progress in VideoRAG is limited by two gaps: exist…

rag
I built a graph-memory layer on top of turbovec for local/constrained RAG — looking for feedback (www.reddit.com via reddit) 2w

Disclosure: I built this. I like turbovec for compact local vector search, but in real RAG apps my bottleneck was often outside the vector index: tenant filters, source/time/tag constraints, graph neighborhoods, BM25 candidates, rerank, an…

rag
Need help to build an internal knowledge portal for sales (www.reddit.com via reddit) 2w

Hello, I’m looking to build (or buy, if a perfect solution exists) an intelligent Internal Knowledge Portal / Sales Enablement Assistant for our sales and pre-sales teams. The Core Vision The goal is to move past simple "Ctrl+F" search and…

rag
Better to obsess over an Agent Wiki than complex Agent frameworks (www.reddit.com via reddit) 2w

It is exhausting to see agent frameworks dropping every week that feel incomplete. You build a custom setup today and someone suggests a better one tomorrow.

vector-database rag
The voice layer for AI agents feels underrated (www.reddit.com via reddit) 2w

Most AI agent demos focus on planning, tool use, browser automation, memory, RAG, or multi-agent workflows. But I keep running into a smaller problem at the end of the pipeline: What happens when the agent output needs to become audio?

↯ Tool Use tool-use rag
What's the best way to learn RAG for real-world applications? (www.reddit.com via reddit) 2w

I've noticed many AI courses explain vector databases but not complete RAG systems. The Knowledge Base RAG module on SimplAI University appears to focus on building retrieval-powered AI experiences.

rag
Evaluating Factual Density in Multi-Source RAG: A Study in Medical AI Accuracy (arxiv.org) 2w

Retrieval-Augmented Generation (RAG) is the current industry standard for grounding AI in real-world facts. Traditional retrieval methods rely on keyword matching and topic proximity, ranking content based on how closely it sounds like the…

rag
uva-irlab-conv at SemEval-2026 Task 8: Multi-Turn RAG with Learned Sparse Retrieval and Listwise Reranking (arxiv.org) 2w

This report describes our participation in SemEval-2026 Task 8 on multi-turn retrieval and question answering. The task evaluates conversational systems across four domains (finance, cloud documentation, government, Wikipedia), and include…

rag
When More Documents Hurt RAG: Mitigating Vector Search Dilution with Domain-Scoped, Model-Agnostic Retrieval (arxiv.org) 2w

Retrieval-augmented generation degrades when scaled to large, heterogeneous document collections, where dense similarity loses discriminative power, and top-k retrieval increasingly returns semantically similar but contextually incorrect c…

rag
Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite (arxiv.org) 2w

Retrieval-Augmented Generation (RAG) pipelines are compute-intensive, combining embedding, retrieval, reranking, and large language model (LLM) generation. Running them entirely on-device benefits privacy, latency, and offline use, but the…

rag
EverydayGPT: Confidence-Gated Routing for Efficient and Safe Hybrid GPT-RAG Conversational QA (arxiv.org) 2w

Standard Retrieval-Augmented Generation (RAG) pipelines route every query through retrieval and generation unconditionally, incurring unnecessary computation and propagating low-quality context to the generator. We introduce EverydayGPT, a…

rag
ProGRank: Probe-Gradient Reranking to Defend Dense-Retriever RAG from Corpus Poisoning (arxiv.org) 2w

Retrieval-Augmented Generation (RAG) improves large language model applications by grounding generation in retrieved evidence, but also introduces corpus poisoning as a new attack surface. In this setting, an adversary injects or edits pas…

rag
NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track (arxiv.org) 2w

We present NightFeats, a structured multi-agent retrieval-augmented generation (RAG) system submitted to the MMU-RAGent competition at NeurIPS 2025, where it was awarded Best Dynamic Evaluation in the text-to-text track. Rather than target…

rag
The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content (arxiv.org) 2w

Retrieval-augmented generation (RAG) systems inject external knowledge to improve LLM outputs, yet the format of injected content -- distinct from its semantic relevance -- can independently distort the model's attention distribution. We i…

rag
Could Fable 5 one shot entreprise RAG system? (www.reddit.com via reddit) 2w

Hey! For those who tried Claude Fable 5, do you think it could build a big enterprise RAG system?

rag codex
Audio-first, deep-dive RAG Masterclass on YT it's called "Master RAG while you sleep" (www.youtube.com via reddit) 2w

Here is a fully cleaned, simplified version. It drops all the "pro" marketing phrases like “masterclass,” “built for you,” and “I’ve put together,” and replaces the heavy academic jargon with straightforward engineering terms.

rag
Looking to Join an Anthropic Partner Organization for Claude Certified Architect (www.reddit.com via reddit) 2w

Hi everyone, I'm interested in taking the Claude Certified Architect certification and am looking for a legitimate path to join or collaborate with an Anthropic partner organization. My background is AI architecture, automation engineering…

rag anthropic
Transitioning into AI Engineering Roadmap? (www.reddit.com via reddit) 2w

I'm a backend/full-stack developer looking to transition into AI Engineering roles (LLM Engineer, Generative AI Engineer, AI Agent Developer). I already know Python and have experience building WebApps, APIs, databases, and backend systems.

rag
Deploy a Qwen 3.6 Agentic RAG — Step-by-Step Walkthrough (medium.com via reddit) 2w

Deploy an Agentic RAG powered by Alibaba’s latest Qwen 3.6, running fully on your machine.

↯ Qwen 3.6 rag qwen agentic
Skills vs RAG (www.reddit.com via reddit) 2w

In 2025, I built an informational chatbot using a RAG pipeline, and it works quite well across an extremely large set of documents/types in my knowledge base. I am debating if any parts can be optimized by using Skills or if it makes sense…

rag
Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing (arxiv.org) 2w

rag
From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG (arxiv.org) 2w

rag
RAG over Thinking Traces Can Improve Reasoning Tasks (arxiv.org) 2w

rag
Agentic Hybrid RAG for Evidence-Grounded Muon Collider Analysis (arxiv.org) 2w

rag agentic
MetaPlate: Counterfactual-Guided RAG-LLM Tool for Personalized Food Recommendation and Hyperglycemia Prevention (arxiv.org) 2w

rag
RAG: Is it relevant for Agents (www.reddit.com via reddit) 2w

I keep hearing varying opinions about the usefulness of RAG for Agents. Some are saying Markdown files supported by orchestration engines like OpenClaw are enough.

openclaw rag
The reason your AI agent keeps failing has nothing to do with the model (www.reddit.com via reddit) 2w

I've spent the last 8 months building AI agents. Research agents, competitive intel agents, RAG pipelines, you name it.

rag
How are you handling aggregation/counting questions in doc-aware agents? RAG keeps failing me here (www.reddit.com via reddit) 2w

Something I keep hitting building agents that work over documents, curious how others solve it. RAG is the default doc tool we give agents, and it's great for "find/explain the passage about X" — the answer lives in one place, retrieval fi…

rag
HOW much llm context does an agents need (www.reddit.com via reddit) 2w

Does it depend on the LLM or on the agent (RAG) capabilities? I want to run an experiment with a very small language model with OK RAG and a few functionalities. I hope someone has had this idea. The idea is to run those agents in mobile…

rag
Linguistic Nepotism: Trading-off Quality for Language Preference in Multilingual RAG (arxiv.org) 2w

rag
The Injection Paradox: Brand-Level Suppression in Safety-Trained LLM Recommendations via RAG Context Injection (arxiv.org) 2w

rag
Document-Authored Control-Signal Impersonation: A Low-Cost Indirect Prompt Attack on RAG Safety Boundaries (arxiv.org) 2w

rag
Evaluating RAG Reliability under Clean, Misleading, and Mixed Retrieval (arxiv.org) 2w

rag
From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG (arxiv.org) 2w

rag agentic
DIVERGE: Diversity-Enhanced RAG for Open-Ended Information Seeking (arxiv.org) 2w

rag
Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era (arxiv.org) 2w

rag
Harmonia: End-to-End RAG Serving Optimization (arxiv.org) 2w

rag
Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems (arxiv.org) 2w

rag agentic
SIFT: Selective-Index For Fast Compute of RAG Prefill by Exploiting Attention Invariance (arxiv.org) 2w

Retrieval-Augmented Generation (RAG) injects LLM queries with relevant documents to improve response quality. This injection increases prompt length and slows time to first token (TTFT).

rag
Anything2Skill: Compiling External Knowledge into Reusable Skills for Agents (arxiv.org) 2w

Retrieval-augmented generation (RAG) enables agents to access external knowledge at inference time, but it primarily retrieves fragmented declarative evidence, leaving agents to repeatedly infer task procedures from passages, manuals, exam…

rag
Using Claude as a deterministic metric engine via Postgres queues. Anyone doing this? (www.reddit.com via reddit) 2w

I've been working on turning unstructured field data into calibrated metrics. Instead of normal RAG, I built a system where AI agents act as a metric engine.

haiku rag sonnet
How do you pull an entry level job/ freelance? (www.reddit.com via reddit) 2w

Hey everyone, I’m a self-taught Python developer transitioning into AI Integration and Database Automation. For those who started out self-taught in automation/AI integration: - What was your fastest route to finding your first freelance o…

rag mcp
Looking for a local "NotebookLM for lawyers" setup – what am I doing wrong? (www.reddit.com via reddit) 2w

Hello everyone, I am totally new to local LLMs and have only used ChatGPT/Claude/NotebookLM before. So bear with me 😃 I'm an attorney and would like to analyze and summarize case files locally for privacy/confidentiality reasons.

↯ Qwen 3 rag qwen chatgpt
Need a Production-Level RAG AI Agent Tutorial (www.reddit.com via reddit) 2w

Can anyone suggest a Production-Level RAG AI Agent tutorial (YouTube video, documentation, course, GitHub repo, etc.)? My goal is to build a project that is actually worth adding to the Projects section of my resume for AI Engineer roles.

rag
SEEK: Steering LLM Reasoning for RAG via Internal Reasoning Sketches (arxiv.org) 2w

rag
HKVM-RAG: Key-Value-Separated Hypergraph Evidence Organization for Multi-Hop RAG (arxiv.org) 2w

rag
TA-RAG: Tone-Aware Retrieval-Augmented Generation for Peer-Support Health Communication (arxiv.org) 2w

rag
Diagnosing LLM Arbitration Behavior over Pre-evidence Epistemic States in RAG-based Fact-Checking (arxiv.org) 2w

rag
MHA-RAG: Improving Efficiency, Accuracy, and Consistency by Encoding Exemplars as Soft Prompts (arxiv.org) 2w

rag
Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection (arxiv.org) 2w

Retrieval-Augmented Generation (RAG) reduces but does not eliminate hallucination in large language models. Existing detection methods rely on flat similarity between generated answers and retrieved passages, ignoring structural relationsh…

↯ Hallucination hallucination rag
RAG for you see it live open source files any kind (www.reddit.com via reddit) 2w

This is for visualizing file extraction through RAG (or file ingestion into any structured data set). I've been really into different shapes of data (like graph db, etc).

rag
RAG visualizer open source (www.reddit.com via reddit) 2w

This is for visualizing file extraction through RAG (or file ingestion into any structured data set). I've been really into different shapes of data (like graph db, etc).

rag
I built an AI support agent where the main metric is unsafe auto-action rate, not just accuracy (www.reddit.com via reddit) 2w

I built a production-shaped AI customer support agent for telecom, and the biggest lesson was that classifier accuracy is not enough. I recently finished RelayOps v1.2, a telecom/subscription customer-support agent built as a vertical slic…

rag
How are you actually using AI on large construction projects? (www.reddit.com via reddit) 2w

I've spent several years in project management for large oil / gas / refinery projects, working on the contractor side. With AI dominating the conversation these days, I've been using platforms like Claude Code in my off-hours, and it's st…

rag claude-code
Alternatives to ChromaDB for easy RAG search (www.reddit.com via reddit) 2w

I'm disappointed that ChromaDB's local, free "single node" version is still getting second-class, hand-me-down features while the "distributed" version (a SaaS offering, unsurprisingly) gets built in hybrid search, BM25, etc. I tried to gi…

rag
Building a Claude-certified developer network: looking for builders to join (free certification path) (www.reddit.com via reddit) 2w

[Update] Wow, 32 sign-ups already, thank you all! Still plenty of room (we're aiming min.

rag anthropic
Hey I want to be able to build and optimize agent? Any recommandation about how to learn? (www.reddit.com via reddit) 2w

I want to learn how to build an agent and I can then try to optimize or be creative about it. This include something like (RAG, Embedding, Skills, MCP, subagent isolation, context window, memory, Harness etc.) I want to learn but resources…

rag mcp
I built a RAG system for the first time. Here's what nobody told me would be the hard part (www.reddit.com via reddit) 2w

Had been reading about RAG for months before I actually built one. Every explanation made it sound straightforward.

vector-database rag
Guidance please (www.reddit.com via reddit) 2w

I need help . pls help !

rag
IA-RAG: Interval-Algebra-Driven Temporal Reasoning for Dynamic Knowledge Retrieval (arxiv.org) 3w

Retrieval-Augmented Generation (RAG) has shown strong effectiveness in grounding Large Language Models (LLMs) with external knowledge. However, existing RAG and Graph RAG frameworks largely treat knowledge as static or associate time with…

rag
Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation (arxiv.org) 3w

Block attention, which processes the input as separate blocks that cannot attend to one another, offers significant potential to improve KV cache reuse in long-context scenarios such as Retrieval-Augmented Generation (RAG). However, its br…

rag
HypRAG: Hyperbolic Dense Retrieval for Retrieval Augmented Generation (arxiv.org) 3w

Embedding geometry plays a fundamental role in retrieval quality, yet dense retrievers for retrieval-augmented generation (RAG) remain largely confined to Euclidean space. However, natural language exhibits hierarchical structure from broa…

retrieval-augmented rag
A2RAG: Adaptive Agentic Graph Retrieval for Cost-Aware and Reliable Reasoning (arxiv.org) 3w

Graph Retrieval-Augmented Generation (Graph-RAG) enhances multihop question answering by organizing corpora into knowledge graphs and routing evidence through relational structure. However, practical deployments face two persistent bottlen…

rag agentic
RAG Security and Privacy: Formalizing the Threat Model and Attack Surface (arxiv.org) 3w

Retrieval-Augmented Generation (RAG) is an emerging approach in natural language processing that combines large language models (LLMs) with external document retrieval to produce more accurate and grounded responses. While RAG has shown st…

↯ Security rag security
Agent-Orchestrated Adaptive RAG: A Comparative Study on Structured and Multi-Hop Retrieval (arxiv.org) 3w

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding their responses in external knowledge, but conventional pipelines rely on static, single-step retrieval that limits performance on complex queries. Thi…

rag
Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents (arxiv.org) 3w

LLM-based agents increasingly tackle long-horizon tasks with interdependent decisions, where each action reshapes future constraints and intermediate errors can cascade. Existing RAG and agent memory systems organize histories by semantic…

rag
QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving (arxiv.org) 3w

Retrieval-augmented generation (RAG) improves large language model (LLM) answer quality by grounding generation in external evidence, but processing retrieved contexts makes the prefill stage a dominant serving cost. RAG cache fusion reduc…

rag
FIDES: Faithful Inference via Deep Evidence Signals for Retrieval-Memory Conflict in RAG (arxiv.org) 3w

When retrieved evidence contradicts parametric memory, language models frequently ignore context and default to memorized priors -- a failure that undermines the core purpose of retrieval augmentation. Contrastive decoding amplifies the co…

rag
Answer Presence Drives RAG Rewriting Gains (arxiv.org) 3w

Retrieval-augmented QA pipelines often route retrieved passages through an LLM rewriter before a smaller reader, lifting F1 by tens of points on multi-hop benchmarks; this gain is typically credited to improved evidence quality. We…

rag
"We didn't know what YCombinator was 5 months ago. Last week Garry Tan asked us to take down what we built." (www.reddit.com) 11 4w

5 months ago, i didn't know what YCombinator was. Last month, the president of YC noticed what we built.

rag
I made a small tool to inspect retrieval results before feeding them into RAG (www.reddit.com) 5 4w

I’ve been messing around with live web retrieval for RAG, and the part that kept annoying me wasn’t the search call itself. It was figuring out whether the returned results were actually usable as evidence.

rag
2 years of work, 8 iterations and we have waited to introduce our Product Alexandria so long. Its a cursor or claude code for your daily office life! (www.reddit.com) 1 4w

We spent 2 years testing whether “vibe engineering” could become real Hey everyone, We’re a small team of 3 brothers + our father as senior advisor. Background: 2 mechanical engineers 2 construction engineers father with 30+ years in const…

rag cursor claude-code
Cost of Using LLMs in Agentic AI and RAG workflows (www.reddit.com) 1 5w

Hey Everyone ML engineer and Researcher here I’ve been researching production issues in Agentic AI + RAG systems and one pattern keeps showing up repeatedly: Context inefficiency. Not just retrieval quality — but the actual economics and s…

rag agentic
Building Expertise in Claude - Seeking Quality Learning Resources (www.reddit.com) 13 5w

Hi everyone, I'm on a mission to become a serious expert in Claude and AI, and I'm building a structured learning path. I want to create content that's actually valuable - with real practical applications, not surface-level tutorials.

↯ Function Calling function-calling rag anthropic
I need HELP with a document classification task (www.reddit.com) 6 5w

Hey everyone, my company's tasked me with building a document classification system, insurance documents specifically. someone dumps a batch of documents, and the system needs to classify and label each one correctly.

rag
A Small Site That Explains LLM/Agents Without the Hype (100% free, no sign up required) (www.reddit.com) 6 5w

I am a PhD student at UofToronto doing agent research. Seen a lot of hype around this topic which get people (especially non-tech) hella confused.

rag
Are smaller local models improving faster where it actually matters? (www.reddit.com) 5 6w

From the inference side, it’s been interesting seeing how often smaller open models end up staying in active use simply because they’re fast enough to constantly interact with. A year ago a lot of these models felt more like demos or side…

rag
"OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support" (huggingface.co) 6w

"OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support" - user: oncoagent-research tags: - oncology - multi-agent - LangGraph - RAG - QLoRA - AMD - open-source - clinical-ai - healthcare Onc…

rag
I got tired of the API bills for 100k+ context windows, so I built a persistent O(1) semantic memory state engine to compress history (www.reddit.com) 2 6w

Hey everyone, The entire industry right now is cheering for massive 1M+ context windows, but I think it's fundamentally the wrong approach. "Just add more RAM" is a trap.

rag
You don't need a GPU server to run Claude agents (www.reddit.com) 3 6w

I’ve been seeing a lot of newcomers asking about hardware specs lately, and there’s this weirdly common myth that you need a heavy server or a GPU instance to run Claude-based agents. You really don’t.

ollama rag
My Mac Mini kernel-panicked twice. Turned out MCP servers were eating 1.5 GB at idle, leaving no headroom for anything else. So I built a process supervisor (www.reddit.com) 7w

tl;dr (Claude caveman edition): MCP servers sit around doing nothing, eat 1.5 GB. Machine angry.

rag mcp
Sentient OS: I spent a year hacking MLX and doing surgery on Qwen to process 3,000 screenshots overnight on a 6 year old iPhone. Every optimization explained :D (www.reddit.com) 3 7w

hey localllama :) I got a multimodal vision LLM to process 3,000 screenshots overnight on a 6 year old iPhone -- entirely on-device. below is every hack, surgery, and optimization i built over the past year to make this possible!

rag qwen mcp
Macbook M3 MAX 64 vs M5 PRO 48, or wait for spark/studio (www.reddit.com) 9 8w

I’m choosing between two refurbished MacBooks, both around $3,100. Option 1: 14” M3 Max, 16-core CPU / 40-core GPU, 64GB RAM, 1TB SSD.

rag
I stopped writing 500-word guardrail prompts. This 8-line template works better. (www.reddit.com) 3 8w

I used to spend hours writing massive, obsessive system prompts for my RAG apps. I’d have ten different refusal examples, "never do X," "always check Y," and a whole paragraph of the model role-playing as a "safe and truthful assistant." I…

↯ Security ↯ Hallucination ↯ Jailbreak jailbreak hallucination rag+1
I want to create and maintain a set of benchmarks for local LLMs. Would anyone pay/donate for this? (www.reddit.com) 9 8w

Please help me build some clarity. I want to participate in local LLMs ecosystem more.

↯ Qwen 3.5 rag llama
The Claude Code Pro removal is getting framed as 'just go local' but for production systems it's messier (www.reddit.com) 6 9w

Yesterday's Claude Code Pro removal thread hit 350+ comments in a few hours, and the dominant take was basically "switch to Kimi K2.6, go local, done." I upvoted that thread and tbh im mostly there — but im building voice agents and RAG pi…

↯ Qwen 3.6 rag qwen anthropic+1
Best open-source tools for prompt injection defense in 2026 (www.reddit.com) 9w

Over the time we have been testing different approaches to secure LLM apps against prompt injection, especially indirect injection through RAG, PDFs, as well as tool outputs, and MCP integrations. Most tools seem to fall into 2 categories:…

↯ Security prompt-injection rag security+2
Need a MVP for a RAG, rent Hardware for short term (www.reddit.com) 4 9w

I am working in an MVP for a small RAG, just to show what is possible. I currently do not have appropriate hardware, so I need to rent something for a short period.

rag
Most AI agents have amnesia. I built one with a wiki-based memory that compounds over time. (www.reddit.com) 2 9w

rag
Jarvis — Your Personal AI Companion (www.reddit.com) 5 9w

openclaw rag
What I learned improving LoCoMo retrieval from 89.6% → 93.9% (www.reddit.com) 10w

Spent the last few weeks measuring how far you can push conversational memory retrieval without any LLM calls. Sharing what worked on LoCoMo (Snap Research's 1982-question benchmark over 10 long conversations) in case it's useful to others…

rag
tested async performance across LangChain, LlamaIndex, and Haystack under concurrent load. The results were worse than I expected and here's what we found. (www.reddit.com) 10w

Been running LLM pipelines in production for a while. Kept noticing throughput numbers that didn't make sense for "async" code.

rag
Why are so many Creating "local Chat" inference models? (www.reddit.com) 15 10w

I'm a novice but so confused by the tech driving the tech. Whats the use cases that are being driven by so many spending on 20K local modelling hardware, that cant compete with the pending dramatic decrease in cost per token let alone the…

rag
Running on cpu :( (www.reddit.com) 3 10w

I am in the midst of a POC project at work and all I have is 4 AMD Epyc cores, and those are essentially virtualized. Does anyone have any tricks?

↯ Mistral mistral ollama rag+1
Need your help — creating a 2 min RAG video for a DevRel interview, what would actually be useful to you? (www.reddit.com) 10w

Hey everyone, I am going through an interview for a developer relations role and part of the process is creating a short two minute technical video on RAG aimed at senior developers. I have been building with tools like Lovable, Bolt, Repl…

rag
How are you feeding personal context to your local models? (www.reddit.com) 1 10w

I've been running Mistral/Llama locally through Ollama for a while now and the thing that keeps bugging me is context. The model itself is fine for general stuff but the second I want it to know about my projects, my notes, or files it doe…

↯ Mistral mistral ollama rag+2
GLM OCR for Arabic (www.reddit.com) 2 10w

So, I have been testing GLM OCR for my rag app, but it is not working good for Arabic. It is unable to extract data either on textual page, scanned pages or even images.

↯ Glm glm rag
I’m looking for advice on setting up a local AI model that can generate Word reports automatically. (www.reddit.com) 3 10w

Hi everyone, I’m looking for advice on setting up a local AI model that can generate Word reports automatically. I already have around 500 manually created reports, and I want to train or fine-tune a model to understand their structure and…

rag
MCP servers vs Agent Skills: I think most people are comparing the wrong things (www.reddit.com) 9 11w

I keep seeing people compare MCP servers and Agent Skills as if they’re alternatives, but after building with both, they feel like different layers of the stack. MCP is about access.

rag mcp claude-code
Expert Support case study: Bolstering a RAG app with LLM-as-a-Judge (huggingface.co) 86w

rag
Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon (huggingface.co) 111w

rag
