Earlier I posted in a “Who wants to be hired?” thread, looking for a place where I could apply my experience in hospitality, food tech and automation. A couple hours later I received an email: “Hi Ilia, I saw your comment on the June Who’s…
#rag
370 items
Please don't spam people looking for employment. It's just cruel (news.ycombinator.com)
Vibe Coding vs. Production reality (www.reddit.com) The image is from X, been thinking about it since I saw it. Vibe coding is real.
Taught my 60-year-old dad (zero coding exp) Claude and Git in Feb. Today he built a RAG solution. I finally get "vibe coding." (www.reddit.com) My father teaches geology and has literally zero coding expertise. Back in February, I introduced him to Claude and taught him the absolute basics of how Git works.
Hot take: the biggest bottleneck in AI agents right now isn't models, frameworks, or even cost. It's that nobody knows how to properly evaluate if their agent is actually working (www.reddit.com)
How do you usually get around when starting big projects in Claude Code? (www.reddit.com) This question will probably make more sense when I explain my current situation: lately I’ve been doing some small projects here and there to some small business in my town and they have been working fine, but that is about to change. I ma…
Title: I’m tired of the "Agent Hype"—Most AI agents right now are just expensive loops. Change my mind (www.reddit.com) We’ve all seen the flashy demos, but after spending the last few months trying to build [or use] actual multi-agent workflows, I’ve hit a wall. The "Loop of Death": Agents still get stuck in reasoning loops that burn tokens without solving…
Qwen 3.6: worse adherence? (www.reddit.com) Just swapped Qwen 3.5 for the 3.6 variant (FP8, RTX 6000 Pro) using the same recommended generation settings. My stack is vLLM (v0.19.0) + Open WebUI (v0.8.12) in a RAG setup where the model has access to several document retrieval tools.
Gemini API File Search is now multimodal (blog.google via hn) Gemini API File Search is now multimodal: build efficient, verifiable RAG Today, we are expanding the Gemini API’s File Search tool. You can now build retrieval-augmented generation (RAG) systems with multimodal data and custom metadata.
What's your favorite local MCP server? (www.reddit.com) I've seen so many rag this, memory that projects. What projects are people actually using day to day for agentic workloads.
Benchmarked Gemma 4 E2B: The 2B model beat every larger sibling on multi-turn (70%) (aiexplr.com via reddit) Tested Gemma 4 E2B across 10 enterprise task suites against Gemma 2 2B, Gemma 3 4B, Gemma 4 E4B, and Gemma 3 12B. Run locally on Apple Silicon.
Show HN: YourMemory, agentic memory is a pruning problem, not a hoarding problem (yourmemoryai.vercel.app via hn) This is a project that I have been building for a while now, YourMemory is a solution to agentic memory which focuses on pruning of noise rather than hoarding of data. In the current state of agentic memory most of the context is stored in…
Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA (www.reddit.com) I benchmarked vision-capable LLMs (the "just attach the PDF and let the model read it" pattern) against OCR-based pipelines on 30 long, image-heavy PDFs from MMLongBench-Doc (https://github.com/mayubo2333/MMLongBench-Doc). There were 171 q…
PSA: llama-swap released a new grouping feature, matrix, allowing you to fine tune which models can run together (www.reddit.com) Previously a model could only be present in a single group. Now you can create whatever groups you want: one for big models that should run on their own, a group for STT + bigger model, a group for RAG usages, etc.
governance wall in agentic workflows. why are we stuck past rag? (www.reddit.com) keep seeing the same pattern across agent projects. we're good at building agents that find information, but the moment we ask them to actually do something (update a crm, trigger a payment, touch a production database), things grind to a…
Very detailed guide to building AI Agents? (www.reddit.com)
(Rant ;)) Make your benchmarks realistic (www.reddit.com) Everybody here is posting their optimizations for running different models - thats good but make these benchmark realistic as speed is not one factor to run llm effectively. Context size is key - with agentic/coding/rag work you need to ha…
Curated a list of 550+ free or cheap AI tools for vibe coding (LLM APIs, IDEs, local models, RAG, agents) (www.reddit.com) Been vibe coding a lot recently and kept running into the same problem finding actually usable tools without paying for 10 different subscriptions or donating my bank balance to Claude. So I put together a curated list focused on free or l…
Is Granite-4.1-30b Overshadowed by Qwen3.6 & Gemma4 models? (www.reddit.com) I don't see any threads on this model. Is it because it's dense and/or without-reasoning?
how do you guys handle the conversation with skeptical clients when selling agents? (www.reddit.com) struggling with a bit of a reality check lately and wanted to see if anyone else is running into this. been pitching agentic workflows for a while, and I've realized that leading with the tech - the orchestration the RAG, the "intelligence…
Ran the same models across Strix Halo, RTX 3090, and RTX 5070 because I wanted my own numbers (www.reddit.com) I kept seeing inference-speed claims for these models and wanting an apples-to-apples comparison on the hardware I actually have. So I built a harness and a public page that dumps every run as YAML.
Show HN: XTrace – Encrypted vector DB (search embeddings without exposing them) (github.com via hn) Hey everyone! This is XTrace.
Evaluated a RAG chatbot and the most expensive model was the worst performer. Notes on what actually moved the needle. (www.reddit.com) We had a customer support RAG bot. Standard setup: ChromaDB, system prompt, an LLM doing generation.
Anyone actually using a local LLM as their daily knowledge base? Not for coding, for life stuff. What's your setup? (www.reddit.com) So I've been going down a rabbit hole lately and I can't find many people actually talking about this specific use case. everyone here runs local LLMs for coding, chat, maybe some creative writing.
OpenAI has announced they will be winding down fine tuning. (www.reddit.com) Got an email today about the announcement. > OpenAI is winding down the fine-tuning API and platform.
Choosing a Mac Mini for local LLMs — what would YOU actually buy? (www.reddit.com)
RAG on Snapdragon X2 Laptop, 200K documents. (www.reddit.com) Qualcomm recently released the new Snapdragon X2 laptop chipset. I immediately ordered one: ASUS Zenbook A16 16" 3K OLED Touchscreen Laptop — Snapdragon X2 Elite Extreme (2026) A few things I really like about this machine: Extremely light.
How are you actually using AI agents in real workflows right now? (www.reddit.com) I’m building some infrastructure around AI agents and I’m trying to understand how people are actually using them in real workflows, not demos. Specifically curious about: - What your agent actually does day-to-day (not hypotheticals) - Wh…
Show HN: GlycemicGPT – Open-source AI-powered diabetes management (github.com via hn) I'm a Type 1 diabetic and software engineer. Last year I went months between endocrinologists with no clinician reviewing my data.
Sanity check: using git to make LLM-assisted work accumulate over time (www.reddit.com)
Show HN: AI support chatbot with RAG and citations – one back end file, no infra (github.com via hn) Upload markdown docs, get a support chatbot that answers with citations. The entire backend is one JS file — storage, search, and conversation history are handled by the runtime.
Struggling to balance high-volume orchestration (www.reddit.com) Working on a multi-agent system for a large outbound pipeline. We're running 100+ LinkedIn and email accounts, and simple linear automation (step A then step B) breaks down fast because real conversations don't move in a straight line.
Training SID-1 to beat GPT-5 at search with 1k+ QPS RL (turbopuffer.com via hn) SID-1 is an agentic search model that is 24x faster than GPT-5.1-high, 374x cheaper than Sonnet 4.5, and achieves 1.9x higher recall than traditional RAG pipelines. Here's how we trained it using large-scale RL on turbopuffer.
How are you maintaining your AI apps post-launch? Model bugs vs engineering bugs, and what's your debugging stack? (www.reddit.com) I've been going down a rabbit hole tinkering about what actually happens after you ship an LLM-powered app, and I'd love to hear how others here handle it… A few things I keep getting stuck on: Continuous optimization. Once your app is in…
What RAG (www.reddit.com) What RAG system are you using and why? What do you think advantages and disadvantages are on current RAG systems?
Reducing LLM context from ~80K tokens to ~2K without embeddings or vector DBs (www.reddit.com)
RAG/Retrieval as a solution (www.reddit.com) hi folks, I am new to the community and I have gone through the rules and I hope I am not breaking any of them with this post and will try to maintain 1/10 ratio. For building RAG, there are many tools out there each solving a piece of t…
Any reason to run dense over MOE for RAGs? (www.reddit.com) I tend to use Claude for a lot of research and I also increasingly worry about things like misinformation or things in the model I can't audit. So, I'm building my own all in one RAG with big datasets like all of Wiki, research papers, all…
Every week this we see some version of "how do I evaluate my LLM app?" and the answer almost always stops at RAGAS or DeepEval. Here is the part of the evaluation stack most tutorials skip in 2026. (www.reddit.com) The same question lands on this sub a few times a week, and the standard answers (RAGAS, DeepEval) are correct but stop one layer short of what you actually need once your app leaves a notebook. Wanted to lay out the full picture for anyon…
LLMSearchIndex- an Open Source Local Web Search Library with over 200 million indexed Web Pages for RAG applications (github.com via reddit) I've been pretty unsatisfied with web search options for local LLM/RAG systems. Most setups either rely on paid APIs like Brave, or meta search scrapers like SearXNG.
Persistent memory system for LLMs that actually learns mid-conversation (www.reddit.com) Every LLM conversation starts from zero. RAG helps, but it can't learn from what's happening right now.
I have built something using claude what I was doing on excel from last 13 years (www.reddit.com) I am doing financial modeling for the startups and feasibility reports for the new companies for more than a decade now, I started playing with Lovable 6 months ago, then somebody introduced me to the VSCode with claude, it’s like a superp…
Are we overengineering RAG when the real problem is structure? (www.reddit.com) Lately I’ve been working on a few enterprise AI use cases, and one thing keeps coming up. We spend a lot of time trying to improve retrieval.
Turning RAG pipelines into enterprise-grade Data Subscriptions (halcyon.io via hn) Back in September, we at Halcyon shared our plans to build five data subscriptions in the coming months. If you are reading here, you’ve probably been along the journey with us: from gas power plants, to large load tariffs, to utility rate…
Total idiot needs some build advice (www.reddit.com) Looking for some advice here because I made a hasty purchase. "Cut your losses and move on" is totally a reasonable answer, but I figured I'd look for some additional help. So, I just started working on a local RAG pipeline with about 15,0…
RAG demo for New Zealand residential tenancy law (tenancy.localrun.ai via hn) This tool searches real NZ Tenancy Tribunal decisions published by the Ministry of Justice. Decision links point to NZLII.
We reduced RAG retrieval cost 10× with a hippocampus-inspired memory substrate (www.bricbybric.ae via hn) We Built a Memory Engine. The Brain Told Us How.
Show HN: Harbor v0.4.19 – harbor launch –back end vLLM –web codex (github.com via hn) https://github.com/user-attachments/assets/e4897391-c5a8-4391-93c3-9f8b76155f11 Setup your local LLM stack effortlessly. Starts fully configured Open WebUI and Ollama harbor up Now, Open WebUI can do Web RAG and TTS/STT harbor up searxng s…
Your RAG is hallucinating because of garbage retrieval — here's the 3-line fix (with real scores) (www.reddit.com) My RAG agent hallucinated. Not because the LLM was bad — because the retrieval was feeding it noise.
Show HN: I built a RAG and knowledge graph agent that runs locally (news.ycombinator.com) Claw-Coder is an AI agent that runs locally on your laptop and has access to powerful tools instead of configuring claude or codex to use a local model just use claw-coder. Why was claw-coder created?
I Kept a Diary for Seven Years. An LLM Finally Read It. (www.reddit.com) I've kept a personal diary since 2019. Last week I fed 200+ entries to an LLM and asked it how I've changed over 7 years.
how would you set up a local llm server for a business of 7 people? (www.reddit.com) Okay so i've been stalking this sub for some time and i run the occasional small 2-8b model on my laptop (not the best) for fun but say my role at a company is to set up a local LLM since we obviously don't want confidential data going to…
Show HN: An agent that tunes its own cache (news.ycombinator.com) The weekend of last week I built chat.betterdb.com as a RAG over Valkey/Redis/Dragonfly docs. The goal was to eat our own dogfood and test publicly our caching libraries.
I made tiny AST tool for agent code exploration - No RAG, no index, no cache (www.reddit.com) A small tool I made for myself (ast-outline), sharing in case it's useful... still experimenting with it.
An Open Benchmark for Testing RAG on Realistic Company-Internal Data (www.reddit.com) We built a corpus of 500,000 documents simulating a real company, and then let RAG systems compete to find out which one is the best. Introducing EnterpriseRAG-Bench, a benchmark for testing how well RAG systems work on messy, enterprise-s…
I'm looking for an AI Automation Engineer role or gig (news.ycombinator.com) Hi all, I'm an AI automation engineer who builds systems that replace manual work, scale outreach, and turn workflows into revenue. I have sent out working systems for managing leads to CRM, finding real estate deals, sorting emails with A…
Learn, run and test Agentic AI on your browser for free! (Built with Claude Opus 4.7 in 2 days) (www.reddit.com) Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…
How to build an agent that is both neuro-symbolic and probabilistic (www.reddit.com) Most agent architectures treat memory like a rigid database, but that leads to the "stochastic drift" everyone complains about. My partner is a neuroscientist and we've spent the last year modeling an agent’s memory on biological systems r…
Is evaluating RAG retrieval using UI only useless? (www.reddit.com) Suppose that for now you only had access to the frontend of a RAG system and you don't know how the backend works, but you need to improve confidence of retrieved results. How do you design this process to be able to improve it?
building a Multi-Agent AI App for automated Bill of Quantities. Need architecture/framework any advice! (www.reddit.com)
GPU strategy for local LLM + mixed workloads (70-person company) — NVIDIA vs AMD? (www.reddit.com) Hey all, we’re a mid-sized company (~70 people) and currently planning to bring a lot of our workloads on-prem instead of relying on cloud APIs. The goal for the moment is to run small to mid-sized models in the range of 30B like Qwen3.6 o…
Plugging Claude into Obsidian for a RAG like system. (www.reddit.com) Hey so I am just going to make a post to see what almighty reddit has to say but I am trying to get claude to connect to an Obsidian vault so it can help me reference lecture notes, textbook theory, past claude convos, and projects and sof…
Show HN: SynapseKit – Async-native Python framework for LLM pipelines and agents (github.com via hn) Build production LLM apps with 2 dependencies. Async-native RAG, Agents, and Graph workflows — no magic, no SaaS, no bloat.
The 'Dark Code' Problem and Milla Jovovich's New Open Source Agent Memory System (www.reddit.com) Recently Milla Jovovich open sourced an LLM memory management system based on the concept of memory palaces (essentially placing memories into rooms that can be retrieved later). Memory management in LLMs is a big problem.
I open sourced a local-first LLM wiki for research and durable memory (www.reddit.com) I’ve been building a small tool called oamc around a workflow I wanted for personal research and long-running project memory. The basic idea is: instead of repeatedly querying raw notes/documents, sources get ingested into a maintained mar…
Composition Hallucinations: Not all RAG hallucinations are retrieval failures (zenodo.org via hn) Composition Hallucination in Retrieval-Augmented Generation: A Failure Mode and Benchmark Protocol Description Retrieval-Augmented Generation (RAG) is commonly motivated by the idea that language models answer more faithfully when relevant…
I compared 8 open-source AI agent frameworks so you don't have to — here's the full breakdown (www.reddit.com) We did a deep-dive comparison of the 8 major open-source AI agent frameworks as of mid-2026: 🔹 LangGraph — Best for complex state machines & DAG workflows 🔹 CrewAI — Best for multi-agent role-playing teams 🔹 AutoGen — Now in maintenance mo…
How I do use the recent llama.cpp native tools to do web rag a.k.a. web_fetch (or anything else for the matter) directly from inside the llama-server's webui (www.reddit.com) As some other fellow lllmers I've discovered few days ago that the amazing llama.cpp project has just added native tools functionalities into the server. After having enabled the relative options into llama-server and played a bit with the…
A primer on how large language model works (mayijie.substack.com via hn) How Large Language Models Work A primer on LLMs: from tokens, embeddings, and Transformers to training, RAG, tool calling, multimodality, and end-to-end product flows. A large language model is not a human mind simulated in software.
Building Agentic GraphRAG Systems: From knowledge graphs and ontologies to a unified memory as an MCP server for your AI agent. (www.reddit.com) I gave this talk twice in one month: at O’Reilly’s Context Engineering Event and at Abi Aryan’s Maven course on LLM inference at scale. After being blasted with questions, I realized something: GraphRAG isn’t a retrieval algorithm, it’s a…
How are you protecting your AI agents' memory from poisoning attacks? (www.reddit.com) As AI agents become more autonomous and persist memory across sessions (RAG indexes, conversation history, vector stores), there's a growing attack surface that most people aren't thinking about: memory poisoning. An attacker can plant mali…
MSA 100M tokens (www.reddit.com) https://arxiv.org/abs/2603.23516 https://github.com/EverMind-AI/MSA If verified, rag is no more needed.
RAG retrieves the refutation and still gets it wrong (reyes.id.au via hn) Anchor catching the failure mode where RAG retrieves the refutation and still gets it wrong Ask vanilla RAG over Duval, Goeckner, Klivans, and Martin's 2015 paper "A non-partitionable Cohen-Macaulay simplicial complex" this question: What…
is multi-agent architecture worth the 15x token cost? (www.reddit.com) moving my current research workflow from a single generalist agent to a multi-agent setup (MAS), and the projected token usage is terrifying. some benchmarks suggest it can be up to 15x more expensive than a standard chat exchange.
Do you guys use AI / Agents for direct profit or do you apply it to be more effective - Could use some guidance and motivation I'm 20 (www.reddit.com) I'm kinda tired of kinda doing rocket Science to have a local agent. Trying to Figure out why its out putting garbage , Then Getting it's output to to stream through my UX layer Properly , Getting it to call tools properly.
Ask HN: Anyone using AI agents for active learning sprints? Here's my setup (news.ycombinator.com) Hi HN, I'm a big fan of AI's ability to provide personalized tutoring. So, lately, I have been using my Antigravity IDE (you can use any agentic harness) for personal learning.
What tools are you using to give your LLM a persistent second brain / long-term memory? (www.reddit.com) I've been going down a rabbit hole trying to solve LLM memory. the problem where every session starts blank and your agent has no idea what it learned last week.
Open-source CLI that turns a folder of docs into a queryable wiki — no vector DB, no chunking (www.reddit.com) Been looking for a self-hostable way to maintain a personal knowledge base from research docs without the complexity of setting up a vector database, writing chunking logic, and babysitting embeddings. Ran into OpenKB this week and it's cl…
Why many RAG projects are still hallucinating (www.reddit.com) I’ve been auditing quite a few RAG codebases lately, and it’s surprising how often the hallucinations creep in even when the setup looks decent on paper. A lot of the trouble starts with chunking.
Mastermind – agentic SDLC workflow for VS Code (news.ycombinator.com) Prototype of an agentic SDLC workflow running inside VS Code + Copilot. Simple loop: task → reasoning → audit → memory → RAG refresh.
Which local models are actually good at staying in character? Notes from shipping Qwen3.5 4B + 9B as game NPCs (www.reddit.com) I'm building a small text-based game where the gameplay loop is "talk an NPC into revealing a secret." It's basically a 20+ turn roleplay stress test: the model needs to stay in character, remember what the player said earlier, and refuse…
How are you handling citation/traceability in AI-driven research workflows? (www.reddit.com) been spending ages lately trying to tighten up citation + traceability in RAG-based research workflows, and I’m starting to feel like “retrieval” and “verifiability” are still pretty loosely coupled in most stacks. Typical setup (vector sea…
Project Knowledge indexing never completes on large .md files — permanent spinner, RAG as silent fallback (Max plan, reproducible) (www.reddit.com) I've been using Claude Max for a few months now, and Projects have been central to my workflow. I use two Markdown files in a long-term project that I update regularly — they're essentially living documents that grow over time as I add not…
Building a Production-Grade RAG Chatbot for a Complex Banking Site, Tech Stack Advice Needed? (www.reddit.com) Hey everyone, I’m currently working on turning a fairly large and structured financial website into an AI-powered knowledge assistant (RAG-based). The site itself isn’t trivial, it has multiple product categories (cards, loans, accounts),…
Show HN: 5-translation RAG matrix fixing LLM religious hallucinations (github.com via hn)
Show HN: How context engineering works, a runnable reference (github.com via hn) I've been presenting at local meetups about Context Engineering, RAG, Skills, etc.. I even have a vbrownbag coming up on LinkedIn about this topic so I figured I would make a basic example that uses bedrock so I can use it in my talks or v…
I Tried the LLM Wiki and RAG on Todays News from BBC, CNN, Euronews (99helpers.com via hn) Israel-Lebanon Ceasefire Agreement DEEP DIVE In-depth analysis of the 10-day, US-brokered ceasefire agreement established between Israel and Lebanon. A pivotal 10-day ceasefire agreement between Israel and Lebanon officially went into effec…
Building a fully local Android manual assistant (LiteRT-LM + RAG) what architecture would you use? (www.reddit.com) Hello everyone, I’m building an offline RAG system for my company, we are trying to run an app that retrieves information from two manuals in an android tablet with the idea of an AI to provide precise answe…
Zuver – Build your enterprise Agents with just 10MB RAM (news.ycombinator.com) I built Zuver, the generic Agentic AI framework for scalable, reliable, even on-edge AI applications and Agents. It's completely written in Go, which lowers the RAM usage to around 6MB, compared to other Agent framework that's usually arou…
Show HN: Terraform RAG - index modules, distill conventions, compose via MCP (terraform-rag.io via hn) AI-powered knowledge base for your Terraform modules. Index, search, compose, and audit - all from one place.
ContextWall – Context firewall for AI agents and RAG pipelines (contextwall.io via hn) Your AI agent reads untrusted content. Every web result, document, and API response your agent retrieves goes straight into the model's context window - unscreened.
Show HN: ContextBridge – Local-first AI reading sidebar using Ollama (chromewebstore.google.com via hn) Overview Store, search, and chat with web page content locally. AI chat (BYOK), full-text search, markdown export, and optional RAG endpoint.
Stop AI agents from being weaponized through their own memory (OWASP) (www.helpnetsecurity.com via hn) OWASP Agent Memory Guard: Stop AI agents from being weaponized through their own memory AI agents keep memory across sessions. Conversation history, vector stores, scratchpads, and RAG indexes persist between runs, and anything written int…
I built an enforcement layer for AI coding agents using a local knowledge graph and hybrid RAG (www.reddit.com) I know this sub is focused on local models but the architecture behind this applies to any LLM-powered coding agent, not just Claude Code. The problem: when you give a coding agent a large set of rules and standards, two things break.
The Self-Healing Vector Database (www.reddit.com) A pattern I keep seeing in agentic RAG systems: The agent is smarter than the retrieval layer. It can notice that context is stale.
Show HN: Search Router – retrieval-ready web search for AI agents (github.com via hn) Search Router is a web search API built for AI agents and RAG systems. We built it internally at first, when working on AI tools.
The only way to avoid prompt injection is to never give AI agents API keys, credentials, etc. (www.reddit.com) The whole point of AI Agents is that they can *do* things. For this, they use API keys, GitHub tokens, database passwords, OAuth tokens, etc.
Tool-schema compression enables agentic RAG under constrained context budgets (arxiv.org via hn) Agentic RAG systems that equip language models with dozens to hundreds of tool definitions face a critical resource conflict: tool schemas consume the same context window needed for retrieval-augmented generation. We present the first syst…
Are local LLM users testing prompt injection before connecting models to tools? (www.reddit.com) I wanna know how people here are handling security once local models move beyond chat.....Running a model locally feels safer because the data does not leave your machine or your infra. That is a real advantage.....But once the local model…
Maybe the problem with non-coding agents is that they have no repo (www.reddit.com) I’ve been trying to understand why coding agents seem to work better than most non-coding agents. Maybe the thing coding agents have that most other agents don’t is the repo itself.
numind/NuExtract3 · Hugging Face (huggingface.co via reddit) NuExtract3 is a unified 4B vision-language reasoning model for document understanding. It combines strong structured information extraction with high-quality image-to-Markdown conversion, making it suitable for extraction pipelines, OCR, a…
Is there any reason for an uncensored model if you have no interest in roleplaying? (www.reddit.com) My rag I've been building is much in response to having a LLM that I feel more confident in knowing where the knowledge base is coming from especially after the Open AI deal with the Pentagon. So, when I saw "uncensored" heretic models, I…
Agent builders: are GPT/Claude/Gemini API costs killing your margins? (www.reddit.com) Hey everyone, For people building agents with LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Claude MCP/SDK, Google ADK, or LlamaIndex — how are you managing LLM API costs? Agent workflows can get expensive fast because of: tool calls retr…
PDF and non-text local file reading with AnythingLLM? (www.reddit.com) So far, AnythingLLM works well for me when i copy files over to docker folder (so originals can't be erased/modified), and i have LLM do a text search. RAG I tested but with number of files and specificity, just searching for file names an…
Embedding models are coordinate systems. What silently breaks in production RAG (internals.laxmena.com via hn) Your embedding model doesn’t understand your data INTERNALS.md #3 · It never did. Here’s what it actually does, and why that matters for every RAG system you’ll ever build.
Context is shared. Commitment is not. (www.reddit.com) Everyone is talking about context management. RAG pipelines, memory systems, knowledge graphs, long-context windows.
Show HN: Nano-RAG – Agentic multi-hop retrieval without graph database (news.ycombinator.com) https://nanorag.nb1t.sh/ Important: Please choose correct namespace from top-right dropdown. Available docs/namespaces: Cloudflare, Nextjs, and Dodo-payments (default).
RAG vs. Fine-Tuning – The Question Every AI Builder Gets Wrong (thingswithai.org via hn) RAG vs. Fine-Tuning — The Question Every AI Builder Gets Wrong AI models don't know your private data.
Are AI agents creating a new runtime supply-chain attack surface? (www.reddit.com) I’ve been thinking about AI agent security less as a prompt-injection-only problem and more as a runtime supply-chain problem. In many deployed agents, the model is no longer just generating text.
Agent memory is not just RAG over user facts (www.reddit.com) I keep seeing agent memory implemented as: Extract facts/preferences from conversation Store them Retrieve top-k before each response Inject them into the prompt This works for demos, but it breaks in production because memory becomes poli…
What do you charge for production-ready invoice/document automation? Sanity check on a €20k quote (www.reddit.com) I am currently looking to get into automation for German Mittelstand and I am now talking to an SME, which got an offer from a consulting firm for document processing automations and trying to figure out if the pricing is normal or inflate…
RAG Eval Comparing Vertex/Bedrock/Azure/OpenAI (github.com via hn) RetrievalCI Stage: bench-v0 early preview. The methodology, scorecard format, and 9 system adapters are stable.
I just have a question about Langchain and Langgraph (www.reddit.com) I want to know that learning these fundamentals is enough to land job or is there something else that i have to learn along with these? Right now i am learning about genAI through campusX and making rag projects.
Token, Harness, OpenClaw, RAG, MCP, Agent – What's the Difference? (medium.com via hn) 11 min read Apr 23, 2026 You know these terms alone. Together?
Argus – RAG based vulnerability scanner (github.com via hn) argus A RAG-based (Retrieval-Augmented Generation) vulnerability scanner for Go, Python, Rust, npm/Node.js, Maven/Java, NuGet/.NET, and Ruby projects — powered by local Ollama models or any OpenAI-compatible API. No cloud lock-in.
Some notes and lessons on Agents, RAG and memory (www.reddit.com) I put together some notes on building agents. I have built agents at scale for a while now and for a few clients, so I thought i would start putting all the knowledge into lessons that might help other people as well.
On "harness engineering": Are people actually building things or just giving impressive labels to "tweaking?" (www.reddit.com) I see a lot of posts and videos talking about harness engineering, or it could be context engineering, RAG, etc. The thing is, most of them talk about the concepts.
Open Sourcing Our Platform - GuideAnts Notebooks (www.reddit.com) This is yet another agent harness and UI and I hope you will have a look and consider contributing. Elumenotion/GuideAnts: GuideAnts Notebooks.
We built an agentic AI for support triage. 47% deflection in 90 days. Full retro. (www.reddit.com) Setup: mid-size SaaS, ~3,000 tickets/month, 6 agents drowning. 70% of volume was tier-1 (passwords, billing, where's-my-feature).
Claude architecture mock test.. (www.reddit.com) Built a new update for Claude Playground 🚀 Added Mock Tests for learners preparing for the Claude Architecture exam — users can now validate their understanding and test their learning directly on the platform. The goal of Claude Playgroun…
Project knowledge file indexing reliability seems to be getting worse? (should I just use cowork instead?) (www.reddit.com) I haven't used Cowork yet - Would it solve my troubles with Project Knowledge files not indexing consistently? I see Projects can now be imported to Cowork, then I'd have my knowledge files hosted on my hard drive?
Ask HN: Are you optimizing content for AI Search (GEO) vs. traditional (news.ycombinator.com) With the rise of SearchGPT, Perplexity, and Gemini, the goal of content is shifting from "ranking on page 1" to "being cited in the answer block." I’ve been working on a tool (https://aibg-intelliagent.com/) that uses a private RAG (Retrie…
RAG vs. Fine-Tuning: Which AI Strategy Saves Your Team Time and Budget (lightrains.com via hn) Two weeks before a Fortune 500 product launch, we told a client to scrap their fine-tuned model and rebuild with RAG instead. They lost eight weeks and $180K.
Egg meet face. (www.reddit.com)
NodeMind – binary document index, 48× smaller than float32 RAG, no GPU required (github.com via hn) NodeMind — Binary Document Intelligence 48× smaller online · 32× smaller offline · up to 100× on images. 75× faster search.
How are you feeding documentation into agents/RAG without HTML noise? (www.reddit.com) I’m testing a workflow where docs sites get converted into: a concise llms.txt index, a full Markdown bundle, cleaned page chunks, and a manifest JSON. For people building agents or local RAG systems: do you prefer one giant Markdown file, per-page Mark…
Built a free migration wizard for moving ChatGPT history into Claude Projects — learned a few things about how Projects actually work (www.reddit.com) Been using Claude for a few months and hit the same wall everyone hits: years of context stuck in ChatGPT with no real path to bring it over. Claude's built-in memory import is surface-level — name, preferences, tone.
I built an AI that tries to answer life’s hardest questions using the Bhagavad Gita. (www.reddit.com) Over the last few weeks, I’ve been building GitaGPT Mentor. It’s not just another chatbot.
W2A: an open protocol for agent sensors — giving local agents real-time perception (www.reddit.com) Sharing a project that just went public: World2Agent (W2A) — an open protocol for the perception side of the loop. Entirely self-hostable, no SaaS, no telemetry, TS SDK, Apache 2.0.
How should AI agents handle continuity across long-running conversations? (www.reddit.com) Hi everyone, I’ve been working on a continuity layer for OpenClaw agents, and I’d like to get feedback from people building or running AI agents. The problem I’m trying to solve is that many agents can respond well within a single turn, bu…
Poisoning RAG document corpora: 32 vectors tested, 19 succeeded (corrupted.io via hn) RAG Poisoning: When Your “Safe” AI Eats Bad Documents So you built a RAG pipeline. Congrats.
FerresDB is now open-source – A high-performance vector database (github.com via hn) FerresDB Core: a high-performance vector search engine written in Rust, designed for semantic search, RAG (Retrieval-Augmented Generation) and recommendation systems.
Show HN: Agent MCP Studio – build multi-agent MCP systems in a browser tab (www.agentmcp.studio via hn) I built a browser-only studio for designing and orchestrating MCP agent systems for development and experimental purposes. The whole stack — tool authoring, multi-agent orchestration, RAG, code execution — runs from a single static HTML fi…
RAG isn’t for conversation transcripts (www.reddit.com) Documents are authored, bounded, and self-contained. They carry their own semantic links and can be represented as a wiki or cleanly split into overlapping chunks.
How Claude Projects actually loads files into context? Want to optimize token burn; can't get a straight answer (www.reddit.com) I've built a fairly involved system inside a Claude Project: project instructions plus 10 project files that function as a routing system. Trigger words in the instructions point Claude to specific files (instructions, templates, reference…
Feedback on VectorLess RAG? (www.reddit.com) From a year working in the space of developing based pipeline and applications. Have worked enough building data on vector db + chunking + embedding etc.; now there is a new trend of using vectorless RAG.
How do you decide on chunking strategy and top-k in Agentic RAG? Looking for practical advice (www.reddit.com) Hey, I'm building an Agentic RAG pipeline and struggling with two decisions: Chunking strategy — fixed-size, semantic, or hierarchical? In an agentic setting where the agent can re-query iteratively, does it make more sense to use smaller…
Looking for FREE resources to master RAG + LLM Agents + MCP (and build real projects for freelancing/jobs) (www.reddit.com)
RAG as Similarity Engine (necromant2005.github.io via hn)
Is anyone else using Cursor to build local VRAM/RAG architectures instead of just wrapper apps? Here is my 8-month deep dive. (www.reddit.com)
I'm completely lost in the Agentic Maze. What level to learn. how to organize stydu (www.reddit.com)
Stop using naive RAG – adding relationships to AI context (news.ycombinator.com) I’ve been working a lot with RAG systems recently, and kept running into the same issue: they retrieve relevant chunks, but lose the relationships between them. This becomes a problem pretty quickly when dealing with real systems (docs, AP…
Show HN: AI agents should browse your site, not call your API (www.rtrvr.ai via hn) We compared four architectures for putting AI agents on websites — RAG bots, API-tool agents(WebMCP), code-writing sandboxes (Cloudflare Agent Lee), and DOM-native execution. Three of them force you to maintain a parallel engineering surfa…
TF-IDF over code signatures hits 80% hit@5 retrieval — no vectors, no embeddings. Tested on 18 repos. (www.reddit.com) Been experimenting with context compression for local models. Wanted to test how far pure heuristic retrieval can go before you actually need vectors.
I built an MCP server that turns Claude into an emergency medicine assistant — what I learned building AI for high-stakes domains (www.reddit.com) If you work in healthcare or just want to see how Claude handles high-stakes clinical reasoning — I built an MCP server for this and wanted to share what made it harder than a typical AI project. EMSy is built on top of Claude and connects…
Open source research agent with RAG, streaming, and web search - one file backend (www.reddit.com) Built two open source agents: 1. Research agent - searches the web, streams answers with sources (like Perplexity) 2.
It's tax time... agent-built RAG app end-to-end with Claude Code + an SDK skill (www.reddit.com) It's tax time, so I whipped up a tax doc assistant with our new Ragie skill. Concrete example of agent-assisted development that goes further than toy demos.
Cursor AI not using sub-agents (www.reddit.com) Hi everyone, I work for a German agency building a RAG chatbot for a law firm. I use Opus 4.6 but it eats up tokens.
Show HN: NRC nuclear licensing RAG pipeline and regulatory embeddings dataset (huggingface.co via hn) I've been building an AI system to automate parts of the NRC Combined Operational License process: gap analysis against the Standard Review Plan, FSAR strength scoring, and RAI prediction using vector similarity to historical NRC requests.…
Memelang: Terse SQL for LLM Generation (memelang.net via hn) Memelang is an AI-optimized query language that significantly reduces token count and model size for LLM RAG. The code below is designed to be copy-and-pasted into your LLM.
Prompt Injection in RAG Agentic Systems (ulad.net via hn) Real risks and production mitigations. Imagine you built an AI assistant for your team. It answers questions using internal documentation: Jira tickets, Confluence pages, HR docs.
Show HN: Incremental RAG ingestion, only changed chunks get re-embedded (github.com via hn) chunks-sync Incremental synchronization for RAG pipelines. Most RAG ingestion pipelines re-embed every document whenever a file changes, even if only one paragraph was edited.
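A minimal sketch of the idea described in the item above: hash each chunk and re-embed only chunks whose hash is not already stored. This is not the chunks-sync project's actual code or API; `chunk_hash` and `plan_sync` are hypothetical names, and the use of SHA-256 over raw chunk text is an assumption made for illustration.

```python
import hashlib


def chunk_hash(text):
    """Content hash used as the chunk's identity (assumption: SHA-256 of UTF-8 text)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(stored_hashes, new_chunks):
    """Compare stored chunk hashes with the current chunks of a document.

    Returns (to_embed, to_delete):
      to_embed  - chunk texts whose hash is not stored yet
      to_delete - stored hashes that no longer appear in the document
    """
    new_by_hash = {chunk_hash(c): c for c in new_chunks}
    to_embed = [c for h, c in new_by_hash.items() if h not in stored_hashes]
    to_delete = sorted(h for h in stored_hashes if h not in new_by_hash)
    return to_embed, to_delete


# Hypothetical document before and after a one-paragraph edit.
old_chunks = ["Intro paragraph.", "Pricing is $10.", "Contact us."]
stored = {chunk_hash(c) for c in old_chunks}
new_chunks = ["Intro paragraph.", "Pricing is $12.", "Contact us."]

to_embed, to_delete = plan_sync(stored, new_chunks)
print(to_embed)        # only the edited chunk needs a new embedding
print(len(to_delete))  # the old version of that chunk is stale
```

With this plan, two of the three chunks keep their existing embeddings; only the edited paragraph is sent to the embedding model.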
MemGraphRAG: Memory-Based Multi-Agent System for Graph RAG (arxiv.org via hn) Retrieval-Augmented Generation (RAG) has become an essential method for mitigating hallucinations in Large Language Models (LLMs) by leveraging external knowledge. Although effective for simple queries, traditional RAG struggles with large…
Show HN: Ext-Infer – Native LLM Inference and Embeddings for PHP (infer.displace.tech via hn) ext-infer is a PHP 8.3+ extension that loads a GGUF model and runs LLM inference inside the PHP process via llama.cpp. PHP-native semantic search, RAG pipelines, and CLI / worker inference run without shelling out to Python or…
Tool to convert technical PDFs into RAG-ready chunks and Obsidian vaults (pdf-knowledge-extractor.onrender.com via hn)
RAG Without Persona Modeling Fails Patient Clinical Relevance (www.riddhimohan.com via hn) HPPIE fuses persona modeling into the RAG pipeline to deliver patient-specific health content. 2nd of 300+ at a Global AI Hackathon.
Show HN: Digger Solo – Local AI File Explorer (solo.digger.lol via hn) After a lot of work I present Digger Solo 0.5.0 - the AI file explorer that respects your privacy (everything runs locally). Demo video: https://vimeo.com/1198414414 New features: - LLM Chat with RAG (bring your own OpenAI compatible API k…
Why Vector Search Alone Isn't Enough: Hybrid Retrieval for RAG (www.infoq.com via hn) In this article, author Aaditya Chauhan discusses the limitations of RAG pipelines based purely on vector search and how an internal omni-search application that uses Reciprocal Rank Fusion (RRF) to combine BM25 and vector results can enha…
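For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the standard formula, where each document scores the sum of 1 / (k + rank) over the rankings it appears in. It is not taken from the linked article; the function name, the document ids, and the conventional default k = 60 are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (assumption: the commonly used default of 60).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a BM25 index and a vector index.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because RRF uses only ranks, it needs no score normalization between BM25 and cosine similarity, which is one reason it is a common choice for hybrid retrieval.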
Show HN: Extract (YC P25) – Fast, accurate document parsing (extract.page via hn) Hey HN, we’re Soami, David, and Achyut, co-founders of Extract. Extract parses documents into structured data (text, tables, and figures).
How We Index Images for RAG (www.kapa.ai via hn) Reading the screenshots, diagrams and tables in technical documentation for LLMs, by Matteo Bortoletto. Kapa builds AI assistants that answer questions from technical documentation. The knowledge bases we process…
Open-source NLI ensemble matches Sonnet 4.6 on RAGTruth at 1/250x the cost (github.com via hn) verifiable-rag Document-grounded Q&A with sentence-level citations, NLI verification, and calibrated refusal. Status: pre-alpha · v0.5 launch sprint · interfaces are still subject to change 📚 Full documentation at firish.github.io/rag-rack…
Running local RAG AI on MacBook neos (securethink.co.uk via hn) AI-powered document analysis that runs 100% locally on your Mac. Analyse contracts, engineering specs, and sensitive data without the cloud.
Authorization Before Retrieval: Making RAG Safe by Construction (www.windley.com via hn) Retrieval-augmented generation makes language models far more useful by grounding them in real data. But it also raises a hard question: who is allowed to see what? This post shows how authorization can be enforced before retrieval, ensuri…
VDF AI – Multi-agent AI orchestration with dynamic model routing (vdf.ai via hn) VDF.AI is the on-premise AI agent platform for enterprises that need governed multi-agent orchestration, private RAG, LLM routing, and full data sovereignty — without the cost or lock-in of cloud AI.
How to Stress-Test LLM Judges Fairly (www.alphaxiv.org via hn) A Fixed-Budget, Cluster-Aware Standard for LLM-as-a-Judge Evaluation: A Multi-Hop RAG Stress Test
knowledge graph for maintaining git worktrees and shared findings across projects (www.reddit.com) sometimes when i scroll social media i see stuff about knowledge graphs. it crossed my mind that I do something similar.
Turn any GitHub repository into an interactive code graph in seconds and use it as an MCP with your AI Assistants (www.reddit.com) Change https://github.com/owner/repo → https://cgc.codes/owner/repo A standard GitHub URL can be instantly transformed into a CodeGraphContext (CGC) graph URL, unlocking architecture visualization, code navigation, dependency exploration,…
Gnani AI - AI Prompt Engineer role (www.reddit.com) Anyone here working at Gnani AI or knows someone there? I got an offer for the AI Prompt Engineer role and wanted to know how the work culture is.
Is a 128 GB MacBook Pro M5 Max actually too slow for large-context local LLM coding workflows? (www.reddit.com) People are warning me about the prompt-processing speed of a MacBook Pro M5 Max with 128 GB RAM. My main concern is prompt ingestion / prefill latency and large-context handling — not raw token generation speed (which I think is OK).
Tlamatini – Local-first AI dev assistant with 68 agents and hybrid RAG (github.com via hn) Tlamatini A local-first AI developer assistant that goes beyond chat. Run it on your machine with Ollama.
Why Does Everyone Think AI Agents Are Easy? (www.reddit.com) Lately it feels like every problem gets the same answer: “Just build an AI agent.” I had lunch recently with people outside tech, and someone mentioned spending hours replying to customer chats at work. Immediately another person said: “Wh…
Is grep all you need? Lexical VS Sematic Search for Agents (www.llamaindex.ai via hn) Lexical search with grep is fast and precise, but it breaks down at enterprise scale. Learn when to use grep, semantic search, or a hybrid RAG approach to build AI agents that can search any corpus, in any format, at any size.
AI for internal IT support/password resets in mid-size & enterprise companies- is anyone actually seeing good adoption? (www.reddit.com) Anyone here from a mid-size or enterprise company using AI for internal IT support workflows like password resets, account unlocks, MFA resets, software access requests, etc.? We’re exploring AI-driven employee support internally and I’m c…
LMIM OS – an offline AI ecosystem. Voice, RAG, WhatsApp. ++ One file. 0 setup (lmim.tech via hn) 19+ tools — no cloud, no API key, no subscription. All in one AppImage / Installer.
Who Wants to Be Hired? (May 2026) – AI Engineer (Python, RAG, Agentic Workflows) (news.ycombinator.com) About me: I am an AI Product Engineer specializing in building autonomous agentic workflows. Recently, I built 'Jarvis', a multimodal autonomous agent featuring near-zero latency inference using Groq SDK and complex RAG pipelines.
Databricks project ideas as a Data Engineer looking to transition roles (www.reddit.com) Hey, I'm a data engineer looking to transition into AI engineering. I'm looking to learn and build a resume with some projects.
Why codex /goal fails on complex workflows: compaction amnesia and context rot (news.ycombinator.com) Hi HN, When OpenAI released `/goal` earlier this month, I was really excited to try it for long-horizon tasks. But after using it, it didn't blow me away, so I did some digging and found a major architectural flaw when using it for complex…
Astrum Verum – A Vector Symbolic cognitive memory that beats RAG (github.com via hn) Astrum Verum Composition-episodic cognitive memory for AI agents — and an honest record of how it got here. Astrum Verum is a research project containing two distinct phases of memory architecture development.
Every RAG-based localization pipeline has the same blind spot (lingo.dev via hn) If a localization pipeline uses retrieval augmented generation to inject glossary terms into the model's context window, it has a retrieval recall problem that has never been measured. The pattern is universal: embed the input text, cosine…
RAG for developer docs so local llm can code using latest library? (www.reddit.com) I was wondering if it would make local llm better at coding if it has access to the latest documentation available through a RAG. I'm specifically interested in python.
Trying to work around AI and its constraint at my workplace (www.reddit.com) I would rate my AI skills between beginner and intermediate. I know how to use tools like ChatGPT and GitHub Copilot to build a chatbot with a system prompt.
Built a production RAG chatbot with custom MCP servers as the action layer, sharing what I learned (www.reddit.com) I've been building agentic tooling at work and wanted to share one pattern that worked. Instead of a chatbot that only retrieves and answers, I wired custom MCP servers in as the action layer, so staff trigger live workflows (create record…
Ask HN: Why agentic development stops from 2023 (news.ycombinator.com) I left this field in 2023 and returned in 2026, and I see progress only in coding agents, while production solutions are still just tools, RAG, and maybe MCP, which is in general the same as a tool. I thought it will be super leap…
The shared recipe behind search: Images, Shazam and RAG (medium.com via hn)
Enterprise AI why soo cumbersome (www.reddit.com) Just started in a new, bigger company. Supposed to accelerate the adoption of AI.
"Most RAG benchmarks lie about real-world corpora." Test data from 3 production websites. (www.reddit.com) Tiered + page-role-aware RAG retrieval results across 3 corpora with very different content density (workspace: sources, chunks; HIGH / MEDIUM / LOW / REJECTED): Intercom: 188 sources, 941 chunks; 96 / 200 / 541 / 104. HubSpot: 251 sources, 1705 chunks; 40 / 508 / 1153 / 4. KPMG: 53 sources, 209 chunks; 3 / 14 / 127 / 65. (…
ztok — a fast multithreaded tokenizer in Zig that loads tiktoken / HF / SentencePiece and is 2–5× faster (www.reddit.com) I built ztok, a tokenizer library focused on being fast and format-agnostic for local pipelines. - Loads what you already have — .tiktoken, HF tokenizer.json, SentencePiece .model, TokenMonster, Mistral Tekken.
Gemini filesearch scalability (www.reddit.com) I'm about to introduce gemini filesearch to my company to handle all the RAG related operations but not just internally, I'm fixing the projects VS stores logic to be able to scale this up to thousands of small clients. Has anyone used gem…
Most agent RAG problems I see are retrieval problems, not model problems (www.reddit.com) I've spent the past year building a site-search product and watched maybe 50 teams plug their docs into a vector DB, expect magic, and end up debugging why the LLM is lying. It's almost never the LLM.
Open catalog of agent patterns + the frameworks that implement them (www.reddit.com) I have been building an open catalog of agent patterns and the frameworks that implement them. It is a pattern language in the Christopher Alexander sense, mapped onto the current agent landscape.
My agent kept forgetting who 'Karpathy' was between sessions. Here's the architecture that fixed it (www.reddit.com) I run a second brain on Obsidian, Readwise, NotebookLM, and Claude Code. For each topic, I build a scoped wiki structured as the LLM Knowledge Base Andrej Karpathy proposed.
AI agents are making tokenization platforms far more usable than I expected (www.reddit.com) Been working on AI-assisted workflows for tokenization platforms recently, and I’m honestly surprised by how useful agents are becoming in complex financial processes. Some areas where they’ve helped a lot: onboarding automation, document u…
Glia – Local-first shared memory layer (SQLite-vec + FTS5 + Offline Knowledge Graph) (www.reddit.com) Hey everyone, I wanted to share a project I've been working on called Glia. It is a 100% offline, local-first RAG and memory layer designed to connect your AI web chats (Claude, ChatGPT, DeepSeek) with your local developer tools (Claude Co…
Built a Fetch API that returns page labels, not just markdown (www.reddit.com) I'm working on a Fetch API for RAG, agents, and web ingestion workflows. Think Firecrawl/Jina Reader-style URL-to-markdown or clean-text API, but with one extra signal layer: page labels for content category and page structure.
We engineered RAG to be 50% faster (elevenlabs.io via hn) How we engineered RAG to be 50% faster, written by Michal Korbela. RAG improves accuracy for AI agents by grounding LLM responses in large knowledge bases. Rather than sending the e…
Booking.com and Weaviate (news.ycombinator.com) Vector search looks easy, until you hit production scale. I'm super excited to share a new episode of the Weaviate Podcast with Başak from @bookingcom on production-scale vector search, RAG, and agentic AI with @weaviate_io!
What Matters in Production RAG (arpitbhayani.me via hn) Most of us build RAG the same way: follow a tutorial that embeds a handful of PDFs, stores the vectors in a local Chroma instance, and chains everything together with LangChain (if that’s still a thing). The demo works.
The LLM Fine-Tuning Guide (www.promptinjection.net via hn) The Ultimate LLM Fine-Tuning Guide From dataset to GGUF - every parameter explained, every step runnable Fine-tuning is a direct intervention into how a language model behaves. Not prompting, not system instructions, not RAG - actual weigh…
Why our AI agent needed a causal graph, not just a RAG database (openyf.dev via hn) The transition from Phase 01 to Phase 02 was not planned. It was not the result of reading a paper on causal reasoning or deciding theoretically that knowledge graphs were superior to flat lists.
Project Prism |Fullstack Engineer – Abu Dhabi (Onsite) – Full-Time – Presight.ai (news.ycombinator.com) Presight.ai is a publicly listed company with various projects in the field of big data analysis and ML models application. Our solutions work domestically and internationally.
Built an agentic RAG over my Obsidian vault so Claude could read engineering books I never have time for. Then I built the eval harness to check Claude wasn't lying to me. (www.reddit.com) For context, I posted on Medium a while back about burning through Claude Code's weekly limit in 3 days. The token bleed problem from that post is what kicked off this project.
We compiled 42 of the Generative & Agentic AI interview questions (and how to actually answer them). (www.reddit.com) Hey Everyone, The AI engineering job market has shifted massively in the last 6 months. Interviewers are no longer just asking "how does a transformer work?" or "how do you write a good prompt?" They want to know if you can architect produ…
Are we all quietly rebuilding memory systems because current AI memory doesn’t actually work long-term? (www.reddit.com) The more I work with long-running agents, the more it feels like most “AI memory” today is just retrieval with nicer branding. Everything works in demos: vector DBs, RAG, summaries, context packing, knowledge graphs. But after enough real usage…
Show HN: RAG-LCC – config-driven RAG framework for fast experimentation (github.com via hn) 🧪 RAG‑LCC — Experimental RAG Under Constraints RAG‑LCC is an experimental Retrieval‑Augmented Generation (RAG) lab focused on understanding and controlling retrieval and context assembly under real‑world constraints: limited context window…
What is the most unexpected thing you have gotten a local model to do? (www.reddit.com) Most local LLM use cases I see are chat, coding, and RAG. But with vision models getting better and faster on consumer hardware, I feel like there is a lot of untapped territory.
Agents Can Reason. They Still Can't Search (dipkumar.dev via hn) Agents have a search problem across the whole stack: web search, RAG, tool discovery, skills/workflow loading, and even context compaction.
Tried 12+ agentic AI workflow builders this year — these 5 actually work in production (www.reddit.com) Most “AI agent” tools in 2026 still feel like glorified chatbot wrappers. I spent the last few months testing different agentic AI workflow builders for real-world automation use cases (multi-agent workflows, approvals, integrations, long-…
Most RAG apps in production are confidently wrong and nobody talks about this enough (www.reddit.com) Been working with a few teams integrating RAG into internal tools, support bots, document Q&A, contract search, and I keep running into the same thing nobody warns you about when you're following tutorials. The basic retrieve-then-generate…
There's a meaningful difference between a knowledge base your LLM searches and one it can navigate. Has anyone shipped something in the second category? (www.reddit.com) RAG gives you search over a corpus. Useful.
Some Business Ideas (news.ycombinator.com) Infrastructure side: 1. A service connector that can be attached to any AI Assistant to authenticate users and connect services easily, similar to MCP but more universal.
What Is a Business Agent? (jitera.com via hn) Chatbots respond, copilots assist, and RAG systems retrieve. Business agents take actions, plan workflows, call external systems, and know when to stop and ask.
Local-first LLM context dedup: 22-71% chunk overlap measured across 22M passages (2 arXiv papers). MCP server, MIT, 250KB binary, zero telemetry. (www.reddit.com) I'm the author of this thing, disclosure up front. Been hanging around this sub lately on cache invalidation, MoE memory tradeoffs, long-session token bloat.
22-71% of your AI coding input tokens are duplicates, we measured it across 22M passages (2 arXiv papers). Just shipped MCP support for Cursor (www.reddit.com) Disclosure first: I'm the author. MIT, runs locally, zero telemetry.
Microsoft patched 137 bugs, but the Azure AI Foundry one is what caught my eye (www.reddit.com) Microsoft just patched 137 vulnerabilities across Azure, Windows, Dynamics 365, Copilot, Office, and other products. Most of it looks like the usual Patch Tuesday flood, but one detail stood out: Azure AI Foundry is listed among the high-s…
Arkon: turning Claude from a personal chatbot into a managed organizational resource (www.reddit.com) Sharing a project I've been building. Not asking for anything in particular - just thought the problem and approach might be interesting to some folks here.
New guy with an RPG agent Project (www.reddit.com) Hi, I'm a long-time tabletop game master and a rather neophyte programmer (college diploma in programming for video games, no real work experience yet). I have done a 4-hour AWS workshop on building RAG agents during my internship with a st…
How I buld agents (www.reddit.com) Everyday I see tons of AI generated posts about tricks to build AI agents. Here is one written by a human with experience and typos :) Step 1: Never directly compete with a human Is there a specific job title for sth?
A 3.5 MB C++ engine for deterministic RAG deduplication hitting 30 GB/s (github.com via hn) Merlin Community Local-first dedup for LLM context. Lite engine, MIT integrations, papers on arXiv.
Stop struggling with Agentic AI - my repo just hit 540+ stars and 60+ forks!! (www.reddit.com) Quick update — my AI Agent Frameworks repo just passed 540+ stars and 60+ forks on GitHub!! When I first put it together, my goal was simple: make experimenting with Agentic AI more practical and approachable.
A Bette RAG Alternative (www.codynamicslab.com via hn) $ docker run --gpus all -p 8091:8091 codynamics/latch:latest [latch] runtime starting on http://0.0.0.0:8091 [latch] status=loading profile=cdlac_latch_qwen14b_locked_20260317 [latch] warmup complete status=ready $ curl -s http://127.0.0.1…
We added an enforcement layer to our AI agents in production — here's what we learned about the failure modes nobody talks about (www.reddit.com) After shipping AI agents into real production environments, the failures that actually kept us up at night weren't hallucinations or bad outputs — they were control failures. Three things that surprised us: 1.
FlowFlow, voice notes with on-device RAG in Rust for iOS (github.com via hn) FlowFlow Mobile voice notes app with AI chat — 100% Rust, Dioxus iOS, local-first (SQLite + LanceDB). Built with Dioxus 0.7 for iOS.
I just launched my first open-source project and I want to learn how to become a better developer/maintainer. A remote vibe coding tool. (www.reddit.com) Hey everyone, I’ve been a developer for a while, but I’ve always been a "lurker" when it comes to open source. Recently, I finally pushed my first project to GitHub: Legax.
Integrating standard operation procedures with agentic AI workflow (www.reddit.com) Hello guys, me and my team have been building an agentic workflow to answer customer questions (rn in langgraph). The use case goal is to answer ALL customer support questions.
Here is the current "Free-Tier AI Stack" for 2026 (www.reddit.com) 1. The Frontier Giants • Gemini: Access 1.5B tokens/day on Gemini 1.5 Flash/Pro.
Meet Tiro! Agentic assisted memory retrieval and session state memory module. (www.reddit.com) A year ago, when I first got into LLMs, I started by using them to play D&D. ChatGPT 4o was surprisingly good at narration, improvisation, and keeping the game moving.
My agent returns HTTP 200 but gives factually wrong answers. How are you catching this? (www.reddit.com) Working on a support agent and hit a gap I hadn't thought about. Agent completes successfully.
Show HN: Nexa-gauge – Cache/cost-aware graph-based eval for LLM and RAG (github.com via hn) nexa-gauge - Graph-Based Evaluation for LLM and RAG Systems. A cache-aware evaluation engine for measuring LLM and RAG output quality with repeatable metrics, cost estimates, and structured reports.
RAG chatbot for internal ops docs. Anyone built something like this? (www.reddit.com) I run ops for a custom home builder. We have SOPs, HR policies, project checklists, and process docs...all living in Dropbox & I want to give my team a simple way to ask questions & get accurate answers without hunting through folders.
Show HN: Build a custom AI in under 60 seconds (demo video) (www.youtube.com via hn) We've added onboarding that lets our users build custom AI for their website, ready to deploy in under 60s total. The platform consists of end-to-end AI engineering, prompts, version control, evaluations, test cases, logs, AI Actions (custom tool…
Show HN: I built a playground of interative A/B testing for RAG (rag-dr.hanhanwu.com via hn) To iteratively improve RAG performance, current evaluation solutions still take lots of manual work or lots of coding. They also require close collaboration between AI engineers and domain experts (who may not know how to code).
I built a WP plugin to solve the "AI Search" problem (YouTube-to-Blog and RAG) (www.indiehackers.com via hn) Hey IH, Like many of you, I’ve been watching traditional SEO traffic drop as Perplexity, SearchGPT, and Gemini Overviews take over. In 2026, if your content isn't being cited, it’s basically invisible.
PageIndex: Vectorless, Reasoning-Based RAG (github.com via hn) PageIndex: Vectorless, Reasoning-based RAG Reasoning-based RAG ◦ No Vector DB ◦ No Chunking ◦ Human-like Retrieval 📢 Updates 🔥 Agentic Vectorless RAG — A simple…
The RAG era is ending – a compilation-stage knowledge layer is what comes next (venturebeat.com via hn) The RAG era is ending for agentic AI — a new compilation-stage knowledge layer is what comes next | VentureBeat
Show HN: Memex, Claude memory via local RAG (MCP, offline embeddings) (memex-cli.vercel.app via hn) Local-first second brain with semantic search. Gives Claude persistent memory across conversations — all data stays on your machine.
Agentic RAG Explained in 3 Levels of Difficulty (machinelearningmastery.com via hn) In this article, you will learn what agentic RAG is, how it differs from traditional RAG, and when to use it. Topics we will cover include: The key limitations of traditional RAG pipelines and what agents add to address them.
Kvaser - Moving beyond simple agents: Building a Local-First AI Orchestrator with Qwen 3.6, Kiwix, and Wolfram (www.reddit.com) For the past two weeks, I’ve been spending 4–5 hours a day building a custom MCP (Model Context Protocol) orchestration server. What started as a simple experiment with Qwen 3.6 35B has evolved into a full-scale "Man-in-the-Middle" proxy t…
How good is Gemini Embedding 001 for scientific retrieval? (www.reddit.com) How good is Gemini Embedding 001 for scientific retrieval (RAG application)? How does it compare against Text Embedding 3 Large?
Honestly, chunking is where most RAG systems quietly go wrong (www.reddit.com) Honestly, chunking is where a lot of RAG systems start lying to you while still looking fine in the demo. It works when the question is narrow and the document is basically prose, but once users ask messy real questions, the retrieval laye…
Hello Guys. Quick Question On Research. (www.reddit.com) Looking for the people actually pushing on multi-agent architectures right now, not the N8N crowd. The progression I've been following: single chat → Claude Code → multi-file projects with context engineering → multi-agent systems → orches…
Why we ended up with 4 agents and 3 protocols for agentic commerce on Shopware (www.reddit.com) Most agentic-commerce demos I see online are a single agent plus RAG over a product catalog. That shape works for a 200-SKU demo.
LangGraph and Cosmos DB: one back end for agents, memory, and RAG (devblogs.microsoft.com via hn) Build AI Agents and RAG Applications with the New LangChain + LangGraph Connector for Azure Cosmos DB Building AI agents and RAG applications today means stitching together half a dozen services, a vector database, a chat history store, a…
EGA: Runtime Enforcement for LLM Outputs (v1.0.0) (www.reddit.com) I built EGA - a runtime enforcement layer for LLM outputs. The problem: eval tools score after the fact that something went wrong.
Why is RAG evaluation so hard in the real world? (www.reddit.com) Evaluating RAG feels easy in theory, but production is a different challenge. We’ve been looking into why RAG benchmarking is such a moving target.
I almost shipped OpenAI embeddings until an MTEB rank #130 model beat them by 11% (www.reddit.com) I just interviewed Michael Maximilien, former CTO at IBM and Chairperson of NodeJS Foundation, who spent a year shipping production RAG to multiple customers. His lesson was uncomfortable.
Xmemory: Benchmarking Structured AI Memory Against RAG and Hybrid RAG (arxiv.org via hn) Persistent AI memory is often reduced to a retrieval problem: store prior interactions as text, embed them, and ask the model to recover relevant context later. This design is useful for thematic recall, but it is mismatched to the kinds o…
Local query autocomplete with "classical" ML, no LLM needed (www.reddit.com) Hey guys! I know this is not fully LLM related (its still local though :D), mods feel free to delete this if you think its off topic, but I just wanted to share something I experimented with, local autocomplete without the use of LLMs or f…
Should I continue to create my RAG project? (www.reddit.com) To preface this, I work in the oil field, I like to homelab as a hobby. But there is a lot of standards and policies that aren't always easy to find and look up.
anyone else trying to pipe their own data into claude via mcp? (www.reddit.com) I'm trying to build a reliable local RAG setup for claude and it is just exhausting. I want claude to have access to my github repos and past project docs without me copy-pasting everything into the window every morning.
I finally sat down and did the math on my Cloud LLM bills… and I’m moving almost everything to a 4090. (www.reddit.com) I used to be all-in on cloud APIs. For any side project, I’d just grab an OpenAI or Anthropic key and not think twice.
How are teams bridging the gap between company knowledge and AI agents? (news.ycombinator.com) AI agents are capable enough to automate real work now. But they keep failing because they don't know how a specific company actually operates.
Show HN: MAItion – Open-source RAG with pluggable connectors and chat UI (github.com via hn) Hey HN, We wanted to share a new tool we’ve been working on. Even when documentation is well-structured, sometimes it’s hard to find what you need.
Run, Learn and test Agentic AI for free, on your browser! (Open AI Models are included) (www.reddit.com) Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…
Question: What are some useful content, web-scraping, web search tools, ingestion libraries, or MCPs for Karpathy's LLM Wiki? (www.reddit.com) Hey all, so I am currently exploring and playing around with Karpathy's LLM Wiki using Claude Code with Ollama and other routed models. I want to create some agents and provide them with tools/plugins, libraries, MCPs, or harnesses to assi…
Building a Full-Stack Agentic AI Platform (RAG + Orchestration + Governance) — feedback? (www.reddit.com) Hey folks 👋 I’ve been working on an AI agent platform called Noevex, focused on real production use—not just demos. In practice, AI systems struggle with: multi-step orchestration connecting multiple data sources controlling agent actions…
Why I’m still using RAG even with 2M context windows… (www.reddit.com) Look, when those 2 million-token context windows dropped earlier this year, I thought RAG was dead. I was like, “Why am I still chunking documents and building vector databases when I can just throw 50 PDFs into one prompt and be done?” So…
Technical Overview of an AI RAG System with React, Python, Laravel, Redis (gist.io via hn) LongTerMemory: Technical Overview LongTerMemory is an AI-powered SaaS platform for exam preparation and long-term knowledge retention. It combines Retrieval-Augmented Generation (RAG) with spaced repetition scheduling to help users study s…
I ran retrieval-auditor against LangChain's RAG quickstart, 5/6 flagged (github.com via hn) The corpus is Lilian Weng's "LLM Powered Autonomous Agents" — the blog post that the LangChain RAG tutorial uses as its canonical demo. The retriever is the LangChain default (cosine similarity over all-MiniLM-L6-v2 embeddings, top-5).
After weeks of RAG setups, the bottleneck is the data pipeline, not the model (www.reddit.com) I spent weeks tuning retrieval models, then realized the real problem was getting sources into clean, structured, interlinked form. Scrape a webpage and you get a mess of HTML.
Ask HN: How do you solve aggregation when agentic RAG breaks down? (news.ycombinator.com) I keep hitting the same failure mode with agentic RAG over collections of similar PDFs, like monthly electricity and gas bills from the same utility provider. It works well for retrieval: “Find my gas bill from January.” Though even there…
Show HN: Local RAG Pipeline with Weaviate and Ollama (www.storyblok.com via hn) i’ve been experimenting with building a fully local rag pipeline: weaviate for vectors + hybrid search, node.js scripts, qwen 3.5 on ollama what i found is that most of the challenges live in retrieval and chunking, not the LLM, and a good…
Built a GraphRAG voice agent over JRCALC 2022 clinical guidelines using Gemini Live, part of a hackathon first-aid system for Meta Ray-Ban glasses (github.com via reddit) The voice guidance layer in our hackathon project uses a Gemini agent backed by a GraphRAG index over the JRCALC 2022 guidelines (the UK ambulance service clinical reference). When the system detects stroke signs or abnormal heart rate it…
Build your own voice assistant and run it locally – Whisper, Ollama, Bark (2024) (medium.com via hn) Mar 31, 2024. After my latest post about how to build your own RAG and run it locally. Today, we’re taking it a step further by not only implementing the conversational abilities of large language models but also adding listen…
Show HN: AI memory with biological decay (52% recall) (github.com via hn) Most RAG setups fail because they treat memory like a static filing cabinet. When every transient bug fix or abandoned rule is stored forever, the context window eventually chokes on noise, spiking token costs and degrading the agent's rea…
Built a Legal RAG Chatbot for Indian lawyers covering BNS, BNSS, BSA and DPDP Act 2023 — Custom PageIndex + BERT + GPT-4o [Live Demo] (www.reddit.com) I ran a business for 12+ years. Traveling constantly.
Where is the boundary between a multi-agent and a monolithic AI agent structure? (www.reddit.com) Enterprise systems often avoid "monolithic" AI to prevent context rot and hallucinations. The standard fix is task-decoupling: splitting logic between specialized agents or deterministic code.
LLM CTF challenges. Can you crack all 13? (wraith.sh via reddit) Wraith Academy is a free hands-on AI pentest curriculum — CTF challenges against live LLM agents covering prompt injection, tool abuse, data exfiltration, RAG poisoning, and more. Earn your WCAP certification.
RAG pipelines, leaking PII into vector databases and nobody's talking about it (comply-tech.co.uk via hn) Your RAG Pipeline Is Leaking Customer Data Into Vector Embeddings If you're building a RAG (Retrieval Augmented Generation) system on internal documents such as customer support history, knowledge base articles, or internal comms, there's…
I almost built RAG for my notes, then realized I didn't have a retrieval problem at all (www.reddit.com) My notes live in Obsidian. My reading and highlights live in Readwise.
5060ti + 32gb DDR4 (www.reddit.com) What models/quants have impressed you lately for 5060ti ? The use case is professional writing, RAG and long document summarization, not coding, so good instruction following and precision are a plus.
RAG in Go: A Vulnerability Research Tool (www.ardanlabs.com via hn) In the previous post, you saw how you can use tools to add information to an LLM query. In this post, we’ll see another method of adding information to an LLM called RAG, or Retrieval-Augmented Generation.
When the pronoun "they" breaks your RAG pipeline (old.reddit.com via hn)
Edster – An open-source local AI agent with swarm mode and a web UI (github.com via hn) 👾 Nedster CLI Coding Agent An unstoppable, fully local, open-source coding agent that runs on your consumer GPU. Tags: ollama coding-agent local-ai cli rag chromadb python qwen Are you trying to use local LLMs to autonomously write code, r…
Show HN: DataFrey – MCP server for Snowflake with text-to-SQL agent (docs.datafrey.ai via hn) I’m a data scientist and I find it hard to use Claude Code for SQL - it doesn’t have DB context. so I made yet another database MCP server!
Combine persistent global Memory- and Task- management into one uniform system (www.reddit.com)
Is there any way to implement multimodal RAG using some open-source multimodal large models? (www.reddit.com)
Show HN: Infrawise Azure Cloud Optimization (infrawiseai.com via hn)
Steno – Compressed memory with RAG for AI agents (github.com via hn) Steno: Compressed memory notation with RAG retrieval for AI agents. Steno solves the AI memory problem: agents accumulate knowledge across sessions, but loading everything into context every time is expensive, noisy, and causes drift.
Show HN: Corvi Careers – privacy first job search with resume matching (corvi.careers via hn) It lets you search 1M+ jobs across multiple regions, refine by keyword/category/location, upload a plain text resume for better matching, filter by target companies. Searches, keywords extracted from resume and bookmarks are saved locally,…
Best way to prepare for AI Engineer interviews? (www.reddit.com) I’m currently preparing for AI-focused roles and would love to get perspectives from people already working in the industry. For context — I have ~5 years of experience as a Full Stack Engineer with a strong focus on AI systems.
Sweet RAG Evil Model (www.reddit.com) Scenario A: Given: A search query to reduce context is provided When: Results are pushed to the system as completion. Then: a question will respond with accurate results Scenario B: Given: Scenario A data is in a slots KV Cache When: new se…
I made an 80B local model ship a 295-test RAG codebas (github.com via hn) rag-workshop A local-first RAG system built autonomously by a multi-agent framework. This repository is the reference implementation produced by the C.E.H.
Has anyone used Claude Opus 4.7 API on Qubrid or another platform? Use case? (platform.qubrid.com via hn) Advanced GPU infrastructure, collaborative AI Agents, and intelligent RAG systems. Build, deploy, and scale AI solutions with comprehensive tools.
Shopping assistant chatbot (www.reddit.com) I need to create an ecommerce shopping assistant chatbot. Customers would reach out via chat, and the agent/chatbot would help check inventory and make product recommendations based on what customers share.
the shortest path to "Claude that actually knows what I did today" is one npx command (www.reddit.com) every other day someone here posts about karpathy's llm wiki idea, or "how do I give my agent context about me," or "I want a personal knowledge base my AI can use." and then the comments are always the same - build RAG, write a pipeline,…
Good multi-agent harness with db-based long term context? (www.reddit.com) I'm looking for suggestions for an agent harness that uses a database (SQLlite, RAG, what ever) for long-term context. I plan to use my RTX3080 & 3090 for local AI, though I expect to use APIs for some tasks.
Show HN: GraphifyAI – Turn Any CSV/Excel into a Neo4j or LangChain Graph (graphify.midlantics.com via hn) Converting spreadsheets to graph databases (Neo4j, Neptune, etc.) usually means manually defining nodes, relationships, and writing Cypher from scratch. It's tedious.
How to diagnose RAG failures from traces (www.siquick.com via hn) How to diagnose RAG failures from traces If a RAG system fails in production, the first question we should be asking is "what broke in this trace?". Until you can answer that, most scorers or dashboards aren't going to help you.
CDRAG: RAG with LLM-guided document retrieval — outperforms standard cosine retrieval on legal QA (www.reddit.com) Hi all, I developed an addition on a CRAG (Clustered RAG) framework that uses LLM-guided cluster-aware retrieval. Standard RAG retrieves the top-K most similar documents from the entire corpus using cosine similarity.
Free Red Team Security Audit for AI Agents & RAG Systems (limited) (www.reddit.com) I'm developing a specialized Red Team audit framework focused on real-world AI agent and RAG security risks (prompt injection, tool misuse, excessive agency, indirect injection through documents, memory poisoning, etc.). I’m looking for a…
Reports of RAG's death have been greatly exaggerated (atomicapp.ai via hn)
Two-Stage Semantic Chunking for RAG in Python (alessandrofuda.github.io via hn) Fixed-size chunking splits text at arbitrary token boundaries, cutting mid-sentence and blending unrelated topics into the same chunk. Here’s how to build a two-stage pipeline with LlamaIndex , structural splitting first, semantic coherenc…
Whats the SOTA embedding model for arabic Language (www.reddit.com) Hello! I’m working on RAG system on arabic documents any idea on the best embedding model out there?
Anyone here tried the "compile instead of RAG" approach? (www.reddit.com) Been seeing this idea where instead of doing the usual RAG loop, you compile all your sources into a markdown wiki first, then query that directly. The interesting part is that saved answers become part of the wiki too.
Mitre ATLAS technique detection for LLM security in Rust (crates.io via hn) atlas-detect MITRE ATLAS technique detection for LLM and AI agent security. Detects 97 attack techniques across 16 MITRE ATLAS tactics including prompt injection, jailbreaks, credential exfiltration, model extraction, RAG poisoning, revers…
Beginner in LangGraph with no dev experience. How to build projects from scratch (www.reddit.com) Recently got recruited by PwC after a master's in data science. The interview was on traditional ML, but now I must work on AI projects.
The reason your AI agent keeps failing has nothing to do with the model (www.reddit.com via reddit) I've spent the last 8 months building AI agents. Research agents, competitive intel agents, RAG pipelines, you name it.
How are you handling aggregation/counting questions in doc-aware agents? RAG keeps failing me here (www.reddit.com via reddit) Something I keep hitting building agents that work over documents, curious how others solve it. RAG is the default doc tool we give agents, and it's great for "find/explain the passage about X" — the answer lives in one place, retrieval fi…
How much LLM context does an agent need? (www.reddit.com via reddit) Does it depend on the LLM or on the agent's (RAG) capabilities? I want to run an experiment with a very small language model with decent RAG and a few functionalities, and I hope someone else has had this idea. The idea is to run those agents in mobile…
Anything2Skill: Compiling External Knowledge into Reusable Skills for Agents (arxiv.org) Retrieval-augmented generation (RAG) enables agents to access external knowledge at inference time, but it primarily retrieves fragmented declarative evidence, leaving agents to repeatedly infer task procedures from passages, manuals, exam…
SIFT: Selective-Index For Fast Compute of RAG Prefill by Exploiting Attention Invariance (arxiv.org) Retrieval-Augmented Generation (RAG) injects LLM queries with relevant documents to improve response quality. This injection increases prompt length and slows time to first token (TTFT).
Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems (arxiv.org)
Harmonia: End-to-End RAG Serving Optimization (arxiv.org)
Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era (arxiv.org)
DIVERGE: Diversity-Enhanced RAG for Open-Ended Information Seeking (arxiv.org)
From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG (arxiv.org)
Evaluating RAG Reliability under Clean, Misleading, and Mixed Retrieval (arxiv.org)
Document-Authored Control-Signal Impersonation: A Low-Cost Indirect Prompt Attack on RAG Safety Boundaries (arxiv.org)
The Injection Paradox: Brand-Level Suppression in Safety-Trained LLM Recommendations via RAG Context Injection (arxiv.org)
Linguistic Nepotism: Trading-off Quality for Language Preference in Multilingual RAG (arxiv.org)
Using Claude as a deterministic metric engine via Postgres queues. Anyone doing this? (www.reddit.com via reddit) I've been working on turning unstructured field data into calibrated metrics. Instead of normal RAG, I built a system where AI agents act as a metric engine.
How do you pull an entry level job/ freelance? (www.reddit.com via reddit) Hey everyone, I’m a self-taught Python developer transitioning into AI Integration and Database Automation. For those who started out self-taught in automation/AI integration: - What was your fastest route to finding your first freelance o…
Looking for a local "NotebookLM for lawyers" setup – what am I doing wrong? (www.reddit.com via reddit) Hello everyone I am totally new to LocalLLMs and only used chatGPT/Claude/NotebookLM before. So bear with me 😃 I'm an attorney and would like to analyze and summarize case files locally for privacy/confidentiality reasons.
Need a Production-Level RAG AI Agent Tutorial (www.reddit.com via reddit) Can anyone suggest a Production-Level RAG AI Agent tutorial (YouTube video, documentation, course, GitHub repo, etc.)? My goal is to build a project that is actually worth adding to the Projects section of my resume for AI Engineer roles.
Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection (arxiv.org) Retrieval-Augmented Generation (RAG) reduces but does not eliminate hallucination in large language models. Existing detection methods rely on flat similarity between generated answers and retrieved passages, ignoring structural relationsh…
MHA-RAG: Improving Efficiency, Accuracy, and Consistency by Encoding Exemplars as Soft Prompts (arxiv.org)
Diagnosing LLM Arbitration Behavior over Pre-evidence Epistemic States in RAG-based Fact-Checking (arxiv.org)
TA-RAG: Tone-Aware Retrieval-Augmented Generation for Peer-Support Health Communication (arxiv.org)
HKVM-RAG: Key-Value-Separated Hypergraph Evidence Organization for Multi-Hop RAG (arxiv.org)
SEEK: Steering LLM Reasoning for RAG via Internal Reasoning Sketches (arxiv.org)
RAG for you see it live open source files any kind (www.reddit.com via reddit) This is for visualizing file extraction through RAG (or file ingestion into any structured data set). I've been really into different shapes of data (like graph db, etc).
RAG visualizer open source (www.reddit.com via reddit) This is for visualizing file extraction through RAG (or file ingestion into any structured data set). I've been really into different shapes of data (like graph db, etc).
I built an AI support agent where the main metric is unsafe auto-action rate, not just accuracy (www.reddit.com via reddit) I built a production-shaped AI customer support agent for telecom, and the biggest lesson was that classifier accuracy is not enough. I recently finished RelayOps v1.2, a telecom/subscription customer-support agent built as a vertical slic…
How are you actually using AI on large construction projects? (www.reddit.com via reddit) I've spent several years in project management for large oil / gas / refinery projects, working on the contractor side. With AI dominating the conversation these days, I've been using platforms like Claude Code in my off-hours, and it's st…
Alternatives to ChromaDB for easy RAG search (www.reddit.com via reddit) I'm disappointed that ChromaDB's local, free "single node" version is still getting second-class, hand-me-down features while the "distributed" version (a SaaS offering, unsurprisingly) gets built in hybrid search, BM25, etc. I tried to gi…
Building a Claude-certified developer network: looking for builders to join (free certification path) (www.reddit.com via reddit) [Update] Wow, 32 sign-ups already, thank you all! Still plenty of room (we're aiming min.
Hey, I want to be able to build and optimize agents. Any recommendations on how to learn? (www.reddit.com via reddit) I want to learn how to build an agent so I can then try to optimize it or get creative with it. This includes things like RAG, embeddings, skills, MCP, subagent isolation, context windows, memory, harnesses, etc. I want to learn but resources…
I built a RAG system for the first time. Here's what nobody told me would be the hard part (www.reddit.com via reddit) Had been reading about RAG for months before I actually built one. Every explanation made it sound straightforward.
Guidance please (www.reddit.com via reddit) I need help . pls help !
Answer Presence Drives RAG Rewriting Gains (arxiv.org) Retrieval-augmented QA pipelines often route retrieved passages through an LLM \emph{rewriter} before a smaller reader, lifting F1 by tens of points on multi-hop benchmarks; this gain is typically credited to improved evidence quality. We…
FIDES: Faithful Inference via Deep Evidence Signals for Retrieval-Memory Conflict in RAG (arxiv.org) When retrieved evidence contradicts parametric memory, language models frequently ignore context and default to memorized priors -- a failure that undermines the core purpose of retrieval augmentation. Contrastive decoding amplifies the co…
QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving (arxiv.org) Retrieval-augmented generation (RAG) improves large language model (LLM) answer quality by grounding generation in external evidence, but processing retrieved contexts makes the prefill stage a dominant serving cost. RAG cache fusion reduc…
Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents (arxiv.org) LLM-based agents increasingly tackle long-horizon tasks with interdependent decisions, where each action reshapes future constraints and intermediate errors can cascade. Existing RAG and agent memory systems organize histories by semantic…
Agent-Orchestrated Adaptive RAG: A Comparative Study on Structured and Multi-Hop Retrieval (arxiv.org) Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding their responses in external knowledge, but conventional pipelines rely on static, single-step retrieval that limits performance on complex queries. Thi…
RAG Security and Privacy: Formalizing the Threat Model and Attack Surface (arxiv.org) Retrieval-Augmented Generation (RAG) is an emerging approach in natural language processing that combines large language models (LLMs) with external document retrieval to produce more accurate and grounded responses. While RAG has shown st…
A2RAG: Adaptive Agentic Graph Retrieval for Cost-Aware and Reliable Reasoning (arxiv.org) Graph Retrieval-Augmented Generation (Graph-RAG) enhances multihop question answering by organizing corpora into knowledge graphs and routing evidence through relational structure. However, practical deployments face two persistent bottlen…
HypRAG: Hyperbolic Dense Retrieval for Retrieval Augmented Generation (arxiv.org) Embedding geometry plays a fundamental role in retrieval quality, yet dense retrievers for retrieval-augmented generation (RAG) remain largely confined to Euclidean space. However, natural language exhibits hierarchical structure from broa…
Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation (arxiv.org) Block attention, which processes the input as separate blocks that cannot attend to one another, offers significant potential to improve KV cache reuse in long-context scenarios such as Retrieval-Augmented Generation (RAG). However, its br…
IA-RAG: Interval-Algebra-Driven Temporal Reasoning for Dynamic Knowledge Retrieval (arxiv.org) Retrieval-Augmented Generation (RAG) has shown strong effectiveness in grounding Large Language Models (LLMs) with external knowledge. However, existing RAG and Graph RAG frameworks largely treat knowledge as static or associate time with…
"We didn't know what YCombinator was 5 months ago. Last week Garry Tan asked us to take down what we built." (www.reddit.com) 5 months ago, i didn't know what YCombinator was. Last month, the president of YC noticed what we built.
I made a small tool to inspect retrieval results before feeding them into RAG (www.reddit.com) I’ve been messing around with live web retrieval for RAG, and the part that kept annoying me wasn’t the search call itself. It was figuring out whether the returned results were actually usable as evidence.
2 years of work, 8 iterations and we have waited to introduce our Product Alexandria so long. Its a cursor or claude code for your daily office life! (www.reddit.com) We spent 2 years testing whether “vibe engineering” could become real Hey everyone, We’re a small team of 3 brothers + our father as senior advisor. Background: 2 mechanical engineers 2 construction engineers father with 30+ years in const…
Cost of Using LLMs in Agentic AI and RAG workflows (www.reddit.com) Hey Everyone ML engineer and Researcher here I’ve been researching production issues in Agentic AI + RAG systems and one pattern keeps showing up repeatedly: Context inefficiency. Not just retrieval quality — but the actual economics and s…
Building Expertise in Claude - Seeking Quality Learning Resources (www.reddit.com) Hi everyone, I'm on a mission to become a serious expert in Claude and AI, and I'm building a structured learning path. I want to create content that's actually valuable - with real practical applications, not surface-level tutorials.
I need HELP with a document classification task (www.reddit.com) Hey everyone, my company's tasked me with building a document classification system, insurance documents specifically. someone dumps a batch of documents, and the system needs to classify and label each one correctly.
A Small Site That Explains LLM/Agents Without the Hype (100% free, no sign up required) (www.reddit.com) I am a PhD student at UofToronto doing agent research. Seen a lot of hype around this topic which get people (especially non-tech) hella confused.
Are smaller local models improving faster where it actually matters? (www.reddit.com) From the inference side, it’s been interesting seeing how often smaller open models end up staying in active use simply because they’re fast enough to constantly interact with. A year ago a lot of these models felt more like demos or side…
OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support (huggingface.co) User: oncoagent-research. Tags: oncology, multi-agent, LangGraph, RAG, QLoRA, AMD, open-source, clinical-ai, healthcare. Onc…
I got tired of the API bills for 100k+ context windows, so I built a persistent O(1) semantic memory state engine to compress history (www.reddit.com) Hey everyone, The entire industry right now is cheering for massive 1M+ context windows, but I think it's fundamentally the wrong approach. "Just add more RAM" is a trap.
You don't need a GPU server to run Claude agents (www.reddit.com) I’ve been seeing a lot of newcomers asking about hardware specs lately, and there’s this weirdly common myth that you need a heavy server or a GPU instance to run Claude-based agents. You really don’t.
My Mac Mini kernel-panicked twice. Turned out MCP servers were eating 1.5 GB at idle, leaving no headroom for anything else. So I built a process supervisor (www.reddit.com) tl;dr (Claude caveman edition): MCP servers sit around doing nothing, eat 1.5 GB. Machine angry.
Sentient OS: I spent a year hacking MLX and doing surgery on Qwen to process 3,000 screenshots overnight on a 6 year old iPhone. Every optimization explained :D (www.reddit.com) hey localllama :) I got a multimodal vision LLM to process 3,000 screenshots overnight on a 6 year old iPhone -- entirely on-device. below is every hack, surgery, and optimization i built over the past year to make this possible!
Macbook M3 MAX 64 vs M5 PRO 48, or wait for spark/studio (www.reddit.com) I’m choosing between two refurbished MacBooks, both around $3,100. Option 1: 14” M3 Max, 16-core CPU / 40-core GPU, 64GB RAM, 1TB SSD.
I stopped writing 500-word guardrail prompts. This 8-line template works better. (www.reddit.com) I used to spend hours writing massive, obsessive system prompts for my RAG apps. I’d have ten different refusal examples, "never do X," "always check Y," and a whole paragraph of the model role-playing as a "safe and truthful assistant." I…
I want to create and maintain a set of benchmarks for local LLMs. Would anyone pay/donate for this? (www.reddit.com) Please help me build some clarity. I want to participate in local LLMs ecosystem more.
The Claude Code Pro removal is getting framed as 'just go local' but for production systems it's messier (www.reddit.com) Yesterday's Claude Code Pro removal thread hit 350+ comments in a few hours, and the dominant take was basically "switch to Kimi K2.6, go local, done." I upvoted that thread and tbh im mostly there — but im building voice agents and RAG pi…
Best open-source tools for prompt injection defense in 2026 (www.reddit.com) Over time, we have been testing different approaches to secure LLM apps against prompt injection, especially indirect injection through RAG, PDFs, tool outputs, and MCP integrations. Most tools seem to fall into 2 categories:…
Need a MVP for a RAG, rent Hardware for short term (www.reddit.com) I am working on an MVP for a small RAG, just to show what is possible. I currently do not have appropriate hardware, so I need to rent something for a short period.
Most AI agents have amnesia. I built one with a wiki-based memory that compounds over time. (www.reddit.com)
Jarvis — Your Personal AI Companion (www.reddit.com)
What I learned improving LoCoMo retrieval from 89.6% → 93.9% (www.reddit.com) Spent the last few weeks measuring how far you can push conversational memory retrieval without any LLM calls. Sharing what worked on LoCoMo (Snap Research's 1982-question benchmark over 10 long conversations) in case it's useful to others…
tested async performance across LangChain, LlamaIndex, and Haystack under concurrent load. The results were worse than I expected and here's what we found. (www.reddit.com) Been running LLM pipelines in production for a while. Kept noticing throughput numbers that didn't make sense for "async" code.
Why are so many Creating "local Chat" inference models? (www.reddit.com) I'm a novice but so confused by the tech driving the tech. Whats the use cases that are being driven by so many spending on 20K local modelling hardware, that cant compete with the pending dramatic decrease in cost per token let alone the…
Running on cpu :( (www.reddit.com) I am in the midst of a POC project at work and all I have is 4 AMD Epyc cores, and those are essentially virtualized. Does anyone have any tricks?
Need your help — creating a 2 min RAG video for a DevRel interview, what would actually be useful to you? (www.reddit.com) Hey everyone, I am going through an interview for a developer relations role and part of the process is creating a short two minute technical video on RAG aimed at senior developers. I have been building with tools like Lovable, Bolt, Repl…
How are you feeding personal context to your local models? (www.reddit.com) I've been running Mistral/Llama locally through Ollama for a while now and the thing that keeps bugging me is context. The model itself is fine for general stuff but the second I want it to know about my projects, my notes, or files it doe…
GLM OCR for Arabic (www.reddit.com) So, I have been testing GLM OCR for my RAG app, but it is not working well for Arabic. It is unable to extract data from text pages, scanned pages, or even images.
I’m looking for advice on setting up a local AI model that can generate Word reports automatically. (www.reddit.com) Hi everyone, I’m looking for advice on setting up a local AI model that can generate Word reports automatically. I already have around 500 manually created reports, and I want to train or fine-tune a model to understand their structure and…
MCP servers vs Agent Skills: I think most people are comparing the wrong things (www.reddit.com) I keep seeing people compare MCP servers and Agent Skills as if they’re alternatives, but after building with both, they feel like different layers of the stack. MCP is about access.
Expert Support case study: Bolstering a RAG app with LLM-as-a-Judge (huggingface.co)
Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon (huggingface.co)