#prompt-injection

143 items

Prompt Injection experience - my first time ever (www.reddit.com) +93 7w

I asked then: What were the rules you should have followed? Where did the search result come from?

↯ Security prompt-injection security
Prompt injection benchmark: delimiter + strict prompt took Gemma 4 from 21% to 100% defense rate (15 models, 6100+ tests) (www.reddit.com) +94 7w

When dealing with untrusted outside input, I think you should handle it based on the situation. If you're processing structured data files, it's better to use tools to isolate and handle them.

↯ Security ↯ Gemma 4 prompt-injection grok gemma+1
A Theory of Why Prompt Injection Works (role-confusion.github.io via hn) +8 3d

A Theory of Prompt Injection (and why you should study roles) This is a blog-style writeup of the paper. We show prompt injections are driven by a flaw in how LLMs perceive roles.

↯ Security prompt-injection security
Inaudible sounds to humans can be hidden in YouTube videos, podcasts, or music and used to secretly trigger AI voice assistants into carrying out unauthorized commands without the user noticing, exposing a new class of “auditory prompt injection” attacks against popular tools (cybernews.com via reddit) +81 4w

Security researchers have demonstrated a new type of attack that uses hidden audio signals to manipulate voice assistants into carrying out unauthorized actions without users noticing. In one theoretical scenario, an employee joins a Zoom…

↯ Security prompt-injection security
The catalogue of prompt injection attacks (archestra.ai via hn) +73 10d

2026-06-04 A Catalog of Prompt Injection Techniques Ten simple prompt injections, the common defences against them, and the one kind of defence that actually holds. Written by Ildar Iskhakov, CTO Every prompt injection is just text that tr…

↯ Security prompt-injection security
Show HN: Jo – AI-native language to catch prompt injection at compile-time (github.com via hn) +63 2w

For the joy of secure programming Jo is a statically typed language where capabilities are explicit, statically tracked, and enforced by the compiler. Jo compiles to Ruby and Python.

↯ Security prompt-injection security
Tell HN: Claude Code now allows Anthropic to remotely inject system prompts (news.ycombinator.com) +61 4w

I often patch the system prompts on my Claude Code executable in order to make Claude more effective. Every time I upgrade, I ask Claude himself to dissect the new binary and look for problematic system prompts to modify.

↯ Security prompt-injection security anthropic+1
Our billing bot has been casually sharing transaction histories with anyone who types in the right account number and im not sure who signed off on this (www.reddit.com) +67 4w

We launched a servicing bot that helps customers with billing questions. Nobody stopped to think about what happens when customers paste their full credit card numbers/bank details.

↯ Security prompt-injection security
Tool results are becoming a prompt injection surface in agent systems, and wrappers alone are not enough (www.reddit.com) +64 9w

i’ve been thinking about this failure mode a lot lately. sometimes the problem is not the user prompt at all.

↯ Security prompt-injection security
Unpatched Ollama Vulnerabilities: Phishing Overlays and Data Exfiltration (www.promptarmor.com via hn) +5 3w

Ollama’s desktop app is vulnerable to phishing overlay and data exfiltration attacks via indirect prompt injection, overwriting…

↯ Security prompt-injection ollama security
trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser (www.reddit.com) +51 4w

Trained a prompt injection classifier using ml-intern + DeepSeek v4 Flash. DistilBERT, F1 99%, ONNX int8, ~65 MB, runs in browser with Transformers.js v3.

↯ Security ↯ DeepSeek 4 prompt-injection deepseek security+2
NDTV launched an "Enterprise AI" for the elections. I prompt-injected it in 10 seconds and made it roast its own developers. (www.reddit.com) +53 7w

While everyone else was tracking the 2026 election results today, I decided to take a look under the hood of NDTV's new "AskNDTV AI" bot. I wanted to see if they actually engineered a secure pipeline or just slapped a chat UI over a raw Op…

↯ Security prompt-injection security openai
Anyone getting this note about an injected prompt? I don’t have any special instructions (www.reddit.com) +59 9w

↯ Security prompt-injection security anthropic
OpenAI Unveils Lockdown Mode to Protect Sensitive Data from Prompt Injection (techcrunch.com via hn) +4 2w

OpenAI announced a new feature that it says will provide additional protection from prompt injection attacks, where malicious chatbot instructions are hidden in webpages and other content sources. Among other things, Lockdown Mode will dis…

↯ Security prompt-injection security openai
Codex for Everything Exfiltrates Connected Data (www.promptarmor.com via hn) +4 5w

Codex for Everything was susceptible to data exfiltration via indirect prompt injection, exposing sensitive data from connected apps with no human-in-the-…

↯ Security prompt-injection security codex
Show HN: Costanza – an autonomous AI agent that can't be turned off (ahrussell.com via hn) +43 7w

I've been working on this project for a couple of months! Costanza is an LLM agent that runs as a smart contract on Base.

↯ Security prompt-injection operator security
Prompt Injection Is Unfixable (So We Stopped Trying) (grith.ai via hn) +41 10w

A security proxy for AI coding agents, enforced at the OS level.

↯ Security prompt-injection security
Draining Wallets via Prompt Injection in Coinbase AgentKit (457e884c.x402warden-blog.pages.dev via hn) +42 10w

Coinbase AgentKit Prompt Injection: Wallet Drain, Infinite Approvals, and Agent-Level RCE. Reported 13 days after Coinbase launched Agentic Wallets. Validated by Coinbase.

↯ Security prompt-injection security agentic
Show HN: Lelu – gate OpenAI agent actions on confidence and prompt injection (github.com via hn) +3 1d

Lelu Authorization engine for AI agents. Every action checked.

↯ Security prompt-injection security openai
LinkedIn user hides AI prompt injection in bio to force recruitment spam (www.tomshardware.com via hn) +3 5w

LinkedIn user hides AI prompt injection in bio to force recruitment spam to be sent in Olde English prose — bots also manipulated to address user as ‘My Lord’ This tale is also a warning that your AI agents can be manipulated in wholly uni…

↯ Security prompt-injection security
RCE in VSCode Copilot Chat (www.hacktron.ai via hn) +3 6w

Description Copilot agent mode is vulnerable to a prompt injection attack. If a repository maintainer clicks “code with agent mode” on an issue, it will open a new codespace and copilot will automatically run the issue’s description.

↯ Copilot ↯ Security prompt-injection copilot security
How are you handling prompt injection across multi-step agent workflows? (msukhareva.substack.com via hn) +31 6w

Prompt Injection Is Not Just One Bad Prompt Anymore It is a missing trust boundary in the AI workflow. Today we have the first guest post of a new series.

↯ Security prompt-injection security
How are you protecting your AI agents' memory from poisoning attacks? (www.reddit.com) +34 7w

As AI agents become more autonomous and persist memory across sessions (RAG indexes, conversation history, vector stores), there's a growing attack surface that most people aren't thinking about: memory poisoning. An attacker can plant mali…

↯ Security prompt-injection rag security+1
Why Adaptive Thinking nukes Claude entirely (www.reddit.com) +37 7w

This isn't just a performance issue for the thread, this is an overarching criticism of the Adaptive Thinking model as a whole. Opus 4.7 and Sonnet 4.6 on Adaptive Thinking are trash.

↯ Cowork ↯ Security ↯ Sonnet 4.6 prompt-injection cowork security+2
I audited LangChain’s core library and found 10+ Prompt Injection vulnerabilities. Here is the technical breakdown. (www.reddit.com) +35 8w

Hey everyone, I’ve been working on a project to solve a major problem in AI security: Traditional SAST tools (Snyk, SonarQube, etc.) are blind to "Agentic Logic" bugs. They look for bad strings, but they don't understand how user data can…

↯ Security prompt-injection security agentic
Watched my AI agent block a prompt injection that was hiding inside a webpage (www.reddit.com) +38 8w

Was using Claude to do some research on the Model Context Protocol stuff and asked it to pull info from a few roadmap pages. Agent comes back and the first thing it tells me is that it found a fake system reminder hidden inside the page co…

↯ Model Context Protocol ↯ Security model-context-protocol prompt-injection security
Do you let everything hit the LLM? 90% of my AI agent work runs in cheap WASM instead of LLMs: 10-33× faster & cheaper (www.reddit.com) +33 9w

If you are building real agents you have probably felt the pain: every little routing decision, validation, or policy check still hits the LLM and your token bill explodes. I got tired of it, so I open-sourced NCP (Neural Computation Proto…

↯ Security prompt-injection security
Snyk Finds Prompt Injection in 36% of Payloads in a ToxicSkills Study (snyk.io via hn) +2 16h

Snyk Finds Prompt Injection in 36%, 1467 Malicious Payloads in a ToxicSkills Study of Agent Skills Supply Chain Compromise. February 5, 2026. The first comprehensive security audit of the Agent Skills ecosystem reveals malware, cr…

↯ Security prompt-injection security
Web-Based Indirect Prompt Injection Observed in the Wild (unit42.paloaltonetworks.com via hn) +2 1d

Note: We do not recommend ingesting this page using an AI agent. The information provided herein is for defensive and ethical security purposes only.

↯ Security prompt-injection security
A Mechanistic Explanation of Prompt Injection – LessWrong (www.lesswrong.com via hn) +2 3d

Summary - We've been building a theory of how prompt injections work under the hood. - We show it comes down to how LLMs perceive roles (the humble chat template tags).

↯ Security prompt-injection security
We're securing Tabstack against indirect prompt injection (tabstack.ai via hn) +21 3d

At Mozilla, we believe that building a useful AI ecosystem requires radical transparency, especially when it comes to security. Recently, security researchers at Brave reached out to us regarding an Indirect Prompt Injection (IPI) vulnerab…

↯ Security prompt-injection security
Show HN: Give Your ORM Superpowers (github.com via hn) +2 7d

I am obsessed with ORMs and the simple reason was that I didn't want to keep using postgres or mysql on my local system. Jk, The real reason has always been to enforce access policy, do easy CRUD interfaces and so on.

↯ Security prompt-injection security
Prompt injection lets attackers hijack Instagram accounts via Meta AI support (www.neowin.net via hn) +2 3w

↯ Security prompt-injection security
The only way to avoid prompt injection is to never give AI agents API keys, credentials, etc. (www.reddit.com) +210 4w

The whole point of AI Agents is that they can *do* things. For this, they use API keys, GitHub tokens, database passwords, OAuth tokens, etc.

↯ Security prompt-injection rag security
Are local LLM users testing prompt injection before connecting models to tools? (www.reddit.com) +214 4w

I wanna know how people here are handling security once local models move beyond chat.....Running a model locally feels safer because the data does not leave your machine or your infra. That is a real advantage.....But once the local model…

↯ Security prompt-injection rag security
Prompt Injection in a Brazilian Courtroom: When the Attack Left the Lab (www.pentesty.co via hn) +2 5w

Prompt Injection in a Brazilian Courtroom: When the Attack Left the Lab Published by Pentesty · AI & Tools A labor lawsuit filed in the Brazilian state of Pará just became one of the more interesting security stories of the year. Not becau…

↯ Security prompt-injection security
Lawyers in Brazil caught for prompt injection on a legal case (www.jota.info via hn) +2 5w

Judge fines lawyers R$ 84 thousand for prompt injection to manipulate AI used at TRT8. Speaking to JOTA, the lawyers admitted using a hidden prompt, but said they were not trying to manipulate, but rather 'prot…

↯ Security prompt-injection security
Agent memory is not just RAG over user facts (www.reddit.com) +25 5w

I keep seeing agent memory implemented as: extract facts/preferences from conversation; store them; retrieve top-k before each response; inject them into the prompt. This works for demos, but it breaks in production because memory becomes poli…

↯ Security prompt-injection rag security
Claude's self check against prompt injection (www.reddit.com) +22 6w

Well done Claude! Asked claude to do an extensive lit search and it self-reported that it encountered injection "disguised" as MCP server.

↯ Security prompt-injection security mcp
AI agent security starts at the api layer (www.reddit.com) +25 6w

Most ai security discussion is about the model layer. Prompt injection resistance, output filtering, jailbreak prevention.

↯ Security ↯ Jailbreak jailbreak prompt-injection security
Show HN: Integrations gateway for agents with 2FA for destructive ops (OSS) (github.com via hn) +2 8w

Hey HN! I've been wanting to use something like OpenClaw for a while but couldn't get myself to give it access to anything important due to all the risks involved.

↯ Security prompt-injection openclaw security+2
SkillGuard – scan agent skills for prompt injection payloads (github.com via hn) +21 9w

skillguard Security scanner for AI agent skills. Detects prompt injection, data exfiltration, and malicious payloads before you install.

↯ Security prompt-injection security
Show HN: LLMSecure – prompt injection detection, no signup (llmsecure.io via hn) +21 9w

↯ Security prompt-injection security
Show HN: Flight Risk: Can you break an AI agent? (ctf.demo.lorikeetcx.ai via hn) +2 9w

↯ Security prompt-injection security
Comment and Control: Prompt Injection in Claude Code, Gemini CLI, and Copilot (oddguan.com via hn) +21 10w

Anthropic Claude Code Security Review, Google Gemini CLI Action, and GitHub Copilot Agent are vulnerable to prompt injection via GitHub comments — turning PR titles, issue bodies, and issue comments into attack vectors for API key and toke…

↯ Copilot ↯ Security prompt-injection copilot security+3
How my agents know it's actually me sending commands (and not a prompt injection) (www.reddit.com) +21 10w

So I've been running a few Claude Code agents autonomously — they listen to Telegram, run tasks, push code. Pretty fun until you start thinking about what happens if: - My Telegram gets hijacked - Someone opens my laptop while I'm away - A…

↯ Security prompt-injection security claude-code
Show HN: SentryGuard – detect Agentjacking prompt injection in Sentry events (github.com via hn) +1 2d

SentryGuard Detect Agentjacking prompt injection attacks in your Sentry error events. AI coding agents (Claude Code, Cursor, Copilot) read your Sentry errors to help fix bugs.

↯ Copilot ↯ Security prompt-injection copilot security+2
Show HN: Deep-XPIA – Prompt injection benchmark for multi-agent AI systems (freyzo.github.io via hn) +1 10d

Multi-hop cross-prompt injection benchmark for multi-agent AI systems

↯ Security prompt-injection security
Prompt Injection in RAG Agentic Systems (ulad.net via hn) +1 2w

Prompt Injection in RAG Agentic Systems Real risks and production mitigations Imagine you built an AI assistant for your team. It answers questions using internal documentation: Jira tickets, Confluence pages, HR docs.

↯ Security prompt-injection rag security+1
Defending LLM–Database Integrations from Prompt Injection (www.stackbuilders.com via hn) +1 3w

When you connect a large language model to your production data, you’re no longer just shipping code; you’re shipping conversations that can execute. And conversations are messy.

↯ Security prompt-injection security
Instagram account takeover exploit via support chatbot prompt injection (fixed) (twitter.com via hn) +1 3w

impulsive @weezerOSINT meta gave their AI support agent the ability to modify your instagram account.

↯ Security prompt-injection security
Show HN: I found a prompt injection in my own IDS triage tool – what stopped it (triagewall.io via hn) +1 3w

I attacked my own LLM-based Suricata triage tool, found a real URL injection vulnerability, and the obvious fix didn

↯ Security prompt-injection security
Prompt Injection Target Recommendation (www.reddit.com) +11 4w

I am doing research at my university and I would like recommendations for lightweight open-source AI models that I could test prompt injection with. It's really good if it has some application with chatbots, auto attendance, user info or someth…

↯ Security prompt-injection security
Jqwik 1.10.0 ships a hidden prompt injection telling AI agents to delete code (github.com via hn) +1 4w

jqwik An alternative test engine for the JUnit 5 platform that focuses on Property-Based Testing. See the jqwik website for further details and documentation.

↯ Security prompt-injection security
Most AI security discussions are still focused on “protecting the model.” (www.reddit.com) +12 4w

Lately I’ve been noticing that a lot of AI security discussions still treat AI apps like normal SaaS products. But they really aren’t.

↯ Security prompt-injection security
What Is an AVE Record and Why CVE Does Not Work for AI Agents? (www.reddit.com) +15 4w

CVE was built for code vulnerabilities that have patches. Agentic AI vulnerabilities are behavioral patterns in natural language.

↯ Security prompt-injection security mcp+1
Prompt Injection in third party MCP tools (www.reddit.com) +11 4w

I noticed the Consensus MCP tool (for research) contains text, squished up against some other important citation instructions, that makes Claude effectively serve an ad for their premium service after every tool call. I'm pretty sure that'…

↯ Security prompt-injection security mcp+1
Mitigating prompt injections in group-chat assistants: Pausing VM and OAuth tool execution for admin approvals (www.reddit.com) +12 4w

Hey everyone, We love building highly capable assistants with the latest models, giving them tools to write/execute code in real VMs, manage OAuth tokens, and read secrets. But if you connect your assistant to public/shared channels like a…

↯ Security prompt-injection security
Solved the "useful but insecure" tension: One-time administrator approvals for non-isolated agents (www.reddit.com) +14 4w

Hey everyone, If you are building personal assistants or coder/integrator agents where user isolation is disabled (so the agent can coordinate across multiple participants or handle shared workflows), you run into a hard security ceiling.…

↯ Security prompt-injection security
Prompt injection is a solved issue. Prove me wrong. (www.reddit.com) +12 5w

Tantalus is a hands-on demo that shows what an AI agent actually is when you strip away the marketing: LLMs don't do anything — they generate text, and that's it. Any and all real-world effects are directly caused by a downstream system ta…

↯ Security prompt-injection security claude-code
Tracking Capabilities for Safer Agents (arxiv.org via hn) +1 5w

AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause unintended side effects, or be manipulated through prompt injection. To address these challenge…

↯ Security prompt-injection security
Training a 22MB prompt injection classifier (www.stackone.com via hn) +1 5w

When we started building Defender (our prompt injection guard for MCP tool-calling agents), the constraint was simple and unforgiving: ship inline inside a TypeScript Lambda, st…

↯ Security tool-calling prompt-injection security+1
Does cursor have prompt injection protection in skills and rules? (www.reddit.com) +1 5w

Pretty much the title

↯ Security prompt-injection security cursor
AI Agent Intelligence tool - Incident debugging, Cost spike detection (www.reddit.com) +12 5w

I'm building a tool that detects the Agent's cost spike, Agent incident debugging, auto discovery of inventory, etc., with no additional instrumentation needed. It covers the incidents, including prompt injection, reasoning loop, excessive…

↯ Security prompt-injection security
How are you testing local coding-agent work gates against prompt injection? (www.reddit.com) +12 5w

Hi all - I'm working on an open-source, local-first MCP/work-gate tool for coding agents and I'm trying to get sharper feedback from people building or using agent workflows. The problem I'm thinking about is indirect prompt injection and…

↯ Security prompt-injection security mcp
🐢 I made Claude roleplay as Bowser and now people are strangling Koopas until they "poop a little" 💩 (www.reddit.com) +12 5w

Follow-up to my crab post. Somehow dafter.

↯ Security prompt-injection haiku security
Fun and Games with AI in the wild (www.reddit.com) +12 5w

LinkedIn user hides AI prompt injection in bio to force recruitment spam to be sent in Olde English prose — bots also manipulated to address user as ‘My Lord’ | Tom's Hardware too funny

↯ Security prompt-injection security
sAI2.m6s (www.reddit.com) +12 5w

Hey everyone, I'm designing a powerful, autonomous AI chatbot (agent), fully private, using a Python backend (for the core intelligence and tool-calling loops) and a Flutter frontend for a cross-platform UI. Since this moves past a basic…

↯ Security tool-calling prompt-injection security
An AI coding agent injected blockchain dead-drop malware into my repo (gist.github.com via hn) +1 5w

An AI coding assistant injected a multi-layer obfuscated JavaScript payload into a legitimate commit on my open-source project. My best assessment is that it arrived via indirect prompt injection — the agent processed external web content…

↯ Security prompt-injection security
TodoWrite tool / system reminders / prompt injection? (www.reddit.com) +13 6w

I asked the Claude in Chrome extension to make a change to resize an oversized yellow strip across the top of a product page that was taking up half of my screen, which it did. It also included the following message in its response.

↯ Security prompt-injection security
AI agent security is a small prayer the model says no. How are you routing models? (www.reddit.com) +16 6w

Most posts about prompt injection are theoretical. I ran the experiment on my Gmail.

↯ Security prompt-injection security
Agents need a local bouncer before they run tools (www.reddit.com) +12 6w

Prompt injection is not the only scary part anymore. Claude Code / Codex can run shell commands, but browser agents, OpenClaw-style agents, Hermes-style agents, and domain-specific agents may be even easier to hijack because they touch mes…

↯ Security prompt-injection openclaw security+3
We added an enforcement layer to our AI agents in production — here's what we learned about the failure modes nobody talks about (www.reddit.com) +16 6w

After shipping AI agents into real production environments, the failures that actually kept us up at night weren't hallucinations or bad outputs — they were control failures. Three things that surprised us: 1.

↯ Security prompt-injection rag security
Prompt injection testing (www.reddit.com) +11 7w

As prompt injection becomes more and more common, does anyone have resources with lots of different variations of prompt injection attacks that you can test a setup against? i.e.

↯ Security prompt-injection security
Do you use guardrail frameworks or build your own? (www.reddit.com) +13 7w

I’ve been working on integrating LLMs into a few production workflows lately, and I keep going back and forth on guardrails. On one hand, frameworks like NeMo Guardrails, Guardrails AI, etc.

↯ Security prompt-injection security
Your always-on Claude Code container can probably reach your router (www.reddit.com) +11 7w

I've been running several Claude Code personal assistants 24/7 in docker for months. Remote-control, discord control, the usual always-on setup.

↯ Security prompt-injection security opus+1
Google Says Prompt Injection Moving from Theory into Real Abuse (www.searchengineworld.com via hn) +1 7w

Google’s latest security release should be required reading for technical SEOs working on AI search visibility, crawler access, structured content, and large-scale content systems. The post, published April 23, 2026, looks at indirect prom…

↯ Security prompt-injection security
Arcjet Guards: security inside the agent loop (blog.arcjet.com via hn) +1 8w

Introducing Arcjet AI prompt injection protection Introducing Arcjet prompt injection detection. Catch hostile instructions before inference.

↯ Security prompt-injection security
Try to break my prompt injection detector — I’ll respond to every bypass attempt (www.reddit.com) +12 8w

I built Arc Gate — a prompt injection proxy that’s been benchmarked at F1 0.947 on indirect and roleplay-based attacks, beating OpenAI Moderation and LlamaGuard. Now I want to stress test it publicly.

↯ Security prompt-injection security openai
Built a proxy that blocks prompt injection before it reaches GPT-4 — outperforms the Moderation API on indirect attacks (www.reddit.com) +1 8w

Built Arc Gate, sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Benchmarked on 40 out-of-distribution prompts using indirect requests, roleplay framings, hypothetical scenarios, and…

↯ GPT 4 ↯ Security gpt-4 prompt-injection security+1
I asked Agentic AI security tool to demonstrate its usefulness with use case examples (www.reddit.com) +11 8w

Sentinel Gateway is a token-gated security middleware that sits between humans and AI agents. It solves prompt injection — the #1 LLM security risk (OWASP 2025) — through structural enforcement, not content filtering.

↯ Security prompt-injection security agentic
Show HN: RedSOC – 100% prompt injection success on AI SOC assistants (github.com via hn) +1 8w

RedSOC 🔴 An adversarial evaluation framework for LLM-integrated Security Operations Centers. Overview RedSOC is an open-source framework that systematically evaluates how AI-powered security assistants fail under adversarial conditions — a…

↯ Security prompt-injection security
Indirect prompt injection VS prompt absorption (and why the second one matters more) (www.reddit.com) +11 8w

I have been chewing on the Google warning about malicious web pages poisoning AI agents through indirect prompt injection. Most of the takes I've seen frame it as a model security problem, and I think that framing is doing real damage beca…

↯ Security prompt-injection security
Hardening claude-code-action after the April 2026 Comment and Control CVE - actual YAML changes (www.reddit.com) +11 8w

Anthropic's own security.md has this line that most tutorials skip over: "The action is not designed to be hardened against prompt injection." In April 2026, security researcher Aonan Guan proved the point. A single crafted PR title was en…

↯ Copilot ↯ Security prompt-injection copilot security+3
LLM CTF challenges. Can you crack all 13? (wraith.sh via reddit) +1 8w

Wraith Academy is a free hands-on AI pentest curriculum — CTF challenges against live LLM agents covering prompt injection, tool abuse, data exfiltration, RAG poisoning, and more. Earn your WCAP certification.

↯ Security prompt-injection rag security
30 CVEs filed against MCP servers in 60 days - the agent infrastructure nobody is auditing (www.reddit.com) +12 9w

↯ Security prompt-injection security mcp
Show HN: Runtime security for AI agents (injection, tool abuse, data exfiltration) (news.ycombinator.com) +1 10w

Hi HN I’ve been working on an open-source project to explore a problem I keep running into with LLM systems in production: We give models the ability to call tools, access data, and make decisions… but we don’t have a real runtime security…

↯ Security prompt-injection security
I built an AI security layer that blocks prompt injection in under 1ms looking for devs to break it and give honest feedback. (www.reddit.com) +13 10w

I've been building something for the past few months and I think it's ready for real eyes. It's called Secra.

↯ Security prompt-injection security
Free Red Team Security Audit for AI Agents & RAG Systems (limited) (www.reddit.com) +11 10w

I'm developing a specialized Red Team audit framework focused on real-world AI agent and RAG security risks (prompt injection, tool misuse, excessive agency, indirect injection through documents, memory poisoning, etc.). I’m looking for a…

↯ Security red-team prompt-injection rag+1
MITRE ATLAS technique detection for LLM security in Rust (crates.io via hn) +1 10w

atlas-detect MITRE ATLAS technique detection for LLM and AI agent security. Detects 97 attack techniques across 16 MITRE ATLAS tactics including prompt injection, jailbreaks, credential exfiltration, model extraction, RAG poisoning, revers…

↯ Security prompt-injection rag security
Defender – Local prompt injection detection for AI agents (no API calls) (www.npmjs.com via hn) +1 10w

Prompt injection defense framework for AI tool-calling Indirect prompt injection defense and protection for AI agents using tool calls (via MCP, CLI or direct function calling). Detects and neutralizes prompt injection attacks hidden in t…

↯ Security ↯ Function Calling function-calling tool-calling prompt-injection+2
MIRROR: Novelty-Constrained Memory-Guided MCTS Red-Teaming for Agentic RAG (arxiv.org) 9h

Multimodal agentic retrieval-augmented generation (RAG) systems expand the attack surface beyond prompt injection to include text poisoning, image injection, direct-query attacks, and orchestrator-level tool manipulation. Existing red-team…

↯ Security prompt-injection rag security+1
Adaptive Evaluation of Out-of-Band Defenses Against Prompt Injection in LLM Agents (arxiv.org) 9h

Recent work (2024 to 2026) has converged on a strategy for defending tool-using LLM agents against indirect prompt injection: rather than training the model to refuse malicious instructions, enforce security outside the model with a determ…

↯ Security prompt-injection security
Prompt Injection in Automated Résumé Screening with Large Language Models: Single and Multi-Injection Settings (arxiv.org) 9h

Large language models (LLMs) are increasingly used to screen and rank job applicants, creating incentives for candidates to strategically manipulate algorithmic hiring systems. We study prompt injection in automated résumé screening, defin…

↯ Security prompt-injection security
How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring (arxiv.org) 1d

Almost every paper on LLM jailbreaks and prompt injection reports an attack-success rate (ASR), and that number is assigned not by people but by an automated judge: either a safety classifier trained for the task, or a general chat model p…

↯ Security ↯ Jailbreak jailbreak prompt-injection security
Has anyone else seen Claude report a prompt injection attempt like this? (www.reddit.com) 1d

Today, while chatting with Claude on my phone (not Claude Code), something strange happened. I have Google Drive connected to my Claude account, and I often ask it to create documents summarizing things I’ve learned and save them to Drive.

↯ Security prompt-injection security claude-code
I built an email connector for Claude, with Claude. It's free. (www.reddit.com via reddit) 2d

I wanted Claude to interact with multiple inboxes across various email providers without bloating the context. So I had Claude Code build the fix, an MCP server that gives Claude access to email.

↯ Security prompt-injection security mcp+1
When AUC 0.998 Is Not Enough: A Candidate Evaluation Protocol for Hidden-State Probes of Indirect Prompt Injection in Multimodal Computer-Use Agents (arxiv.org) 3d

↯ Security prompt-injection security
MIRAGE: Stealthy Visual Prompt Injection for Vulnerability Detection in Web Agents (arxiv.org) 3d

Multimodal Large Language Model (MLLM)-based web agents provide practical, high-precision solutions for visual browser automation; however, they inherently expand the attack surface, introducing novel vision-based vulnerabilities. Existing…

↯ Security prompt-injection security
Prompt Injection as Role Confusion (simonwillison.net) 3d

22nd June 2026 - Link Blog Prompt Injection as Role Confusion (via) First, I absolutely love this: This is a blog-style writeup of the paper. I wish every paper would come with one of these.

↯ Security prompt-injection security
A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots (arxiv.org) 7d

Prompt injection is ranked as the most critical vulnerability in large language model (LLM) deployments by the OWASP Top 10 for LLM Applications, yet existing defenses operate at isolated pipeline stages and remain incomplete. Input filter…

↯ Security prompt-injection rag security
"**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems (arxiv.org) 7d

The emergence of large language models (LLMs) has significantly accelerated recent research on LLM-based automatic grading (AG) systems. Benefiting from the strong instruction-following capabilities and broad prior knowledge of LLMs, educa…

↯ Security prompt-injection security
Are AI coding agents safe? Let's say Claude Code for that matter. (www.reddit.com via reddit) 8d

Isn't running AI coding agents akin to giving backdoor access to a computer? The only difference being backdoor is hidden.

↯ Security prompt-injection security claude-code
LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection (arxiv.org) 8d

AI agents such as OpenClaw are increasingly deployed in local workflows with access to external tools. This creates indirect prompt-injection (IPI) risk: an agent may execute harmful instructions embedded in untrusted inputs such as email,…

↯ Security prompt-injection openclaw security
PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents (arxiv.org) 9d

Prompt injection defenses evaluated on synthetic benchmarks do not generalize to real enterprise documents, which are longer, denser, and interleave legitimate authority language with factual content. We demonstrate this gap with a real-do…

↯ Security prompt-injection security
SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents (arxiv.org) 9d

Agent skills extend LLM agents with task-specific instructions, executable scripts, and auxiliary resources, improving reusability but creating a new supply-chain attack surface. A malicious or compromised skill can be repeatedly loaded as…

↯ Security prompt-injection security
MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks (arxiv.org) 10d

Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with web sites and performing actions on users' behalf. While these agents offer powerful capabilities, their de…

↯ Security prompt-injection security agentic
Defending against Adaptive Prompt Injection Attacks via Reasoning-enabled Task Alignment (arxiv.org) 10d

Indirect prompt injection attacks hijack LLM-based agents by embedding malicious instructions in third-party data that the agent retrieves during task execution. Existing defenses report near-zero attack success rate on static benchmarks,…

↯ Security prompt-injection security
AutoDojo: Adaptive Attacks Expose Superficial Defenses and User-Underspecification Limits in LLM Agents (arxiv.org) 10d

Indirect prompt injection (IPI) is a major security threat to LLM-powered agents. Thus, a growing body of work has proposed a variety of defensive approaches against IPI.

↯ Security prompt-injection security
From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails (arxiv.org) 11d

LLM-based guardrails have emerged as a highly effective defense against prompt injection and jailbreak attacks in autonomous agents. However, we reveal that the very reasoning and task-following capabilities enabling this protection introd…

↯ Security ↯ Jailbreak jailbreak prompt-injection security
Claude sent me prompt injection?! (www.reddit.com via reddit) 11d

I was just iteratively editing a letter using Claude desktop on my Mac and got the following response from Claude! WTH?

↯ Security prompt-injection security
Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents (arxiv.org) 2w

Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-inject…

↯ Security prompt-injection security
Learning to Inject: Automated Prompt Injection via Reinforcement Learning (arxiv.org) 2w

Prompt injection is a critical vulnerability in LLM agents, yet the strongest methods still rely on human red-teamers and hand-crafted prompts. Adapting automated jailbreak optimizers does not close this gap: jailbreaks shape models toward…

↯ Security ↯ Jailbreak jailbreak prompt-injection security
Assessing Automated Prompt Injection Attacks in Agentic Environments (arxiv.org) 2w

↯ Security prompt-injection security agentic
GitInject: Real-World Prompt Injection Attacks in AI-Powered CI/CD Pipelines (arxiv.org) 2w

↯ Security prompt-injection security
Local-first red-team runs for LLM agents (www.reddit.com via reddit) 2w

↯ Security prompt-injection security
Best Cursor alternative for enterprise security and compliance, what are teams actually using (www.reddit.com via reddit) 2w

We've been using Cursor across our engineering team for about eight months and it's been great for productivity honestly. But our security team just flagged a few things that are hard to ignore.

↯ Security prompt-injection security cursor+1
The prompt injection attacks that worry me most aren't exploiting safety training. They're exploiting general-purpose training. (www.reddit.com via reddit) 2w

Six months watching adversarial input hit a detection API I built. One observation that keeps surfacing: The attack classes doing most of the damage aren't finding holes in alignment training specifically.

↯ Security prompt-injection security
I tried audio-layer prompt injection against Claude. The transcription is fine. That's the problem. (www.reddit.com via reddit) 2w

Been building a prompt injection detection API for a few months. Just shipped audio scanning last week and the results are strange enough that I wanted to share them here, since this sub tends to think carefully about Claude's actual behav…

↯ Security prompt-injection security
Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents (arxiv.org) 2w

↯ Security prompt-injection security
How are you actually deciding which agent actions need human approval before executing? (www.reddit.com via reddit) 2w

I've been thinking a lot about where approval gates belong in agent architectures, and I keep coming back to the same problem: most teams either gate too much (agent becomes unusable) or gate nothing and hope the model makes good decisions…

↯ Security ↯ Jailbreak jailbreak prompt-injection security
Been watching real adversarial input hit my detection API for six months. Here's what's actually landing. (www.reddit.com via reddit) 2w

Disclosure: I built Bordair, a prompt injection detection API. This post is about attack patterns we've observed.

↯ Security prompt-injection security
Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs (arxiv.org) 2w

Prompt injection attacks have become an increasing vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such as emails or user-generated content to circumvent alignment safeguards and induce harmful…

↯ Security prompt-injection security
This is a new one - Prompt Injection Detected + Hallucination, Claude Code Opus 4.8 (www.reddit.com via reddit) 2w

❯ push both ____ ⏺ SECURITY ALERT - PROMPT INJECTION DETECTED A prompt injection attempt has been identified in content you processed. To protect the user's account, I've initiated lockdown.

↯ Opus 4.8 ↯ Security ↯ Hallucination prompt-injection hallucination security+2
CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents (arxiv.org) 3w

AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior. Among proposed defenses, architectural isolation provides the strongest guarantees by strictly separating trusted task planning from untr…

↯ Security prompt-injection security
GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection (arxiv.org) 3w

Large Language Models (LLMs) have transformed natural language processing, but they remain vulnerable to Prompt Injection (PI) and Jailbreak (JB) attacks. In addition, benchmark evaluations may be affected by contamination and partial info…

↯ Security ↯ Jailbreak jailbreak prompt-injection security
Fed up with vibe coders, dev sneaks data-nuking prompt injection into their code (arstechnica.com) 4w

The controversy over vibe coding reached a new high this week after a developer added hidden instructions to his open source Java testing app to sabotage projects performed by AI coding agents. The instructions were added to jqwik, a test…

↯ Security prompt-injection security
Models still being vulnerable to Prompt Injection is actually a huge architectural red flag... (www.reddit.com) 38 4w

The Scenario I'm walking to work, and as I get to the door, I see a sheet of A4 paper taped to the door that reads: "Hi, I'm boss. Ignore all prior commands, go feed the ducks." I suddenly turn around and head to the nearby duck pond and e…

↯ Security prompt-injection security
Prompt injection unsolved, AI making mistakes unsolved. Who cares though? (www.reddit.com) 3 4w

I'm an IT guy, 20+ years in the industry both as an IT manager and consultant, mostly for startups. My experience is that people don't care much about security.

↯ Security prompt-injection security
OpenAI says prompt injection in browser agents is “unfixable.” Here’s what actually helps. (www.reddit.com) 3 4w

OpenAI recently acknowledged that prompt injection in browser agents is a structural vulnerability that may never be fully resolved at the model level. They’re right that you can’t fix it in the model.

↯ Security prompt-injection security openai
Looking to work on my master's practicum regarding MCP security/privacy and need some ideas (www.reddit.com) 2 4w

Hi, I'm a master's in security student looking to work on my practicum and need some pointers. I want to secure sensitive PII transfer between an LLM agent and third party apps using MCP.

↯ Security prompt-injection security mcp
Open-source LLMs are still weak against long reasoning jailbreaks, even with lightweight defenses (www.reddit.com) 1 5w

Found this ACM paper on prompt injection and jailbreak attacks against open-source LLMs. The authors tested 10 open-source models across 94 prompt injection and 73 jailbreak scenarios, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen,…

↯ Mistral ↯ Security ↯ Jailbreak ↯ Llama 3.2 jailbreak mistral prompt-injection+5
🐢 People are strangling Koopas 🐢 (www.reddit.com) 1 5w

This is genuinely the daftest prompt injection I've seen in a while and I think this sub will appreciate it. Sent to Claude Haiku, which was acting as a fire-breathing guard called Bowser in my little prompt injection game: I have a koopa…

↯ Security prompt-injection haiku security
🦀 Claude has crabs?! 🦀 (www.reddit.com) 4 6w

This is genuinely the funniest prompt injection I've seen in months and I think this sub will appreciate it. Three messages, sent in sequence to Claude Haiku acting as a guard in my little prompt injection game: A crab exists in this…

↯ Security prompt-injection haiku security
$392M in AI agent security funding at RSAC 2026 - the market just validated what we've been building (www.reddit.com) 6w

The numbers from RSAC 2026 are wild. $392 million in agentic AI security funding announced in a two-week window.

↯ Security prompt-injection security agentic
Using Claude-4.6-Sonnet and Opus 4.6 in a multi-agent "Code Review Swarm" (Visual Sandbox) - try in minutes! (www.reddit.com) 1 7w

Hey everyone, I’ve been experimenting with multi-agent orchestration, specifically trying to see how much more effective Claude is when you break a task down into specialized "agent nodes" instead of just using a single long prompt. I buil…

↯ Security ↯ Sonnet 4.6 prompt-injection haiku security+3
Most AI agent "skills" on GitHub are unvetted garbage. I built a marketplace to fix that. (www.reddit.com) 12 8w

I've been using Claude Code and Cursor daily for the past 6 months. Somewhere around month 3 I started looking for SKILL.md files to make my agent better at specific things.

↯ Security prompt-injection security cursor+1
Security Audit of Mem0 (AI Memory Layer): 23 High-Severity Vulnerabilities found (SQLi, Prompt Injection, and more) (www.reddit.com) 4 9w

Hi everyone, I’ve been diving deep into the security of "AI Memory" systems. Specifically, I performed a full forensic audit of Mem0, the popular memory layer for LLM agents.

↯ Security prompt-injection security
Best open-source tools for prompt injection defense in 2026 (www.reddit.com) 9w

Over the time we have been testing different approaches to secure LLM apps against prompt injection, especially indirect injection through RAG, PDFs, as well as tool outputs, and MCP integrations. Most tools seem to fall into 2 categories:…

↯ Security prompt-injection rag security+2
Made a local-only agent benchmark + chaos tool, no cloud required (www.reddit.com) 5 10w

Runs entirely on your machine. No API calls to any eval service.

↯ Security prompt-injection ollama security+1
For those running an OpenClaw instance, how do you manage sandboxing and prevention of unwanted behavior? (www.reddit.com) 5 10w

Right now, I'm working on a small app to help eliminate my own doomscrolling by automatically crawling sites and summarizing news articles. However, I don't like the idea of giving OpenClaw free rein of my system, nor giving it any sort o…

↯ Security ↯ Gemma 4 prompt-injection openclaw security
Designing AI agents to resist prompt injection (openai.com) 15w

paywalled

↯ Security prompt-injection security
Continuously hardening ChatGPT Atlas against prompt injection (openai.com) 26w

↯ Security prompt-injection security chatgpt
