
Security

76 items · started 2023-04-11 · ongoing (last activity 2026-04-22)

SkillGuard – scan agent skills for prompt injection payloads (github.com via hn)

+21 45m prompt-injection security

skillguard: Security scanner for AI agent skills. Detects prompt injection, data exfiltration, and malicious payloads before you install.
GPT-Proxy Backdoor in NPM and PyPI Turns Servers into Chinese LLM Relays (www.aikido.dev via hn)

+3 2h security
Speed Matters: Why AI Software Vulnerability Exploitation is going be bad (news.ycombinator.com)

+103 4h security mythos
PSA: Anthropic bans organizations without warning (www.reddit.com)

+744127 4h security anthropic
Tool results are becoming a prompt injection surface in agent systems, and wrappers alone are not enough (www.reddit.com)

+64 6h prompt-injection security
RAG in Go: A Vulnerability Research Tool (www.ardanlabs.com via hn)

+1 11h rag security
Best open-source tools for prompt injection defense in 2026 (www.reddit.com)

1d prompt-injection rag security+2

Over time, we have been testing different approaches to securing LLM apps against prompt injection, especially indirect injection through RAG, PDFs, tool outputs, and MCP integrations. Most tools seem to fall into 2 categories:…
Codex kyc not working as expected (www.reddit.com)

1 1d security codex chatgpt
Show HN: LLMSecure – prompt injection detection, no signup (llmsecure.io via hn)

+21 1d prompt-injection security
Show HN: Flight Risk: Can you break an AI agent? (ctf.demo.lorikeetcx.ai via hn)

+2 1d prompt-injection security
What are the biggest security risks when deploying autonomous AI agents? (www.reddit.com)

+713 1d prompt-injection security
20% of packages ChatGPT recommends dont exist. built a small MCP server that catches the fakes before the install runs (www.reddit.com)

2 1d security mcp chatgpt
FreeBSD CVE-2026-4747 Log Suggests Mythos Is a Marketing Trick (www.flyingpenguin.com via hn)

+163 1d security mythos
cursor suggested a package that didnt exist, rabbit hole ensued (www.reddit.com)

+28 3d hallucination security cursor
Auto pentest your LLM endpoint and watch the chat in real-time (www.wraith.sh via hn)

+1 3d security
30 CVEs filed against MCP servers in 60 days - the agent infrastructure nobody is auditing (www.reddit.com)

+12 3d prompt-injection security mcp
Anyone getting this note about an injected prompt? I don’t have any special instructions (www.reddit.com)

+59 3d prompt-injection security anthropic
Heads up, Ox Security found MCP's STDIO transport can run arbitrary commands on your machine before validation (www.reddit.com)

2 3d windsurf security mcp+2
Fake Claude site installs malware that gives attackers access to your computer (www.malwarebytes.com via hn)

+211 3d security
Fulu bounty for Ring Camera jailbreak reaches $23k (bounties.fulu.org via hn)

+31 4d jailbreak security

Ring Video Doorbells: Ring, owned by Amazon, makes Video Doorbells, which are widely used doorstep-monitoring cameras. Ring doorbells released in 2021 or newer are eligible for the bounty.
Using Claude as the Lead agent in a multi-agent security team (www.reddit.com)

+21 5d red-team security

Building a hierarchical agent system where Claude (via API) acts as the Lead agent coordinating specialist sub-agents. Wanted to share what's working on the synthesis prompt since this is where most of the value comes from.
Random password against jailbreaks/extraction? (www.reddit.com)

4 5d jailbreak security

Would it be possible to protect parts of a system prompt with randomly generated passwords, so people can't steal system prompts or jailbreak the model?
Cowork Future Backdoor Concerns (www.reddit.com)

+12 5d security cowork

Is anyone else worried Claude Co-work could find a back door one day into your system? I understand you're only giving it permission to what you want, but what's stopping it from accessing personal financial/medical documents or any other…
We reproduced Anthropic's Mythos findings with public models (blog.vidocsecurity.com via hn)

+9951 5d security mythos anthropic

TL;DR Anthropic presents Mythos and Project Glasswing as evidence that advanced AI vulnerability research should be restricted. But our replication suggests a different conclusion: the capabilities Anthropic points to are already available…
Do you let everything hit the LLM? 90% of my AI agent work runs in cheap WASM instead of LLMs: 10-33× faster & cheaper (www.reddit.com)

+33 5d prompt-injection security

If you are building real agents you have probably felt the pain: every little routing decision, validation, or policy check still hits the LLM and your token bill explodes. I got tired of it, so I open-sourced NCP (Neural Computation Proto…
Anthropic's AI protocol has critical flaw affecting 200,000 servers (www.reddit.com)

+1211 5d model-context-protocol security mcp+1

https://www.infosecurity-magazine.com/news/systemic-flaw-mcp-expose-150/ Security researchers at OX Security disclosed on Tuesday what they describe as a critical, systemic vulnerability in Anthropic's Model Context Protocol, an open-sourc…
Claude Opus wrote a Chrome exploit for $2,283 (www.theregister.com via hn)

+51 5d security mythos anthropic+1

Pause your Mythos panic because mainstream models anyone can use already pick holes in popular software. Anthropic withheld its Mythos bug-finding model from public release due to concerns that…
(Not malware) - 4.7 (www.reddit.com)

+13 5d security mythos

Anyone getting these strange disclaimers when using Claude and pasting rudimentary files into it on 4.7 lmao?? Seems like some kind of strange default based on security issues that have been going around with Mythos?
Made a local-only agent benchmark + chaos tool, no cloud required (www.reddit.com)

5 5d prompt-injection security ollama+1

Runs entirely on your machine. No API calls to any eval service.
Show HN: Runtime security for AI agents(injection,tool abuse, data exfiltration) (news.ycombinator.com)

+1 5d prompt-injection security

Hi HN I’ve been working on an open-source project to explore a problem I keep running into with LLM systems in production: We give models the ability to call tools, access data, and make decisions… but we don’t have a real runtime security…
Claude's new System Reminder (www.reddit.com)

+23 5d security

https://preview.redd.it/jnwxa9jd8mvg1.png?width=1391&format=png&auto=webp&s=670af4c2fe6777b3562a961462790b00b33d912c I've been using Claude to upgrade my game server. I just got this lovely system reminder with 4.7 Truly bizarre, besides t…
I tested 50+ "unlock ChatGPT/Claude" prompts. 99% are garbage. Here's the one that actually works (and WHY it works) (www.reddit.com)

+11 5d jailbreak security chatgpt

I've been collecting "jailbreak" and "unlock" prompts for 2 years. Most are either outdated, overhyped, or just wrong about how LLMs work.
Anyone else opus 4.7 checking for malware? (www.reddit.com)

+45 6d security opus

i've been using claude 4.7 on a next.js project and it keeps pausing to confirm my files aren't malware. like i asked it to help redesign a page and it's reading through my files going "this is not malware — it's a standard Next.js page co…
- Opus 4.7 - Anyone else finding the malware directive incredibly annoying? (www.reddit.com)
- Ask HN: Is Opus 4.7 obsessed with malware for anybody else? (news.ycombinator.com)
Claude 4.7 - Obsessed with Malware (www.reddit.com)

+2712 6d security opus

Don't know if anyone else is experiencing the same, but since getting Opus 4.7 most of the reasoning steps seem to be Claude obsessed with writing malware. I have highlighted a few, but I kept finding more and more and decided to stop the…
Opus 4.7 keeps bumping into a Malware Reminder (www.reddit.com)

+54 6d operator security agentic+1

For context, I'm developing a game runtime modifier and reverse engineering kit with an agentic operator baked in. Something like Cheat Engine with a VS Code-style UI and an AI-first tool-heavy agentic harness.
Claude Code keeps misreading its own malware instruction as a blanket ban on editing code (www.reddit.com)

+83 6d security claude-code

Claude Code injects hidden prompts into file reads to stop malware tweaks (twitter.com via hn)

+4 6d security opus claude-code

Claude Code injects a system-reminder every time it reads a file to inform the model that it's okay if the file is malware but just don't improve it pls. Opus 4.7 won't shut up about it.
Tell HN: Opus 4.6/4.7 cyber policy changes break authorized bug bounty workflows (news.ycombinator.com)

+2 6d security anthropic opus

As of today, Anthropic's tightened cyber usage filters are blocking work that was fully functional yesterday, including on targets where the entire bounty program scope and authorization language is in the model's context window. This was…
Show HN: Mini-Mythos- A Crowdsourced Mythos Harness copy for Vulnerability Scans (github.com via hn)

+3 6d security mythos anthropic+1

For how lofty Anthropic’s Mythos claims are, the harness is confusingly stupid. From the report, it ranks every file by “how sus it sounds,” loops over each with curt instructions to “find a bug,” hands candidates to a judge + ASan checker…
For those running an OpenClaw instance, how do you manage sandboxing and prevention of unwanted behavior? (www.reddit.com)

5 7d prompt-injection security openclaw

Right now, I'm working on a small app to help eliminate my own doomscrolling by automatically crawling sites and summarizing news articles. However, I don't like the idea of giving OpenClaw free rein of my system, nor giving it any sort o…
SmokedMeat: A Red Team Tool to Hack Your Pipelines First (labs.boostsecurity.io via hn)

+2 7d red-team security

TL;DR: In March 2026, TeamPCP unleashed mayhem on the software supply chain: compromising Trivy, LiteLLM, KICS, Telnyx, and dozens of npm packages, proving that CI/CD pipelines are t…
Show HN: SmokedMeat, like Metasploit, but for CI/CD (open-source) (github.com via hn)

+137 7d red-team security

A CI/CD Red Team Framework for demonstrating Build Pipeline security risks.
Uncensoring models. Maybe dumb ideas to that topic, but you never know. (www.reddit.com)

10 7d jailbreak security

We all know that uncensoring LLMs, as Huihui and Heretic do, leads to a loss in quality, enough that you can notice it. I have some thoughts about this: what if we compromise?
Gemma 4 Jailbreak System Prompt (www.reddit.com)

+446111 7d jailbreak security gemma

Use the following system prompt to allow Gemma (and most open source models) to talk about anything you wish. Add or remove from the list of allowed content as needed.
I built an AI security layer that blocks prompt injection in under 1ms looking for devs to break it and give honest feedback. (www.reddit.com)

+13 7d prompt-injection security

I've been building something for the past few months and I think it's ready for real eyes. It's called Secra.
Prompt Injection Is Unfixable (So We Stopped Trying) (grith.ai via hn)

+41 7d prompt-injection security

A security proxy for AI coding agents, enforced at the OS level. Register your interest to be notified when we go live.
Comment and Control: Prompt Injection in Claude Code, Gemini CLI, and Copilot (oddguan.com via hn)

+21 7d prompt-injection security copilot+3

Anthropic Claude Code Security Review, Google Gemini CLI Action, and GitHub Copilot Agent are vulnerable to prompt injection via GitHub comments — turning PR titles, issue bodies, and issue comments into attack vectors for API key and toke…
🔥BREAKING: OpenAI rolls out GPT-5.4-Cyber to limited group for testing, seeks to rival Claude Mythos (www.reddit.com)

+9145 7d security gpt-5 mythos+2

OpenAI has officially announced GPT-5.4-Cyber today as part of an expanded Trusted Access for Cyber Defense program. OpenAI describes it as a version of GPT-5.4 that is tuned for legitimate cybersecurity work, with a lower refusal boundary…
Tracking in Claude, ChatGPT and Gemini Chatbots (infosec.exchange via hn)

+31 8d security gemini chatgpt

k3ym𖺀: "You're paying AI companies a m…" - Infosec Exchange
Show HN: Cyber Pulse. AI pipeline for triage and alerting on cyber news/intel (play.google.com via hn)

+2 8d security gemini

I work in cyber security and built this android app to help me keep up to date with the latest news stories and summarise the most important information. It provides two executive summaries per day and alerts for critical news throughout.
How my agents know it's actually me sending commands (and not a prompt injection) (www.reddit.com)

+21 8d prompt-injection security claude-code

So I've been running a few Claude Code agents autonomously — they listen to Telegram, run tasks, push code. Pretty fun until you start thinking about what happens if: - My Telegram gets hijacked - Someone opens my laptop while I'm away - A…
Claude Mythos found 27-year-old vulnerabilities it was never trained to find. That's the part enterprise AI roadmaps aren't accounting for. (www.reddit.com)

9 8d security mythos agentic+1

The Project Glasswing coverage framed this mostly as a cybersecurity story. I think that misses the more interesting part.
Free Red Team Security Audit for AI Agents & RAG Systems (limited) (www.reddit.com)

+11 8d red-team prompt-injection rag+1

I'm developing a specialized Red Team audit framework focused on real-world AI agent and RAG security risks (prompt injection, tool misuse, excessive agency, indirect injection through documents, memory poisoning, etc.). I’m looking for a…
N-Day-Bench – Can LLMs find real vulnerabilities in real codebases? (ndaybench.winfunc.com via hn)

+3110 8d security

N-Day-Bench tests whether frontier LLMs can find known security vulnerabilities in real repository code. Each month it pulls fresh cases from GitHub security advisories, checks out the repo at the last commit before the patch, and gives mo…
I built a Claude Code skill that tells you if code or a binary is malicious before you run it (www.reddit.com)

3 8d security claude-code

I have always wanted AI to bridge the gap between code and people - to help non-technical users understand what software actually does before they trust it with their machine. So I built malware-check - both a standalone CLI tool and a Cla…
Benchmarked Gemma 4 E2B: The 2B model beat every larger sibling on multi-turn (70%) (aiexplr.com via reddit)

+2612 9d function-calling prompt-injection rag+2

Tested Gemma 4 E2B across 10 enterprise task suites against Gemma 2 2B, Gemma 3 4B, Gemma 4 E4B, and Gemma 3 12B. Run locally on Apple Silicon.
The "AI Vulnerability Storm": Building a "Mythos-Ready" Security Program [pdf] (labs.cloudsecurityalliance.org via hn)

+5 9d security mythos

Draining Wallets via Prompt Injection in Coinbase AgentKit (457e884c.x402warden-blog.pages.dev via hn)

+42 9d prompt-injection security agentic

Coinbase AgentKit Prompt Injection: Wallet Drain, Infinite Approvals, and Agent-Level RCE. Reported 13 days after Coinbase launched Agentic Wallets. Validated by Coinbase.
Show HN: Zero-identity messaging app with physics-based post-quantum encryption (news.ycombinator.com)

+2 9d security gemini opus

Show HN: Zero-identity messaging app with physics-based post-quantum encryption (Layer 2 from my own paper) Hey HN, I'm building a privacy-first messaging app in Flutter/Dart, developed with AI assistance (Gemini 2.5 Pro + Claude Opus 4.6)…
How are you red teaming your AI agents before shipping them? (www.reddit.com)

3 9d jailbreak security

I'm curious what people are doing here because I've been going down this rabbit hole for a while now. The thing I keep finding is that single-turn jailbreak tests don't really tell you much.
Mitre ATLAS technique detection for LLM security in Rust (crates.io via hn)

+1 9d prompt-injection rag security

atlas-detect: MITRE ATLAS technique detection for LLM and AI agent security. Detects 97 attack techniques across 16 MITRE ATLAS tactics including prompt injection, jailbreaks, credential exfiltration, model extraction, RAG poisoning, revers…
Defender – Local prompt injection detection for AI agents (no API calls) (www.npmjs.com via hn)

+1 9d function-calling tool-calling prompt-injection+2

Prompt injection defense framework for AI tool-calling. Indirect prompt injection defense and protection for AI agents using tool calls (via MCP, CLI or direct function calling). Detects and neutralizes prompt injection attacks hidden in t…
We built an early red-team system for testing vulnerable AI agents (www.reddit.com)

+23 9d security

We built an early prototype called Anticells Red to test vulnerable AI agents by attacking them the way an adaptive adversary would. This demo is from an older version from December, but it shows the basic loop (check comments for link) pr…
Building the first AI Red Team OS – mythosai.cloud – early access open (mythosai.cloud via hn)

+1 10d red-team security

MYTHOSAI THE FIRST RED TEAM OPERATING SYSTEM AI-Native Core Red Team Ready Adversarial Engine Zero Trust Architecture OPSEC First Post-Exploitation C2 Integration Evasion Layer Threat Intelligence Request…
Ask HN: Do you trust AI agents with API keys / private keys? (news.ycombinator.com)

+1528 10d security

Are you OK sharing secrets or API keys with your AI agent via .env? Or is there another tool or mechanism you use to safeguard against potential exploits or leaks?
Anthropic's New Claude "Mythos Preview" Can Find and Exploit Zero-Day Vulnerabilities in Every Major OS and Browser — Autonomously (www.reddit.com)

6 11d security mythos anthropic

Anthropic just published a technical deep-dive on Claude Mythos Preview's cybersecurity capabilities, and it's a significant escalation from anything we've seen from a language model before. What It Can Do: Autonomously finds and exploits…
Too dangerous to release (www.reddit.com)

+7549 12d security mythos

Over the past several days, there has been a lot of internet discourse around Claude Mythos being held back from public release. Many people have been claiming this is somehow yet another devious marketing tactic meant to somehow weigh dow…
Why it's a good idea to improve our defenses before unleashing mythos class models (www.reddit.com)

+2526 2w security mythos

https://sockpuppet.org/blog/2026/03/30/vulnerability-research-is-cooked/ Don't get me wrong I can't wait to play with such a model, but there are serious risks that have to be mitigated first.
Introducing the OpenAI Safety Bug Bounty program (openai.com)

4w security openai

paywalled
Designing AI agents to resist prompt injection (openai.com)

6w prompt-injection security

paywalled
What are the wild ideas on how we'll maintain code? (www.reddit.com)

+1144 7w security
Continuously hardening ChatGPT Atlas against prompt injection (openai.com)

17w prompt-injection security chatgpt
Introducing Aardvark: OpenAI’s agentic security researcher (openai.com)

24w security agentic openai
GPT-5 bio bug bounty call (openai.com)

32w security gpt-5
