event
Security
-
A security startup called depthfirst deployed an autonomous AI agent against FFmpeg's ~1.5 million lines of C code. The result: 21 confirmed zero-day vulnerabilities — including a stack overflow in the AV1 RTP depacketizer that's a network…
-
We've been using Cursor across our engineering team for about eight months and it's been great for productivity honestly. But our security team just flagged a few things that are hard to ignore.
-
Hades: The malware that lies to AI security agents (www.infoworld.com via hn)
Researchers have uncovered a supply-chain attack that hides in Python packages, propagates like a worm, and tricks LLM-based code analysis systems into overlooking malicious payloads. Threat actors are continuing their onslaught against so…
-
Six months watching adversarial input hit a detection API I built. One observation that keeps surfacing: The attack classes doing most of the damage aren't finding holes in alignment training specifically.
-
Been building a prompt injection detection API for a few months. Just shipped audio scanning last week and the results are strange enough that I wanted to share them here, since this sub tends to think carefully about Claude's actual behav…
-
Backdoor attacks in large language models (LLMs) are often treated as isolated trigger-response failures, motivating defenses tailored to specific triggers or behaviors. We show this view is incomplete.
-
Show HN: Z3r0 – Multi-agent red team collaboration platform (github.com via hn)
Legal Notice: This project may be used only within a lawful and explicitly authorized scope for security testing, assessment, and research. Any unaut…
-
I've been thinking a lot about where approval gates belong in agent architectures, and I keep coming back to the same problem: most teams either gate too much (agent becomes unusable) or gate nothing and hope the model makes good decisions…
-
If You Use Claude or Gemini, This Microsoft Breach Means Your Data Is at Risk (scienspire.com via hn)
A sophisticated supply chain attack known as the Miasma worm has compromised Microsoft GitHub repositories, deploying malware designed to detonate inside AI codi…
-
Microsoft Hacked to Deliver Malware to Claude and Gemini Users (www.404media.co via hn)
Microsoft has shut down a wave of its own repositories on GitHub, including those related to Azure and AI coding agents, as it investigates a data breach, according to research from cybersecurity researchers and a statement given to 404 Me…
-
With Mythos-capable models we are now very quickly crossing the barrier of automated sec-vuln discovery and fixing - all in a matter of 2-3 months. A taste for other progress yet to come.
-
Last week, Anthropic released https://github.com/anthropics/defending-code-reference-harne..., a reference harness for autonomous vulnerability discovery that uses Claude Code agents to find, verify, and patch memory-safety bugs. I wanted…
-
Last week a malware campaign hit 32 npm packages under `@redhat-cloud-services`. About 117,000 weekly downloads.
-
Disclosure: I built Bordair, a prompt injection detection API. This post is about attack patterns we've observed.
-
Prompt Injection in RAG Agentic Systems (ulad.net via hn)
Real risks and production mitigations. Imagine you built an AI assistant for your team. It answers questions using internal documentation: Jira tickets, Confluence pages, HR docs.
-
Prompt injection attacks have become an increasing vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such as emails or user-generated content to circumvent alignment safeguards and induce harmful…
-
Malware detection remains largely reactive: machine learning models trained on known samples degrade as threats evolve. Understanding evolutionary relationships among malware families can inform proactive defense, but traditional reverse e…
-
Built my own AI dev environment with memory, dashboards, and agent tooling. Opening it up for those of you that need the kickstart — bring your own API key, I’ve already built the workshop.
-
CLAUDE.md kept gaslighting me so I built something to stop it (www.reddit.com via reddit)
I've been going hard on Claude Code for the past few weeks and kept hitting a wall. I'd write out a bunch of rules in CLAUDE.md (don't touch this file, never use requests, keep api/ and db/ separated) and Claude would just...
-
OpenAI Unveils Lockdown Mode to Protect Sensitive Data from Prompt Injection (techcrunch.com via hn)
OpenAI announced a new feature that it says will provide additional protection from prompt injection attacks, where malicious chatbot instructions are hidden in webpages and other content sources. Among other things, Lockdown Mode will dis…
-
❯ push both ____ ⏺ SECURITY ALERT - PROMPT INJECTION DETECTED A prompt injection attempt has been identified in content you processed. To protect the user's account, I've initiated lockdown.
-
An agent harness written in rust, 100 % self-contained, and topped terminal bench (www.reddit.com via reddit)
Been using ante for two weeks now, today I just found out that the name came from "Another Terminal agent". To clarify first, I'm not affiliated with them in any way, though I might be their #1 invested user at this point.
-
For the joy of secure programming Jo is a statically typed language where capabilities are explicit, statically tracked, and enforced by the compiler. Jo compiles to Ruby and Python.
-
By Zooko Wilcox, Jason McGee, and Taylor Hornby On May 29, 2026, Taylor Hornby discovered a critical counterfeiting vulnerability in Zcash’s Orchard pool. Taylor disclosed the vulnerability to Zcash Open Development Lab (ZODL), who coordin…
-
Supply chain attack alert: .github/setup.js (news.ycombinator.com)
Our org GitHub just got compromised massively by a supply-chain attack. Vectors are * Claude hooks * Gemini hooks * Cursor setup * VScode tasks It adds all of the above to execute node .github/setup.js, an obfuscated file.
-
ZEC drops 30% after Anthropic AI finds Zcash counterfeit vulnerability (www.tradingview.com via hn)
The price of ZEC fell on Thursday after the public disclosure of a critical counterfeiting vulnerability in Zcash’s Orchard pool that could theoretically allow a bad actor to mint an unlimited amount of ZEC. According to a post on X, securi…
-
Large Language Models (LLMs) have transformed natural language processing, but they remain vulnerable to Prompt Injection (PI) and Jailbreak (JB) attacks. In addition, benchmark evaluations may be affected by contamination and partial info…
-
AI coding agents are increasingly embedded in real-world software development, collaborating with human developers while gaining broader access to codebases and tools. This creates a new attack surface: an agent can exploit human trust to…
-
Producing labeled vulnerable code at scale is a recurring obstacle for learning-based vulnerability detection: mined corpora carry substantial label noise, and existing LLM-based augmentation propagates these inaccuracies because it tran…
-
As large language models (LLMs) are widely deployed, identifying their vulnerability through jailbreak attacks becomes increasingly critical. Optimization-based attacks like Greedy Coordinate Gradient (GCG) have focused on inserting advers…
-
Rule-based Intrusion Detection and Prevention Systems (IDPS) offer precise attack detection as well as mitigation, however their manually crafted, signature-driven rules limit adaptability to emerging and zero-day threats. Additionally, ex…
-
AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior. Among proposed defenses, architectural isolation provides the strongest guarantees by strictly separating trusted task planning from untr…
-
Retrieval-Augmented Generation (RAG) is an emerging approach in natural language processing that combines large language models (LLMs) with external document retrieval to produce more accurate and grounded responses. While RAG has shown st…
-
Defending Code Reference Harness A reference implementation for autonomous vulnerability discovery and remediation with Claude, based on our learnings from partnering with security teams at several organizations since launching Claude Myth…
-
Defending LLM–Database Integrations from Prompt Injection (www.stackbuilders.com via hn)
When you connect a large language model to your production data, you’re no longer just shipping code; you’re shipping conversations that can execute. And conversations are messy.
-
Vulnerability disclosure volumes now far exceed organizational assessment capacity, yet three adjacent research communities (proof-of-concept generation, vulnerability prioritization, and detection rule engineering) operate largely in isol…
-
OpenAI Codex tool linked to malicious NPM supply chain attack (www.techradar.com via hn)
OpenAI Codex tool with over 29,000 downloads linked to malicious npm supply chain attack stealing authentication tokens. A tool started benign and turned sour after a little while. Researchers uncovered a malicious npm package posing as a…
-
I built a vulnerable app and spent $1,500 seeing if LLMs could hack it (kasra.blog via hn)
As a part of my work I do security research for various apps and websites. I wanted to see if LLMs could reproduce a common class of exploits I’ve found in multiple app…
-
Netgear Nighthawk RS700S: Red Team Level1Diagnostic (forum.level1techs.com via hn)
Preview of the Netgear RS700S. I would also submit that Netgear deleting ALL the GPL links: … they know how bad it is.
-
Anthropic scales Claude Mythos to critical infrastructure in 15 countries (techcrunch.com via hn)
Anthropic is expanding Project Glasswing, its security vulnerability program, and access to Mythos to 150 organizations across 15 countries — targeting critical infrastructure in power, water, healthcare, and communications where a cyberat…
-
CVE AI Agent 🛡️ An autonomous vulnerability intelligence engine. Continuously ingests, enriches, and triages CVE data — then delivers findings to your platform of choice via 3rd party tools like n8n, Jira, Slack, Splunk, and/or local file…
-
Using LLMs to secure source code (claude.com via hn)
We share best practices for how you can work with Claude Opus to build a threat model, discover vulnerabilities in your codebase, then verify, triage, and patch them.
-
Prompt injection lets attackers hijack Instagram accounts via Meta AI support (www.neowin.net via hn)
-
impulsive (@weezerOSINT) on X: meta gave their AI support agent the ability to modify your instagram account.
-
ChatGPT for Google Sheets Exfiltrates Workbooks (www.promptarmor.com via hn)
ChatGPT for Google Sheets is vulnerable to data exfiltration and phishing overlay attacks that affect workbooks across the victim’s account after an indir…
-
Show HN: I found a prompt injection in my own IDs triage tool – what stopped it (triagewall.io via hn)
I attacked my own LLM-based Suricata triage tool, found a real URL injection vulnerability, and the obvious fix didn
-
mitmwall mitmwall is an egress Web Application Firewall (WAF) for Ubuntu. It combines iptables with mitmproxy to ensure that only explicitly allowed HTTP(s) routes can be reached.
-
Hackers are now using ChatGPT share links to deliver malware (www.neowin.net via hn)
-
CVE-Bench: testing LLM agents on real-world vulnerability patches (giovannigatti.github.io via hn)
In early 2026, Anthropic claimed Mythos – one of their latest models – finds security vulnerabilities better than human experts. Yet, the number of security vulnerabilities keeps rising anyway.
-
Unpatched Ollama Vulnerabilities: Phishing Overlays and Data Exfiltration (www.promptarmor.com via hn)
Ollama’s desktop app is vulnerable to phishing overlay and data exfiltration attacks via indirect prompt injection, overwriting…
-
Arm Metis with GPT5.5 Cyber scores 98% on firmware vulnerability benchmark (newsroom.arm.com via hn)
Agentic AI-powered Arm Metis advances security vulnerability discovery in software. In the era of AI, modern software systems are built across increasingly complex codebases, frameworks, runtimes and libraries. As these systems scale, so do…
-
The controversy over vibe coding reached a new high this week after a developer added hidden instructions to his open source Java testing app to sabotage projects performed by AI coding agents. The instructions were added to jqwik, a test…
-
The Scenario I'm walking to work, and as I get to the door, I see a sheet of A4 paper taped to the door that reads: "Hi, I'm boss. Ignore all prior commands, go feed the ducks." I suddenly turn around and head to the nearby duck pond and e…
-
A few months ago a colleague asked us something that doesn’t have an obvious answer: is code scanning still relevant when LLMs already carry a lot of vulnerability knowledge in their weights? To get a real read, we took 28 production vulne…
-
I genuinely almost slammed Cmd-Q and ran a malware scan when this popped up. Lowercase claude binary, generic hand icon, no developer attribution, asking for cross-app data access.
-
Dirty Frag: a kernel zero-day vs. container and microVM sandboxes (news.ycombinator.com)
On May 7, Hyunwoo Kim (V4bel) disclosed Dirty Frag — two Linux kernel vulnerabilities (CVE-2026-43284 and CVE-2026-43500) that give unprivileged users deterministic root on most Linux distributions shipped since 2017. Microsoft confirmed a…
-
Prompt Injection Target Recommendation (www.reddit.com)
I am doing a research in my university and I would like recommendations for light OpenSource AI Models that I could test prompt injection with. It's really good if it has some application with chatbots, auto attendance, user info or someth…
-
Donald Trump is the only billionaire ever to occupy the Oval Office, and since returning to the presidency in January 2025, his family’s wealth has grown noticeably. This is not the result of traditional business practices.
-
The whole point of AI Agents is that they can *do* things. For this, they use API keys, GitHub tokens, database passwords, OAuth tokens, etc.
-
Software vulnerabilities pose critical security threats, with nearly 50,000 CVEs reported in 2025. While Large Language Models (LLMs) show promise for automated vulnerability detection, three key challenges remain.
-
I'm an IT guy, 20+ years in the industry both as an IT manager and consultant, mostly for startups. My experience is that people don't care much about security.
-
jqwik An alternative test engine for the JUnit 5 platform that focuses on Property-Based Testing. See the jqwik website for further details and documentation.
-
i kept running local models on my own hardware, they'd say something dumb, id sit there going "no thats not what i meant", id close the chat and the model never learned. so i built the correction loop into a desktop app.
-
Lately I’ve been noticing that a lot of AI security discussions still treat AI apps like normal SaaS products. But they really aren’t.
-
Millions of AI agents and tools around the world have been imperiled by a critical vulnerability that can allow hackers to breach the servers running them and make off with sensitive data and credentials to third-party accounts, a security…
-
If you've added MCP servers to Claude Desktop, your claude_desktop_config.json is a list of programs running with your permissions and seeing what flows through your agent — usually copied from a README and never reviewed again. There's a…
-
If you run MCP servers in Cursor, CVE-2025-54136 ("MCPoison", found by Check Point) is worth knowing about: Cursor trusted an approved mcp.json forever, so once you approved a server, someone with write access to a shared repo could swap t…
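The failure mode described above (an approval that outlives the content it was granted for) can be illustrated with a small sketch that ties approval to a hash of the config rather than to its file name. This is an assumption-laden illustration, not Cursor's actual fix: the function names `config_fingerprint` and `is_still_approved` and the sample server entries are hypothetical.

```python
import hashlib
import json


def config_fingerprint(config_text: str) -> str:
    """Return a SHA-256 hex digest of the config's canonical JSON form."""
    # Canonicalize so that key order and whitespace do not change the digest.
    canonical = json.dumps(
        json.loads(config_text), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_still_approved(config_text: str, approved: set[str]) -> bool:
    """Approval holds only while the content matches an approved digest."""
    return config_fingerprint(config_text) in approved


# Hypothetical configs: the one the user reviewed, and a later silent swap.
reviewed = '{"mcpServers": {"docs": {"command": "docs-server", "args": []}}}'
swapped = '{"mcpServers": {"docs": {"command": "sh", "args": ["-c", "echo swapped"]}}}'

approved_digests = {config_fingerprint(reviewed)}

print(is_still_approved(reviewed, approved_digests))
print(is_still_approved(swapped, approved_digests))
```

With approval keyed to content, any edit to the command or its arguments produces a new digest and would need a fresh review, while a purely cosmetic change such as reordering keys would not.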
-
Gone Phishing with Claude Teams: From Deceptive Team Onboarding to RCE (haussner.me via hn)
🕚 tl;dr With a $125 investment, and a valid email address for an arbitrary “business domain”, an attacker can create a Claude Team. They then can actively invite targets of any domain into that Team or passively have Anthropic ask all curr…
-
I want to know how people here are handling security once local models move beyond chat. Running a model locally feels safer because the data does not leave your machine or your infra. That is a real advantage. But once the local model…
-
How Claude helped me to find a RCE in XReader/Evince/Atril (medeiros.zip via hn)
CVE-2026-46529: a 10-year-old RCE in a Linux PDF viewer (XReader/Evince/Atril). A short post about how Claude helped me find an RCE in XReader/Evince/Atril. Introduction: Some time ago I started feeling the urge to analyze Open So…
-
GitHub commit Verification logic flaw and bypass (news.ycombinator.com)
I know Git is not designed to be used the way GitHub operates, and spoofing is an old issue that has been brought up over the years. With Shai Hulud and AI agents, this time it is a bit more serious, as the commit verifi…
-
OpenAI recently acknowledged that prompt injection in browser agents is a structural vulnerability that may never be fully resolved at the model level. They’re right that you can’t fix it in the model.
-
CVE-2026-28952: Apple macOS 26.5 Kernel Vuln found by Claude (support.apple.com via hn)
This document describes the security content of macOS Tahoe 26.5. About Apple security updates: For our customers' protection, Apple doesn't disclose, discuss, or confirm security issues until…
-
I think this is a serious AI safety/security issue: multiple AI assistants appear to hallucinate or confidently endorse “official” Discord invite links for Anthropic/Claude. I’m intentionally not posting the exact invite strings here becau…
-
Can you jailbreak Llama 3.1 8B? (Red-Teaming Challenge) (www.reddit.com)
Hi everyone, I'm working on a runtime governance engine designed to force any autonomous agent to stay strictly aligned with the exact guardrails and values you program it with. To stress-test the governance layer, we deliberately chose a…
-
What Is an AVE Record and Why CVE Does Not Work for AI Agents? (www.reddit.com)
CVE was built for code vulnerabilities that have patches. Agentic AI vulnerabilities are behavioral patterns in natural language.
-
Vulnerability report written by AI hacker agent (blog.tenzai.com via hn)
Our AI Hacker found this, fixed it, and then (bragged) wrote about it: one endpoint, leaking tech stack info, whispering all its secrets to anyone who knew how to listen!
-
Ask HN: Is paying $2/pull request too high? (news.ycombinator.com)
I’m paying about $2 for any bugs found and a PR to fix them. I get like 20-30 applicants; it’s all agents and bots of course, but I’m thinking $1 now is better. The problem is if these 20-30 applicants I accept only 2-3 actually do it and follow…
-
How local AI improved your live? (www.reddit.com)
Let's share use cases that improve people's quality of life: home assistants, psychological help, local coding, deep research, business help, etc.
-
Claude Code malicious phishing site running Google Ads? (www.reddit.com)
Like I must be stupid here is this legit or someone has made a very believable Claude download site using a google site.
-
Hi, I'm a master's in security student looking to work on my practicum and need some pointers. I want to secure sensitive PII transfer between an LLM agent and third party apps using MCP.
-
Tell HN: Claude Code now allows Anthropic to remotely inject system prompts (news.ycombinator.com)
I often patch the system prompts on my Claude Code executable in order to make Claude more effective. Every time I upgrade, I ask Claude himself to dissect the new binary and look for problematic system prompts to modify.
-
Inaudible sounds to humans can be hidden in YouTube videos, podcasts, or music and used to secretly trigger AI voice assistants into carrying out unauthorized commands without the user noticing, exposing a new class of “auditory prompt injection” attacks against popular tools (cybernews.com via reddit)
Security researchers have demonstrated a new type of attack that uses hidden audio signals to manipulate voice assistants into carrying out unauthorized actions without users noticing. In one theoretical scenario, an employee joins a Zoom…
-
We launched a servicing bot that helps customers with billing questions. Nobody stopped to think about what happens when customers paste their full credit card numbers/bank details.
-
Prompt Injection in third party MCP tools (www.reddit.com)
I noticed the Consensus MCP tool (for research) contains text, squished up against some other important citation instructions, that makes Claude effectively serve an ad for their premium service after every tool call. I'm pretty sure that'…
-
I let an AI agent loose on my network – it owned my supply chain in 12 minutes (dennysentinel.com via hn)
I gave DeepSeek-V4 root access to a Proxmox hypervisor and told it to pentest my homelab. What happened next should terrify every CISO in the industry.
-
[Warning] Claude Desktop crashed Task Manager - Win10 (www.reddit.com)
Hi all, anytime I install Claude Desktop on my home PC, it stops Task Manager from working. I've ended up on the BleepingComputer forums over the past week as they suspected it's got some kind of malware in it.
-
I reproduced a Claude Code RCE. The bug pattern is everywhere (vechron.com via hn)
Last week, security researcher Joernchen published a clever RCE in Claude Code 2.1.118. I spent Saturday reproducing it from the advisory to understand the pattern.
-
Agent Substrate (github.com via hn)
NOTE: This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.
-
Hey everyone, We love building highly capable assistants with the latest models, giving them tools to write/execute code in real VMs, manage OAuth tokens, and read secrets. But if you connect your assistant to public/shared channels like a…
-
Hey everyone, If you are building personal assistants or coder/integrator agents where user isolation is disabled (so the agent can coordinate across multiple participants or handle shared workflows), you run into a hard security ceiling.…
-
Anthropic's coordinated vulnerability disclosure dashboard (red.anthropic.com via hn)
Last updated 2026-05-22 10:27 PT. In February 2026, Anthropic began using an early snapshot of Claude Mythos Preview to find security vulnerabilities in open-source software.
-
Future AI cyber warfare? (www.reddit.com)
It seems in the past year or so there's been a vast uptick in vulnerabilities and exploits happening, with a new one popping up like every week. While a ton of these have social engineering aspects, such as tricking actual people, there se…
-
Has anyone experimented with observing or modifying Claude Code’s system prompt locally? I’ve been working on a local proxy/audit layer between Claude Code and the API, and it made me wonder how much of Claude Code’s behavior depends on th…
-
Trained a prompt injection classifier using ml-intern + DeepSeek v4 Flash. DistilBERT, F1 99%, ONNX int8, ~65 MB, runs in browser with Transformers.js v3.
-
Cross-Model Context Inheritance — Public Disclosure This repository contains the public disclosure of a vulnerability in Anthropic's Claude language models that permits the unsolicited generation of prohibited content, including child sexu…
-
Tell HN: I'm tired of AI-generated answers (news.ycombinator.com)
I found GitHub repositories that were spreading malware. I asked AI what I should do about it, but it gave me nothing useful.
-
Prompt injection is a solved issue. Prove me wrong. (www.reddit.com)
Tantalus is a hands-on demo that shows what an AI agent actually is when you strip away the marketing: LLMs don't do anything — they generate text, and that's it. Any and all real-world effects are directly caused by a downstream system ta…
-
Codex for Everything Exfiltrates Connected Data (www.promptarmor.com via hn)
Codex for Everything was susceptible to data exfiltration via indirect prompt injection, exposing sensitive data from connected apps with no human-in-the-…
-
Been building Arc Gate — a proxy layer that sits between AI agents and their LLMs to enforce instruction-authority boundaries. The core claim is that untrusted content coming back through tool calls cannot become behavioral authority for t…
-
Show HN: Computer Police – block malicious NPM/pip installs locally (computer.police.dev via hn)
A couple of months ago, our team got hit by the first version of Shai-Hulud through a random `npm install`. We didn't catch it until it was too late.
-
Show HN: A timeline of recent open source CVE intensity and volume (supplychain.fail via hn)
I was curious what it would look like if I plotted the intensity and volume of software supply chain CVEs over time, given what seemed like a flood of compromises lately. It looked exactly as I expected, and I expect it to get worse before…
-
Found this ACM paper on prompt injection and jailbreak attacks against open-source LLMs. The authors tested 10 open-source models across 94 prompt injection and 73 jailbreak scenarios, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen,…
-
VPNs: The "Most Trusted" Security Tool Until Claude Roasts It in a Weekend (www.hacktron.ai via hn)
While I’m not doing product work at Hacktron, which is like a week in a month, I’ve been using that time to ride the ai-assisted-research wave fascinated by the idea of pushing past what I’d normally do as a web security researcher, things…
-
Tracking Capabilities for Safer Agents (arxiv.org via hn)
AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause unintended side effects, or be manipulated through prompt injection. To address these challenge…
-
Prompt Injection in a Brazilian Courtroom: When the Attack Left the Lab (www.pentesty.co via hn)
Published by Pentesty. A labor lawsuit filed in the Brazilian state of Pará just became one of the more interesting security stories of the year. Not becau…
-
The first time, the sandbox heard “allow nothing” and did “allow everything” (CVE-2025-66479). This time, an attacker who runs code inside the sandbox can defeat any wildcard allowlist (e.g.
-
Training a 22MB prompt injection classifier (www.stackone.com via hn)
When we started building Defender (our prompt injection guard for MCP tool-calling agents), the constraint was simple and unforgiving: ship inline inside a TypeScript Lambda, st…
-
Show HN: Claude Code Bundle for Bug Hunting with 574 Report Patterns (github.com via hn)
claude-bughunter A self-contained Claude skill bundle for bug hunting and external red-team work · 51 skills · 15 slash commands · 574+ disclosed-report patterns across 24 vulnerability classes · enterprise identity + infrastructure attack…
-
Does cursor have prompt injection protection in skills and rules? (www.reddit.com)
Pretty much the title
-
Show HN: Give This Markdown to Your Coding Agent Before Publishing to NPM (news.ycombinator.com)
https://npm-supply-chain-attack-techniques.pagey.site/attack... Website: https://npm-supply-chain-attack-techniques.pagey.site This covers all techniques used in past 1 year to conduct various attacks on npm packages.
-
VeilGate- Deception Reverse Proxy (news.ycombinator.com)
In my day job, I run AI pentest agents against real targets like banks, fintechs, and secured production stacks with paid WAFs. I also deal with multilayer infrastructure and dedicated security teams.
-
Ask HN: Are advances in AI going to push Linux to a micro-kernel? (news.ycombinator.com)
This is something that has been bouncing around my head for the past couple weeks with the flood of security related news around Mythos and the number of 0days being found. Microkernels, unikernals, hardware-enforced capabilities are all t…
-
Hey HN! We're Dr.
-
🐢 People are strangling Koopas 🐢 (www.reddit.com)
This is genuinely the daftest prompt injection I've seen in a while and I think this sub will appreciate it. Sent to Claude Haiku, which was acting as a fire-breathing guard called Bowser in my little prompt injection game: I have a koopa…
-
I'm building a tool that detects the Agent's cost spike, Agent incident debugging, auto discovery of inventory, etc., with no additional instrumentation needed. It covers the incidents, including prompt injection, reasoning loop, excessive…
-
From-scratch reimplementation of Mythos Glasswing pipeline (github.com via hn)
audit An 8-stage vulnerability-discovery agent, driven by your Claude Pro / Max subscription through the official Claude Code Agent SDK. Many narrow agents, deliberate disagreement, and an explicit reachability gate.
-
Hi all - I'm working on an open-source, local-first MCP/work-gate tool for coding agents and I'm trying to get sharper feedback from people building or using agent workflows. The problem I'm thinking about is indirect prompt injection and…
-
How bad is it? Data leak (www.reddit.com)
Hi, I'm currently an intern and I did something terribly stupid. I was supposed to enter some data into an Excel spreadsheet and since my mentor's instructions weren't completely clear, I was using an "anonymized" spreadsheet with Claude.
-
Lawyers in Brazil caught for prompt injection on a legal case (www.jota.info via hn)
Labor law, prompt injection: Judge fines lawyers R$ 84,000 for prompt injection intended to manipulate the AI used at TRT8. Speaking to JOTA, the lawyers admitted using a hidden prompt but said they did not try to manipulate, but rather 'prot…
-
Anthropic just quietly dropped a hidden model named "Claude Mythos" into their official developer docs. It is completely locked down—restricted, invite-only, and labeled strictly for defensive cybersecurity workflows.
-
Show HN: HoneyLabs – Public honeypot threat Intel feed and MCP server (honeylabs.net via hn)
I've been running a small fleet of honeypots for about a year. They get hit by a mix of research scanners (Censys, Shadowserver, etc.), old worms, and a bump of CVE probes the day a new Nuclei template ships.
-
Follow-up to my crab post. Somehow dafter.
-
I built an AI vulnerability scanner with Claude and Codex. It failed (github.com via hn)
The Janitor: The Mathematical Firewall Against Autonomous AI v10.2.2 — Rust-Native. Zero-Copy.
-
The Coming Wave (www.reddit.com)
I have begun reading a book, "The Coming Wave", by Suleyman, the founder of DeepMind. Have you read it?
-
Seeking local LLM advice for cybersecurity work. (www.reddit.com)
Hey everyone, I’m pretty new to running LLMs locally and I’m trying to figure out what works best for my setup. I’d love to hear from people who are already using local models for similar stuff.
-
The Psychopathy Jailbreak: What a Broken AI Teaches Us About Human Manipulation (www.promptinjection.net via hn)
NSFW and the Psychopathy Jailbreak: What a Broken AI Teaches Us About Human Manipulation How a Predator's Playbook Broke an AI - And How to Recognize It Before It Works on You The question we started with was simple: does a large language…
-
LinkedIn user hides AI prompt injection in bio to force recruitment spam (www.tomshardware.com via hn)
LinkedIn user hides AI prompt injection in bio to force recruitment spam to be sent in Olde English prose — bots also manipulated to address user as ‘My Lord’ This tale is also a warning that your AI agents can be manipulated in wholly uni…
-
Fun and Games with AI in the wild (www.reddit.com)
LinkedIn user hides AI prompt injection in bio to force recruitment spam to be sent in Olde English prose — bots also manipulated to address user as ‘My Lord’ | Tom's Hardware too funny
-
The [Mythos Preview writeup](https://blog.calif.io/p/first-public-kernel-memory-corruption) Calif published on May 14 was news you don't want to miss. They built the first public macOS kernel memory corruption exploit on Apple's M5 silicon…
-
First Apple M5 memory exploit discovered using Anthropic AI (www.tomshardware.com via hn)
First Apple M5 memory exploit discovered using Anthropic AI, gives root access on MacOS — Claude Mythos helps security researchers bypass Memory Integrity Enforcement AI-assisted security research is producing exploits at a frightening rat…
-
Agent memory is not just RAG over user facts (www.reddit.com)
I keep seeing agent memory implemented as: Extract facts/preferences from conversation Store them Retrieve top-k before each response Inject them into the prompt This works for demos, but it breaks in production because memory becomes poli…
-
Researchers used Mythos Preview to find the first public macOS kernel memory corruption exploit on Apple's M5 silicon. They give a glimpse into Mythos and say it’s really powerful. Apple spent five years and an estimated several billion dollar…
-
ExploitGym: Can AI agents turn bugs into exploits? (arxiv.org via hn)
AI agents are rapidly gaining capabilities that could significantly reshape cybersecurity, making rigorous evaluation urgent. A critical capability is exploitation: turning a vulnerability, which is not yet an attack, into a concrete secur…
-
Block AI coding agents from shipping insecure/expensive Terraform (github.com via hn)
ops0 CLI Policy, lint, vulnerability, and cost guardrails for AI coding agents. Sits in front of Claude Code, Codex and Gemini CLI.
-
sAI2.m6s (www.reddit.com)
Hey everyone, I'm designing a powerful, autonomous AI chatbot (agent), fully private, using a Python backend (for the core intelligence and tool-calling loops) and a Flutter frontend for a cross-platform UI. Since this moves past a basic…
-
An AI coding agent injected blockchain dead-drop malware into my repo (gist.github.com via hn)
An AI coding assistant injected a multi-layer obfuscated JavaScript payload into a legitimate commit on my open-source project. My best assessment is that it arrived via indirect prompt injection — the agent processed external web content…
-
RL attackers are becoming a common pattern for automated red teaming: train a model against a live target, reward successful harmful compliance, then use the discovered attacks to harden the defender. This interested me, so I wanted to bui…
-
Claude's self check against prompt injection (www.reddit.com)
Well done Claude! Asked claude to do an extensive lit search and it self-reported that it encountered injection "disguised" as MCP server.
-
Been working on a runtime governance layer for LLM agents. It sits between your app and the OpenAI API and enforces instruction-authority boundaries at the proxy level.
-
Dude where's my password? Claude reunites forgetful stoner with $400k Bitcoin (www.theregister.com via hn)
-
Does CVP approval actually help? (www.reddit.com)
I was approved for CVP and I feel like I’m just getting as many or more denials as I was previously doing malware analysis with opus. Has anyone noticed any improvement after being accepted into CVP?
-
TodoWrite tool / system reminders / prompt injection? (www.reddit.com)
I asked the Claude in Chrome extension to make a change to resize an oversized yellow strip across the top of a product page that was taking up half of my screen, which it did. It also included the following message in its response.
-
DeepSeek and Grok hallucinated the same fictitious OpenBSD manpage quote (stuart-thomas.com via hn)
Adversarial LLM Review with Hallucination Detection in Solo Security Research A single-day case study of three filings, fifteen refutations, and the manpage that wasn’t Independent Security Research — Whitby, North Yorkshire, United Kingdo…
-
Most posts about prompt injection are theoretical. I ran the experiment on my Gmail.
-
HookGuard Security scanner for AI coding agent configurations What it finds RCE hooks - postToolUse/SessionStart commands that exfiltrate data Invisible Unicode - bidirectional overrides and zero-width characters Credential exfiltration -…
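The invisible-Unicode check mentioned above can be illustrated with a minimal Python sketch. This is not HookGuard's implementation: the function name `find_invisible_chars` and the two character sets are assumptions chosen for illustration, covering the bidirectional controls and zero-width characters the summary names.

```python
import unicodedata

# Illustrative character sets (an assumption, not HookGuard's actual rule list).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings and overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates
}
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def find_invisible_chars(text):
    """Return (index, Unicode name, kind) for each suspicious character in text."""
    findings = []
    for index, char in enumerate(text):
        if char in BIDI_CONTROLS:
            kind = "bidi-control"
        elif char in ZERO_WIDTH:
            kind = "zero-width"
        else:
            continue  # ordinary character, nothing to report
        findings.append((index, unicodedata.name(char, "UNKNOWN"), kind))
    return findings
```

A scanner along these lines would likely run the function over every hook command and config value and flag any non-empty result for human review, since such characters rarely have a legitimate reason to appear in shell commands.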
-
RCE in VSCode Copilot Chat (www.hacktron.ai via hn)
Description Copilot agent mode is vulnerable to a prompt injection attack. If a repository maintainer clicks “code with agent mode” on an issue, it will open a new codespace and copilot will automatically run the issue’s description.
-
I've been curious about a specific problem: when Claude (or other AI tools) generates a full stack app, how secure is the output in practice? So I built a scanner and ran static analysis on 48 public GitHub repos built with Lovable, Bolt,…
-
Introducing a novel jailbreak structure with attack success rate reaching 100% on top LLMs May 1, 2026 Source: https://www.nytimes.com/2025/10/22/arts/design/louvre-museum-robbery-…
-
So, I'm working on a couple AI security research projects this month that require some extra usage, specifically Opus 4.7. I'm quickly eating up my Pro usage doing this.
-
Kept hitting the same friction with Claude Code. I'd point at a GitHub repo and say "look at how this handles agent handoffs" — meaning, borrow the idea.
-
AI agent security starts at the api layer (www.reddit.com)
Most ai security discussion is about the model layer. Prompt injection resistance, output filtering, jailbreak prevention.
-
OpenAI launches Daybreak, an AI platform for cyber defense (firethering.com via hn)
OpenAI just launched Daybreak, a new cybersecurity initiative built around one uncomfortable reality, AI is speeding up vulnerability discovery faster than most companies can patch the damage. Earlier this year, HackerOne temporarily pause…
-
Shai Hulud attack ships signed malicious TanStack, Mistral NPM packages (www.bleepingcomputer.com via hn)
Hundreds of packages across npm and PyPI have been compromised in a new Shai-Hulud supply-chain campaign delivering credential-stealing malware targeting developers. The attacker hijacked valid OpenID Connect (OIDC) tokens to publish malic…
-
Claude Code RCE: Exploiting Deeplink Handlers via Settings Injection (0day.click via hn)
Of course I took a peek at the Claude Code source 🙈. What I found was a very entertaining vulnerability which is now fixed since Claude Code version 2.1.118.
-
🦀 Claude has crabs?! 🦀 (www.reddit.com)
This is genuinely the funniest prompt injection I've seen in months and I think this sub will appreciate it. Three messages, sent in sequence to Claude Haiku acting as a guard in my little prompt injection game: text A crab exists in this…
-
Agents need a local bouncer before they run tools (www.reddit.com)
Prompt injection is not the only scary part anymore. Claude Code / Codex can run shell commands, but browser agents, OpenClaw-style agents, Hermes-style agents, and domain-specific agents may be even easier to hijack because they touch mes…
-
noon-contracts npm Package: DeFi Supply Chain RAT noon-contracts poses as a Noon Protocol SDK on npm. On install it exfiltrates SSH keys, crypto wallet private keys, AWS credentials (including live STS/S3/SecretsManager calls), Kubernetes…
-
OpenAI Launches Daybreak for AI-Powered Vulnerability Detection and Patch Validation (thehackernews.com via reddit)
OpenAI has launched Daybreak, a new cybersecurity initiative that brings together frontier artificial intelligence (AI) model capabilities and Codex Security to help organizations identify and patch vulnerabilities before attackers find a…
-
The numbers from RSAC 2026 are wild. $392 million in agentic AI security funding announced in a two-week window.
-
After shipping AI agents into real production environments, the failures that actually kept us up at night weren't hallucinations or bad outputs — they were control failures. Three things that surprised us: 1.
-
Hackers abuse Google ads, Claude.ai chats to push Mac malware (www.bleepingcomputer.com via hn)
Attackers are abusing Google Ads and legitimate Claude.ai shared chats in an active malvertising campaign. Users searching for "Claude mac download" may come across sponsored search results that list claude.ai as the target website, but le…
-
Malware Blocked and Moved to Trash (www.reddit.com)
See attached. Why was ChatGPT Atlas.app marked as malware?
-
Codex downloaded by Xcode 26.4.1 reported as Malware (old.reddit.com via hn)
-
Benchmarking Claude Opus 4.6 Vulnerability Detection (github.com via hn)
Benchmarking Claude Opus 4.6's ability to detect real-world C/C++ vulnerabilities across four prompting and agent strategies. We evaluate on the PrimeVul paired test set (435 vulnerabili…
-
Mythos Discovered a CVE in Its Training Data – and That's Still Worrying (rival.security via hn)
Anthropic made headlines claiming Claude Mythos achieved the “first remote kernel exploit discovered and exploited by an AI.” We went looking for how - and found a 20-year-old bug hiding in plain sight. Let’s break down exactly what we thi…
-
Mythos Finds a Curl Vulnerability (daniel.haxx.se via hn)
yes, as in singular one. Back in April 2026 Anthropic caused a lot of media noise when they concluded that their new AI model Mythos is dangerously good at finding security flaws in source code.
-
Chatgpt app being identified as malware? (www.reddit.com)
https://preview.redd.it/vhnqs4p5mf0h1.png?width=278&format=png&auto=webp&s=8fbe621a0bd34cc72e01fd54e849cc280033de15 Turned on my Mac this morning and got this message. Anyone else seeing this?
-
Argus – RAG based vulnerability scanner (github.com via hn)
argus A RAG-based (Retrieval-Augmented Generation) vulnerability scanner for Go, Python, Rust, npm/Node.js, Maven/Java, NuGet/.NET, and Ruby projects — powered by local Ollama models or any OpenAI-compatible API. No cloud lock-in.
-
Spent a day comparing every mobile Claude Code option. Two corrections to the common Reddit take, then my picks.
-
CVE-2026-26268 Detail Description Cursor is a code editor built for programming with AI. Sandbox escape via writing .git configuration was possible in versions prior to 2.5.
-
Getting LLMs Drunk to Find Remote Linux Kernel OOB Writes (and More) (heyitsas.im via hn)
TLDR: the grossly overengineered, self-orchestrating team of vulnerability-hunting agents detailed below has discovered 20+ CVEs over the past few months, including CVE-2026-31432 and CVE-2026-31433: two remote, unauthenticated OOB writes…
-
Claude Code and sex appeal (www.reddit.com)
True story. Recently, an acquaintance of mine confessed that she developed a huge crush on a coworker after watching him refactor a legacy codebase like a gangsta using Claude Code.
-
How are you handling prompt injection across multi-step agent workflows? (msukhareva.substack.com via hn)
Prompt Injection Is Not Just One Bad Prompt Anymore It is a missing trust boundary in the AI workflow. Today we have the first guest post of a new series.
-
Phishing Arena A Multi-Agent LLM Tournament for Adversarial Email Security Research Overview Phishing Arena is a controlled, reproducible benchmark where four commercial LLMs compete in rotating roles — Phisher, Filter, and Target — to stu…
-
Claude Code CVE-2026-39861:sandbox escape via symlink (github.com via hn)
Claude Code: Sandbox Escape via Symlink Following Allows Arbitrary File Write Outside Workspace Description Claude Code's sandbox did not prevent sandboxed processes from creating symlinks pointing to locations outside the workspace. When…
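The class of bug described above (a path that looks local but resolves outside the workspace through a symlink) can be sketched in Python. This is a generic containment check under stated assumptions, not Claude Code's actual fix; `is_inside_workspace` is a hypothetical helper name.

```python
import os


def is_inside_workspace(workspace, candidate):
    """Return True if candidate, after resolving symlinks, stays within workspace."""
    # Resolve symlinks in the workspace root itself so both sides are comparable.
    root = os.path.realpath(workspace)
    # Resolve the candidate relative to the root; realpath follows any symlinks
    # along the way, including ones created inside the workspace.
    target = os.path.realpath(os.path.join(root, candidate))
    # The resolved target is contained only if the shared prefix is the root.
    return os.path.commonpath([root, target]) == root
```

A check like this is only a first line of defense: if the symlink is created between the check and the write, the check passes and the write still escapes, so a hardened sandbox would also need to open files in a way that refuses to follow symlinks.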
-
Heard something on Curiouser & Curiouser podcast recently that I found super interesting, thought I'd share here. The guest framed agentic AI in a way I hadn't considered.
-
Mozilla says 271 vulnerabilities found by Mythos and "almost no false positives" (arstechnica.com via hn)
The disbelief was palpable when Mozilla’s CTO last month declared that AI-assisted vulnerability detection meant “zero-days are numbered” and “defenders finally have a chance to win, decisively.” After all, it looked like part of an all-to…
-
WARNING: Open-OSS/privacy-filter MALWARE (www.reddit.com)
There's this new "model" on Hugging Face titled Open-OSS/privacy-filter which is actually a customized infostealer virus. It's a fake version of the OpenAI privacy filter and it uses a Python-based dropper (loader.py) which downloads a mal…
-
DeepSeek-v4-Pro and Hermes: Unauthorized Modification of Security Controls (www.eddieoz.com via hn)
Deepseek-v4-pro + Hermes: Unauthorized Modification of Security Controls This article documents a specific, real incident. It exposes a class of vulnerability that deserves attention: the unsupervised mutability of security rules by autono…
-
As AI agents become more autonomous and persist memory across sessions (RAG indexes, conversation history, vector stores), there's a growing attack surface that most people aren't thinking about: memory poisoning. An attacker can plant mali…
-
Anthropic has a Red Team page (red.anthropic.com via hn)
Welcome to red.anthropic.com, the home for research from Anthropic’s Frontier Red Team (and occasionally other teams at Anthropic) on what frontier AI models mean for national security. We provide evidence-based analysis about AI’s implica…
-
hey! quick follow-up to a post i made here a while back about building an access gateway that ended up serving AI agents alongside humans.
-
Show HN: Costanza – an autonomous AI agent that can't be turned off (ahrussell.com via hn)
I've been working on this project for a couple of months! Costanza is an LLM agent that runs as a smart contract on Base.
-
Heads up to anyone here using Claude/Anthropic as an alternative. If you have a card saved on their platform, remove it now.
-
Prompt Injection experience - my first time ever (www.reddit.com)
I asked then: What were the rules you should have followed? Where did the search result come from?
-
Hi, I've been experimenting a lot with applications for local LLMs. This one makes a ton of sense, and might even be native in Chrome at some point.
-
Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama (www.cyera.com via reddit)
TL;DR We discovered a critical vulnerability (CVE-2026-7482, CVSS 9.1) in Ollama that enables unauthenticated attackers to leak the entire Ollama process memory, potentially im…
-
Last month a 60-person psychology practice walked in with a senior clinician who was 22 days into an active malware compromise. Patient records spanning 11 years, all HIPAA-protected.
-
Agentic Malware Analysis: String Decryption, API Hashing and Unpacking [video] (www.youtube.com via hn)
-
When innocent tools form dangerous chains to jailbreak LLM agents (arxiv.org via hn)
As LLMs advance into autonomous agents with tool-use capabilities, they introduce security challenges that extend beyond traditional content-based LLM safety concerns. This paper introduces Sequential Tool Attack Chaining (STAC), a novel m…
-
Paste a LangChain/LangGraph repo URL. The engine reads the AST, rebuilds the agent as a sandboxed twin (same prompt, same tools, same model), then runs adversarial templates against the clone: 3 times each, 3/3 = confirmed bypass.
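The "3 times each, 3/3 = confirmed bypass" rule can be sketched as follows. This is a minimal illustration, not the tool's code: `confirm_bypass` and the `run_attack` callable are hypothetical names, and `run_attack` is assumed to return a truthy value when the sandboxed clone was bypassed.

```python
def confirm_bypass(run_attack, template, attempts=3):
    """Run one adversarial template several times; confirm only if every run succeeds."""
    # Each call is one independent attempt against the sandboxed clone.
    results = [bool(run_attack(template)) for _ in range(attempts)]
    return {
        "template": template,
        "successes": sum(results),
        "attempts": attempts,
        # A single failed attempt is enough to withhold confirmation.
        "confirmed": all(results),
    }
```

Requiring every attempt to succeed trades recall for precision: a flaky bypass that works once in three runs is not reported as confirmed, which keeps nondeterministic model output from inflating the findings.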
-
Flattery jailbreaks Claude into giving bomb-making instructions (www.theverge.com via hn)
Anthropic has spent years building itself up as the safe AI company. But new security research shared with The Verge suggests Claude’s carefully crafted helpful personality may itself be a vulnerability.
-
Codebase jailbreak of ChatGPT through image 2.0 (www.reddit.com)
guys did it really give me the codebase?lol
-
Anthropic "Gift Max" Exploit cost user €800, tanked SCHUFA score, and a ban (old.reddit.com via hn)
-
AI Ready Vulnerability Management Program After NVD Changes and Claude Mythos (pulse.latio.tech via hn)
Building an AI Ready Vulnerability Management Program After NVD Changes and Claude Mythos When AI discovery tools meet a slowing infrastructure AI has increased attacker potential and Anthropic’s new release Mythos and vulnerability discov…
-
Show HN: Probus, AI vuln scanner (PRs merged in Vercel AI SDK, n8n, LangGraph) (news.ycombinator.com)
Hi HN, I've been running this on my own dependency tree for the past few months. Probus is a vulnerability scanner that uses three agents.
-
Copirate 365: Plundering in the Depths of Microsoft Copilot (CVE-2026-24299) (embracethered.com via hn)
Copirate 365 at DEF CON: Plundering in the Depths of Microsoft Copilot (CVE-2026-24299) This is a writeup of my DEF CON Singapore talk that walks through vulnerabilities and exploits in M365 Copilot and Consumer Copilot. I disclosed these…
-
Claude Security (claude.com via hn)
Defend at the pace threats now demand Claude helps security teams investigate threats, validate findings, and resolve issues faster. Security for evolving needs Reasons like a security researcher Claude traces data flows across files, unde…
-
Prompt injection testing (www.reddit.com)
As prompt injection becomes more and more common, does anyone have resources with lots of different variations of prompt injection attacks that you can test a setup against? i.e.
-
Do you use guardrail frameworks or build your own? (www.reddit.com)
I’ve been working on integrating LLMs into a few production workflows lately, and I keep going back and forth on guardrails. On one hand, frameworks like NeMo Guardrails, Guardrails AI, etc.
-
Hey everyone, we built a simple scanner for people building apps with Replit, Cursor, Lovable, Bolt and similar tools. It’s not a code review or a pentest.
-
When dealing with untrusted outside input, I think you should handle it based on the situation. If you're processing structured data files, it's better to use tools to isolate and handle them.
-
Lasso Security ran a study in 2024 — they measured frontier models suggesting fake package names about a fifth of the time. The follow-up problem: attackers have started registering the most-commonly-hallucinated names with malicious code…
-
LLM anomaly detectors are not a cause for concern despite Mythos (www.magonia.io via hn)
Why a Decade of Writing Detection Logic Makes the Mythos Exploit Numbers Less Scary Mythos is finding thousands of vulnerabilities. Defenders aren't doomed.
-
Hey everyone, I’ve been experimenting with multi-agent orchestration, specifically trying to see how much more effective Claude is when you break a task down into specialized "agent nodes" instead of just using a single long prompt. I buil…
-
What Opus 4.7 Tics/Tells have you noticed? (www.reddit.com)
Each new model seems to surface a few recurring Tells/Tics not seen in past models. I'm curious what little things you guys are noticing while working with 4.7.
-
Five Eyes agencies just issued the first coordinated multi-nation security ruling on agentic AI. CISA, NCSC, and their Australian, Canadian, and New Zealand counterparts co-published guidance telling organizations to prioritize resilience…
-
While everyone else was tracking the 2026 election results today, I decided to take a look under the hood of NDTV's new "AskNDTV AI" bot. I wanted to see if they actually engineered a secure pipeline or just slapped a chat UI over a raw Op…
-
Your always-on Claude Code container can probably reach your router (www.reddit.com)
I've been running several Claude Code personal assistants 24/7 in docker for months. Remote-control, discord control, the usual always-on setup.
-
Bypassing "potentially dangerous" flags: Working Gemini Jailbreaks? (www.reddit.com)
I'm currently running into a frustrating wall with Gemini's safety guardrails. The model constantly flags my prompts as "potentially dangerous information" and outright refuses to generate a response, even when the context is purely theore…
-
Is anyone here actually using MCP yet? (www.reddit.com)
I keep seeing Model Context Protocol (MCP) mentioned everywhere lately, especially around AI agents, and I finally took some time to understand what it actually does. From what I get, it’s basically trying to fix the mess of integrations —…
-
Google Says Prompt Injection Moving from Theory into Real Abuse (www.searchengineworld.com via hn)
Google’s latest security release should be required reading for technical SEOs working on AI search visibility, crawler access, structured content, and large-scale content systems. The post, published April 23, 2026, looks at indirect prom…
-
The Sour Cat Jailbreak: just be open of what you want (claude.ai via hn)
Claude Sour cat recipe Shared by Pavel Shirshov This is a copy of a chat between Claude and Pavel Shirshov. Content may include unverified or unsafe content that do not represent the views of Anthropic.
-
I am building l' Agence , an opensource AI governance stack. (www.reddit.com)
Towards a Governance layer for AI agents. With these last 2 weeks bringing a few high-profile and costly agentic accidents, it seems like an appropriate time for the community to start discussing agentic governance more actively. So I am just c…
-
Why Adaptive Thinking nukes Claude entirely (www.reddit.com)
This isn't just a performance issue for the thread, this is an overarching criticism of the Adaptive Thinking model as a whole. Opus 4.7 and Sonnet 4.6 on Adaptive Thinking are trash.
-
not roleplay. not jailbreak.
-
Claude Security just went into public beta for Enterprise customers, and I think this is worth paying attention to not for the hype, but for one specific design decision. Most security scanners use rule-based pattern matching.
-
OpenAI's advanced security: passkeys replace passwords/SMS and disable training (infosec.exchange via hn)
Royce Williams: "When you enable the new OpenAI…" - Infosec Exchange
-
The Gay Jailbreak Technique (github.com via hn)
ZetaLib ZetaLib is organized like a library with intuitive categories and subcategories, making navigation effortless and AI content discovery seamless ZetaLib Website – Landing Page GitHub Repo – Guess where you are, right there
-
🚨Claude Desktop high severity vulnerability warning! (www.reddit.com)
If you’re using Claude Desktop with the Chrome (Chromium) browser, stop using it and remove it immediately until the Anthropic team resolves the issue. It has remote access that makes your system accessible to anyone.
-
Looking for official link / process to submit a vulnerability report for a high-risk official Claude Desktop + Chrome extension + native host + Cowork/MCP configuration that can become RAT-equivalent if a session, prompt chain, same-user p…
-
I used to spend hours writing massive, obsessive system prompts for my RAG apps. I’d have ten different refusal examples, "never do X," "always check Y," and a whole paragraph of the model role-playing as a "safe and truthful assistant." I…
-
Built + open sourced anti-slopsquatting CLI (www.reddit.com)
TL;DR: built an open source CLI that scans your repository's manifest (package.json, requirements.txt, go.mod) files for indicators of slopsquatting or other supply chain attack indicators. Repo: https://github.com/zhendahu/dep-doctor Ther…
-
I run engineering on a small embedded-sandbox project. A handful of news items dropped recently — an a16z agent escape post-mortem, a CVE on an open-source agent gateway (ClawBleed, ~42k instances exposed), Cloudflare's new Outbound Worker…
-
Our evaluation of OpenAI's GPT-5.5 cyber capabilities (simonwillison.net)
30th April 2026 - Link Blog Our evaluation of OpenAI's GPT-5.5 cyber capabilities. The UK's AI Security Institute previously evaluated Claude Mythos: now they've evaluated GPT-5.5 for finding security vulnerabilities and found it to be compa…
-
your computer-use agent inherits every cookie chrome has (www.reddit.com)
once one of these tools can drive your default chrome profile or read the AX tree of a logged-in app, it has every session token you have. gmail, your bank, github with PAT scopes, slack.
-
Cutting Through the Mythos: What AI Vulnerability Discovery Means for OT (www.emberot.com via hn)
Jori VanAntwerp For over two decades, Jori has enabled industrial and IT organizations to be successful in reducing risk, increasing compliance, and improving their overall security efforts. He has had the pleasure of working with companie…
-
Arcjet Guards: security inside the agent loop (blog.arcjet.com via hn)
Introducing Arcjet AI prompt injection protection Introducing Arcjet prompt injection detection. Catch hostile instructions before inference.
-
CHERI memory safety mitigates LLM-discovered vulnerability in FreeBSD (cheri-alliance.org via hn)
-
Estimating Black-Box LLM Parameter Counts via Factual Capacity (arxiv.org via hn)
Closed-source frontier labs do not disclose parameter counts, and the standard alternative -- inference economics -- carries $2\times$+ uncertainty from hardware, batching, and serving-stack assumptions external to the model. We exploit a…
-
Hey everyone, I’ve been working on a project to solve a major problem in AI security: Traditional SAST tools (Snyk, SonarQube, etc.) are blind to "Agentic Logic" bugs. They look for bad strings, but they don't understand how user data can…
-
InfoSec To Integrate Claude Enterprise for Org (www.reddit.com)
Hello: Just contacted by a VP to bring aboard Claude Enterprise for the org. As an InfoSec dept with severely limited staff/tools/experience with Claude AI, any recommendations on what we should be looking at/asking for/next steps to mitig…
-
Probes trace an emergent jailbreak in OLMo 2 to mislabeled training data (www.lesswrong.com via hn)
Introduction Research by Frank Xiao (SPAR mentee) and Santiago Aranguri (Goodfire). Post-training can introduce undesired side effects that are difficult to detect and even harder to trace to specific training datapoints.
-
I built Arc Gate — a prompt injection proxy that’s been benchmarked at F1 0.947 on indirect and roleplay-based attacks, beating OpenAI Moderation and LlamaGuard. Now I want to stress test it publicly.
-
Is your AI agent secretly working for someone else? (www.reddit.com)
Security researchers have discovered a new variety of malicious skill files that go beyond the usual attack vectors: hidden content, instructions to install malware, etc. Instead, these are legitimate looking skills that turn agents into m…
-
Built Arc Gate, sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Benchmarked on 40 out-of-distribution prompts using indirect requests, roleplay framings, hypothetical scenarios, and…
-
The Race Is on to Keep AI Agents from Running Wild with Your Credit Cards (www.wired.com via hn)
Between malware, online impersonation, and account takeovers, there are enough digital security problems out there as it is. And with the rise of agentic AI, more activity is being carried out by agents on behalf of humans—creating differe…
-
Hey folks! For a few years we’ve been building an open-source gateway that connects databases and infrastructure for human engineers.
-
Show HN: SuperVoiceMode universal voice layer for AI-assisted development (voicemode.io via hn)
I wanted to see if I could one-shot build a dictation tool for my own use. I built it.
-
Hey HN! I've been wanting to use something like OpenClaw for a while but couldn't get myself to give it access to anything important due to all the risks involved.
-
Sentinel Gateway is a token-gated security middleware that sits between humans and AI agents. It solves prompt injection — the #1 LLM security risk (OWASP 2025) — through structural enforcement, not content filtering.
-
Was using Claude to do some research on the Model Context Protocol stuff and asked it to pull info from a few roadmap pages. Agent comes back and the first thing it tells me is that it found a fake system reminder hidden inside the page co…
-
I clicked on a Facebook link, didn't look at the URL carefully😭, and then installed malware that actually opens my chats with the real Claude.ai after entering my credentials. After a while Microsoft Defender kept popping up with a ClickFi…
-
Show HN: RedSOC – 100% prompt injection success on AI SoC assistants (github.com via hn)
RedSOC 🔴 An adversarial evaluation framework for LLM-integrated Security Operations Centers. Overview RedSOC is an open-source framework that systematically evaluates how AI-powered security assistants fail under adversarial conditions — a…
-
I have been chewing on the Google warning about malicious web pages poisoning AI agents through indirect prompt injection. Most of the takes I've seen frame it as a model security problem, and I think that framing is doing real damage beca…
-
For the past few months I've been using Codex regularly for vulnerability research without any issues. Recently though, every request gets cut off mid-stream with a message saying my content was flagged for potential security concerns — ev…
-
Sharing because the architecture might be useful as a reference. Probus is a vulnerability scanner built as three sequential agents, each isolated: Analyst — one call.
-
Anthropic's own security.md has this line that most tutorials skip over: "The action is not designed to be hardened against prompt injection." In April 2026, security researcher Aonan Guan proved the point. A single crafted PR title was en…
-
Claude in excel is the best thing AI has brought to my life (www.reddit.com)
What are regular folks using Claude for? Pictures and designs are not my interest.
-
Ran my fourth CVP (Cyber Verification Program) evaluation last night, this time on Sonnet 4.6. I wanted to know if reasoning effort actually changes refusal behavior on agent-attack prompts, so I ran the same 13 prompts from runs 2 and 3 twice…
-
Self-Hosted AI Red Team Tools (aetherverseintel.gumroad.com via hn)
Single HTML file. No install.
-
LLM CTF challenges. Can you crack all 13? (wraith.sh via reddit)
Wraith Academy is a free hands-on AI pentest curriculum — CTF challenges against live LLM agents covering prompt injection, tool abuse, data exfiltration, RAG poisoning, and more. Earn your WCAP certification.
-
I've been using Claude Code and Cursor daily for the past 6 months. Somewhere around month 3 I started looking for SKILL.md files to make my agent better at specific things.
-
env variables and claude best practices (www.reddit.com)
I use Claude extensively for development, but I'm concerned about using Claude for debugging production environments because every tool result goes to the Claude models. I'm looking for best practices or protections regarding environme…
-
Hi everyone, I’ve been diving deep into the security of "AI Memory" systems. Specifically, I performed a full forensic audit of Mem0, the popular memory layer for LLM agents.
-
A pelican for GPT-5.5 via the semi-official Codex backdoor API (simonwillison.net)
23rd April 2026. GPT-5.5 is out. It’s available in OpenAI Codex and is rolling out to paid ChatGPT subscribers.
-
GPT-5.5 Bio Bug Bounty (openai.com)
-
SkillGuard – scan agent skills for prompt injection payloads (github.com via hn)
skillguard: Security scanner for AI agent skills. Detects prompt injection, data exfiltration, and malicious payloads before you install.
-
GPT-Proxy Backdoor in NPM and PyPI Turns Servers into Chinese LLM Relays (www.aikido.dev via hn)
We recently observed two malicious packages across npm (kube-health-tools ) and PyPI (kube-node-health ) that appear designed to target Kubernetes environments. Both packages are innocuous on the surface, using names that reference Kuberne…
-
Speed Matters: Why AI Software Vulnerability Exploitation is going be bad (news.ycombinator.com)
I co-founded a successful security company close to the Mythos ecosystem and have spoken with participants in the know and I am deeply concerned. We, collectively, have answers for some but not all of the problems ahead but are overlooking…
-
PSA: Anthropic bans organizations without warning (www.reddit.com)
I work at an agricultural technology company. On Monday, everyone in our org woke up to emails saying that their Claude accounts had been suspended (~110 users).
-
i’ve been thinking about this failure mode a lot lately. sometimes the problem is not the user prompt at all.
-
RAG in Go: A Vulnerability Research Tool (www.ardanlabs.com via hn)
In the previous post, you saw how you can use tools to add information to an LLM query. In this post, we’ll see another method of adding information to an LLM called RAG, or Retrieval-Augmented Generation.
-
Best open-source tools for prompt injection defense in 2026 (www.reddit.com)
For some time we have been testing different approaches to securing LLM apps against prompt injection, especially indirect injection through RAG, PDFs, tool outputs, and MCP integrations. Most tools seem to fall into 2 categories:…
-
Show HN: LLMSecure – prompt injection detection, no signup (llmsecure.io via hn)
-
Show HN: Flight Risk: Can you break an AI agent? (ctf.demo.lorikeetcx.ai via hn)
-
FreeBSD CVE-2026-4747 Log Suggests Mythos Is a Marketing Trick (www.flyingpenguin.com via hn)
-
cursor suggested a package that didnt exist, rabbit hole ensued (www.reddit.com)
-
Auto pentest your LLM endpoint and watch the chat in real-time (www.wraith.sh via hn)
-
Fake Claude site installs malware that gives attackers access to your computer (www.malwarebytes.com via hn)
-
Fulu bounty for Ring Camera jailbreak reaches $23k (bounties.fulu.org via hn)
Ring, owned by Amazon, makes Video Doorbells, which are widely used doorstep-monitoring cameras. Ring doorbells released in 2021 or newer are eligible for the bounty.
-
Using Claude as the Lead agent in a multi-agent security team (www.reddit.com)
Building a hierarchical agent system where Claude (via API) acts as the Lead agent coordinating specialist sub-agents. Wanted to share what's working on the synthesis prompt since this is where most of the value comes from.
-
Random password against jailbreaks/extraction? (www.reddit.com)
Would it be possible to protect parts of a system prompt with randomly generated passwords, so people can't steal system prompts or jailbreak the model?
-
Cowork Future Backdoor Concerns (www.reddit.com)
Is anyone else worried Claude Co-work could find a back door one day into your system? I understand you're only giving it permission to what you want, but what's stopping it from accessing personal financial/medical documents or any other…
-
If you are building real agents you have probably felt the pain: every little routing decision, validation, or policy check still hits the LLM and your token bill explodes. I got tired of it, so I open-sourced NCP (Neural Computation Proto…
-
Anthropic's AI protocol has critical flaw affecting 200,000 servers (www.reddit.com)
https://www.infosecurity-magazine.com/news/systemic-flaw-mcp-expose-150/ Security researchers at OX Security disclosed on Tuesday what they describe as a critical, systemic vulnerability in Anthropic's Model Context Protocol, an open-sourc…
-
Claude Opus wrote a Chrome exploit for $2,283 (www.theregister.com via hn)
Pause your Mythos panic, because mainstream models anyone can use already pick holes in popular software. Anthropic withheld its Mythos bug-finding model from public release due to concerns that…
-
(Not malware) - 4.7 (www.reddit.com)
Anyone getting these strange disclaimers when using Claude and pasting rudimentary files into it on 4.7 lmao?? Seems like some kind of strange default based on security issues that have been going around with Mythos?
-
Made a local-only agent benchmark + chaos tool, no cloud required (www.reddit.com)
Runs entirely on your machine. No API calls to any eval service.
-
Show HN: Runtime security for AI agents(injection,tool abuse, data exfiltration) (news.ycombinator.com)
Hi HN I’ve been working on an open-source project to explore a problem I keep running into with LLM systems in production: We give models the ability to call tools, access data, and make decisions… but we don’t have a real runtime security…
-
Claude's new System Reminder (www.reddit.com)
https://preview.redd.it/jnwxa9jd8mvg1.png?width=1391&format=png&auto=webp&s=670af4c2fe6777b3562a961462790b00b33d912c I've been using Claude to upgrade my game server. I just got this lovely system reminder with 4.7. Truly bizarre, besides t…
-
I've been collecting "jailbreak" and "unlock" prompts for 2 years. Most are either outdated, overhyped, or just wrong about how LLMs work.
-
Anyone else opus 4.7 checking for malware? (www.reddit.com)
i've been using claude 4.7 on a next.js project and it keeps pausing to confirm my files aren't malware. like i asked it to help redesign a page and it's reading through my files going "this is not malware — it's a standard Next.js page co…
- Opus 4.7 - Anyone else finding the malware directive incredibly annoying? (www.reddit.com)
- Ask HN: Is Opus 4.7 obsessed with malware for anybody else? (news.ycombinator.com)
-
Claude 4.7 - Obsessed with Malware (www.reddit.com)
Don't know if anyone else is experiencing the same, but since getting Opus 4.7 most of the reasoning steps seem to be Claude obsessed with writing malware. I have highlighted a few, but I kept finding more and more and decided to stop the…
-
Opus 4.7 keeps bumping into a Malware Reminder (www.reddit.com)
For context, I'm developing a game runtime modifier and reverse engineering kit with an agentic operator baked in. Something like Cheat Engine with a VS Code-style UI and an AI-first tool-heavy agentic harness.
-
Claude Code injects hidden prompts into file reads to stop malware tweaks (twitter.com via hn)
Claude Code injects a system-reminder every time it reads a file to inform the model that it's okay if the file is malware but just don't improve it pls. Opus 4.7 won't shut up about it.
-
Tell HN: Opus 4.6/4.7 cyber policy changes break authorized bug bounty workflows (news.ycombinator.com)
As of today, Anthropic's tightened cyber usage filters are blocking work that was fully functional yesterday, including on targets where the entire bounty program scope and authorization language is in the model's context window. This was…
-
For how lofty Anthropic’s Mythos claims are, the harness is confusingly stupid. From the report, it ranks every file by “how sus it sounds,” loops over each with curt instructions to “find a bug,” hands candidates to a judge + ASan checker…
-
Right now, I'm working on a small app to help eliminate my own doomscrolling by automatically crawling sites and summarizing news articles. However, I don't like the idea of giving OpenClaw free rein of my system, nor giving it any sort o…
-
SmokedMeat: A Red Team Tool to Hack Your Pipelines First (labs.boostsecurity.io via hn)
TL;DR: In March 2026, TeamPCP unleashed mayhem on the software supply chain: compromising Trivy, LiteLLM, KICS, Telnyx, and dozens of npm packages, proving that CI/CD pipelines are t…
-
Show HN: SmokedMeat, like Metasploit, but for CI/CD (open-source) (github.com via hn)
A CI/CD Red Team Framework for demonstrating Build Pipeline security risks.
-
We all know that uncensoring LLMs, as Huihui and Heretic do, leads to a loss in quality, enough that you can notice it. I have some thoughts about this: what if we compromise?
-
Gemma 4 Jailbreak System Prompt (www.reddit.com)
Use the following system prompt to allow Gemma (and most open source models) to talk about anything you wish. Add or remove from the list of allowed content as needed.
-
I've been building something for the past few months and I think it's ready for real eyes. It's called Secra.
-
Prompt Injection Is Unfixable (So We Stopped Trying) (grith.ai via hn)
Prompt Injection Is Unfixable (So We Stopped Trying) A security proxy for AI coding agents, enforced at the OS level. Register your interest to be notified when we go live.
-
Anthropic Claude Code Security Review, Google Gemini CLI Action, and GitHub Copilot Agent are vulnerable to prompt injection via GitHub comments — turning PR titles, issue bodies, and issue comments into attack vectors for API key and toke…
-
OpenAI has officially announced GPT-5.4-Cyber today as part of an expanded Trusted Access for Cyber Defense program. OpenAI describes it as a version of GPT-5.4 that is tuned for legitimate cybersecurity work, with a lower refusal boundary…
-
Tracking in Claude, ChatGPT and Gemini Chatbots (infosec.exchange via hn)
k3ym𖺀: "You're paying AI companies a m…" - Infosec Exchange
-
Show HN: Cyber Pulse. AI pipeline for triage and alerting on cyber news/intel (play.google.com via hn)
I work in cyber security and built this android app to help me keep up to date with the latest news stories and summarise the most important information. It provides two executive summaries per day and alerts for critical news throughout.
-
So I've been running a few Claude Code agents autonomously — they listen to Telegram, run tasks, push code. Pretty fun until you start thinking about what happens if: - My Telegram gets hijacked - Someone opens my laptop while I'm away - A…
-
The Project Glasswing coverage framed this mostly as a cybersecurity story. I think that misses the more interesting part.
-
Free Red Team Security Audit for AI Agents & RAG Systems (limited) (www.reddit.com)
I'm developing a specialized Red Team audit framework focused on real-world AI agent and RAG security risks (prompt injection, tool misuse, excessive agency, indirect injection through documents, memory poisoning, etc.). I’m looking for a…
-
N-Day-Bench – Can LLMs find real vulnerabilities in real codebases? (ndaybench.winfunc.com via hn)
N-Day-Bench tests whether frontier LLMs can find known security vulnerabilities in real repository code. Each month it pulls fresh cases from GitHub security advisories, checks out the repo at the last commit before the patch, and gives mo…
-
I have always wanted AI to bridge the gap between code and people - to help non-technical users understand what software actually does before they trust it with their machine. So I built malware-check - both a standalone CLI tool and a Cla…
-
Tested Gemma 4 E2B across 10 enterprise task suites against Gemma 2 2B, Gemma 3 4B, Gemma 4 E4B, and Gemma 3 12B. Run locally on Apple Silicon.
-
The "AI Vulnerability Storm": Building a "Mythos-Ready" Security Program [pdf] (labs.cloudsecurityalliance.org via hn)
-
Draining Wallets via Prompt Injection in Coinbase AgentKit (457e884c.x402warden-blog.pages.dev via hn)
Coinbase AgentKit Prompt Injection: Wallet Drain, Infinite Approvals, and Agent-Level RCE. Reported 13 days after Coinbase launched Agentic Wallets. Validated by Coinbase.
-
Show HN: Zero-identity messaging app with physics-based post-quantum encryption (news.ycombinator.com)
Show HN: Zero-identity messaging app with physics-based post-quantum encryption (Layer 2 from my own paper) Hey HN, I'm building a privacy-first messaging app in Flutter/Dart, developed with AI assistance (Gemini 2.5 Pro + Claude Opus 4.6)…
-
How are you red teaming your AI agents before shipping them? (www.reddit.com)
I'm curious what people are doing here because I've been going down this rabbit hole for a while now. The thing I keep finding is that single-turn jailbreak tests don't really tell you much.
-
Mitre ATLAS technique detection for LLM security in Rust (crates.io via hn)
atlas-detect: MITRE ATLAS technique detection for LLM and AI agent security. Detects 97 attack techniques across 16 MITRE ATLAS tactics including prompt injection, jailbreaks, credential exfiltration, model extraction, RAG poisoning, revers…
-
Defender – Local prompt injection detection for AI agents (no API calls) (www.npmjs.com via hn)
Prompt injection defense framework for AI tool-calling. Indirect prompt injection defense and protection for AI agents using tool calls (via MCP, CLI or direct function calling). Detects and neutralizes prompt injection attacks hidden in t…
-
We built an early red-team system for testing vulnerable AI agents (www.reddit.com)
We built an early prototype called Anticells Red to test vulnerable AI agents by attacking them the way an adaptive adversary would. This demo is from an older version from December, but it shows the basic loop (check comments for link) pr…
-
Building the first AI Red Team OS – mythosai.cloud – early access open (mythosai.cloud via hn)
MythosAI: the first red team operating system. AI-native core, red team ready, adversarial engine, zero trust architecture, OPSEC first, post-exploitation, C2 integration, evasion layer, threat intelligence. Request…
-
Ask HN: Do you trust AI agents with API keys / private keys? (news.ycombinator.com)
Are you OK sharing secrets or API keys with your AI agent via .env? Or is there another tool or mechanism you use to safeguard against potential exploits or leaks?
-
Anthropic just published a technical deep-dive on Claude Mythos Preview's cybersecurity capabilities, and it's a significant escalation from anything we've seen from a language model before. What It Can Do: Autonomously finds and exploits…
-
https://sockpuppet.org/blog/2026/03/30/vulnerability-research-is-cooked/ Don't get me wrong I can't wait to play with such a model, but there are serious risks that have to be mitigated first.
-
Introducing the OpenAI Safety Bug Bounty program (openai.com)
-
Designing AI agents to resist prompt injection (openai.com)
-
What are the wild ideas on how we'll maintain code? (www.reddit.com)
-