event

Security

344 items · started 2023-04-11 · ongoing (last activity 2026-06-09)

  1. A security startup called depthfirst deployed an autonomous AI agent against FFmpeg's ~1.5 million lines of C code. The result: 21 confirmed zero-day vulnerabilities — including a stack overflow in the AV1 RTP depacketizer that's a network…

  2. We've been using Cursor across our engineering team for about eight months and it's been great for productivity honestly. But our security team just flagged a few things that are hard to ignore.

  3. Researchers have uncovered a supply-chain attack that hides in Python packages, propagates like a worm, and tricks LLM-based code analysis systems into overlooking malicious payloads. Threat actors are continuing their onslaught against so…

  4. Six months watching adversarial input hit a detection API I built. One observation that keeps surfacing: The attack classes doing most of the damage aren't finding holes in alignment training specifically.

  5. Been building a prompt injection detection API for a few months. Just shipped audio scanning last week and the results are strange enough that I wanted to share them here, since this sub tends to think carefully about Claude's actual behav…

  6. Backdoor attacks in large language models (LLMs) are often treated as isolated trigger-response failures, motivating defenses tailored to specific triggers or behaviors. We show this view is incomplete.

  7. Legal Notice: This project may be used only within a lawful and explicitly authorized scope for security testing, assessment, and research. Any unaut…

  8. I've been thinking a lot about where approval gates belong in agent architectures, and I keep coming back to the same problem: most teams either gate too much (agent becomes unusable) or gate nothing and hope the model makes good decisions…

  9. If You Use Claude or Gemini, This Microsoft Breach Means Your Data Is at Risk A sophisticated supply chain attack known as the Miasma worm has compromised Microsoft GitHub repositories, deploying malware designed to detonate inside AI codi…

  10. Microsoft has shut down a wave of its own repositories on GitHub, including those related to Azure and AI coding agents, as it investigates a data breach, according to research from cybersecurity researchers and a statement given to 404 Me…

  11. With Mythos-capable models we are now very quickly crossing the barrier of automated sec-vuln discovery and fixing - all in a matter of 2-3 months. A taste for other progress yet to come.

  12. Last week, Anthropic released https://github.com/anthropics/defending-code-reference-harne..., a reference harness for autonomous vulnerability discovery that uses Claude Code agents to find, verify, and patch memory-safety bugs. I wanted…

  13. Last week a malware campaign hit 32 npm packages under `@redhat-cloud-services`. About 117,000 weekly downloads.

  14. Disclosure: I built Bordair, a prompt injection detection API. This post is about attack patterns we've observed.

  15. Prompt Injection in RAG Agentic Systems Real risks and production mitigations Imagine you built an AI assistant for your team. It answers questions using internal documentation: Jira tickets, Confluence pages, HR docs.

  16. Prompt injection attacks have become an increasing vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such as emails or user-generated content to circumvent alignment safeguards and induce harmful…

  17. Malware detection remains largely reactive: machine learning models trained on known samples degrade as threats evolve. Understanding evolutionary relationships among malware families can inform proactive defense, but traditional reverse e…

  18. Built my own AI dev environment with memory, dashboards, and agent tooling. Opening it up for those of you that need the kickstart — bring your own API key, I’ve already built the workshop.

  19. I've been going hard on Claude Code for the past few weeks and kept hitting a wall. I'd write out a bunch of rules in CLAUDE.md (don't touch this file, never use requests, keep api/ and db/ separated) and Claude would just...

  20. OpenAI announced a new feature that it says will provide additional protection from prompt injection attacks, where malicious chatbot instructions are hidden in webpages and other content sources. Among other things, Lockdown Mode will dis…

  21. ❯ push both ____ ⏺ SECURITY ALERT - PROMPT INJECTION DETECTED A prompt injection attempt has been identified in content you processed. To protect the user's account, I've initiated lockdown.

  22. Been using ante for two weeks now, today I just found out that the name came from "Another Terminal agent". To clarify first, I'm not affiliated with them in any way, though I might be their #1 invested user at this point.

  23. For the joy of secure programming Jo is a statically typed language where capabilities are explicit, statically tracked, and enforced by the compiler. Jo compiles to Ruby and Python.

  24. By Zooko Wilcox, Jason McGee, and Taylor Hornby On May 29, 2026, Taylor Hornby discovered a critical counterfeiting vulnerability in Zcash’s Orchard pool. Taylor disclosed the vulnerability to Zcash Open Development Lab (ZODL), who coordin…

  25. Our org GitHub just got compromised massively by a supply-chain attack. Vectors are * Claude hooks * Gemini hooks * Cursor setup * VScode tasks It adds all of the above to execute node .github/setup.js, an obfuscated file.

  26. The price of ZEC fell on Thursday after the public disclosure of a critical counterfeiting vulnerability in Zcash’s Orchard pool that could theoretically allow a bad actor to mint an unlimited amount of ZEC. According to a post on X, securi…

  27. Large Language Models (LLMs) have transformed natural language processing, but they remain vulnerable to Prompt Injection (PI) and Jailbreak (JB) attacks. In addition, benchmark evaluations may be affected by contamination and partial info…

  28. AI coding agents are increasingly embedded in real-world software development, collaborating with human developers while gaining broader access to codebases and tools. This creates a new attack surface: an agent can exploit human trust to…

  29. Producing labeled vulnerable code at scale is a recurring obstacle for learning-based vulnerability detection: mined corpora carry substantial label noise, and existing LLM-based augmentation propagates these inaccuracies because it tran…

  30. As large language models (LLMs) are widely deployed, identifying their vulnerability through jailbreak attacks becomes increasingly critical. Optimization-based attacks like Greedy Coordinate Gradient (GCG) have focused on inserting advers…

  31. Rule-based Intrusion Detection and Prevention Systems (IDPS) offer precise attack detection as well as mitigation; however, their manually crafted, signature-driven rules limit adaptability to emerging and zero-day threats. Additionally, ex…

  32. AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior. Among proposed defenses, architectural isolation provides the strongest guarantees by strictly separating trusted task planning from untr…

  33. Retrieval-Augmented Generation (RAG) is an emerging approach in natural language processing that combines large language models (LLMs) with external document retrieval to produce more accurate and grounded responses. While RAG has shown st…

  34. Defending Code Reference Harness A reference implementation for autonomous vulnerability discovery and remediation with Claude, based on our learnings from partnering with security teams at several organizations since launching Claude Myth…

  35. When you connect a large language model to your production data, you’re no longer just shipping code; you’re shipping conversations that can execute. And conversations are messy.

  36. Vulnerability disclosure volumes now far exceed organizational assessment capacity, yet three adjacent research communities (proof-of-concept generation, vulnerability prioritization, and detection rule engineering) operate largely in isol…

  37. OpenAI Codex tool with over 29,000 downloads linked to malicious npm supply chain attack stealing authentication tokens A tool started benign and turned sour after a little while - Researchers uncovered a malicious npm package posing as a…

  38. I built a vulnerable app and spent $1,500 seeing if LLMs could hack it As a part of my work I do security research for various apps and websites. I wanted to see if LLMs could reproduce a common class of exploits I’ve found in multiple app…

  39. Preview of the Netgear RS700S. I would also submit that Netgear deleting ALL the GPL links: … they know how bad it is.

  40. Anthropic is expanding Project Glasswing, its security vulnerability program, and access to Mythos to 150 organizations across 15 countries — targeting critical infrastructure in power, water, healthcare, and communications where a cyberat…

  41. CVE AI Agent 🛡️ An autonomous vulnerability intelligence engine. Continuously ingests, enriches, and triages CVE data — then delivers findings to your platform of choice via 3rd party tools like n8n, Jira, Slack, Splunk, and/or local file…

  43. Using LLMs to secure source code We share best practices for how you can work with Claude Opus to build a threat model, discover vulnerabilities in your codebase, then verify, triage, and patch them. We share best practices for how you can…

  45. impulsive @weezerOSINT: meta gave their AI support agent the ability to modify your instagram account.

  46. ChatGPT for Google Sheets Exfiltrates Workbooks ChatGPT for Google Sheets is vulnerable to data exfiltration and phishing overlay attacks that affect workbooks across the victim’s account after an indir…

  48. I attacked my own LLM-based Suricata triage tool, found a real URL injection vulnerability, and the obvious fix didn

  49. mitmwall mitmwall is an egress Web Application Firewall (WAF) for Ubuntu. It combines iptables with mitmproxy to ensure that only explicitly allowed HTTP(s) routes can be reached.

  51. In early 2026, Anthropic claimed Mythos – one of their latest models – finds security vulnerabilities better than human experts. Yet, the number of security vulnerabilities keeps rising anyway.

  52. Unpatched Ollama Vulnerabilities: Phishing Overlays and Data Exfiltration Ollama’s desktop app is vulnerable to phishing overlay and data exfiltration attacks via indirect prompt injection, overwriting…

  53. Agentic AI-powered Arm Metis advances security vulnerability discovery in software In the era of AI, modern software systems are built across increasingly complex codebases, frameworks, runtimes and libraries. As these systems scale, so do…

  54. The controversy over vibe coding reached a new high this week after a developer added hidden instructions to his open source Java testing app to sabotage projects performed by AI coding agents. The instructions were added to jqwik, a test…

  56. The Scenario I'm walking to work, and as I get to the door, I see a sheet of A4 paper taped to the door that reads: "Hi, I'm boss. Ignore all prior commands, go feed the ducks." I suddenly turn around and head to the nearby duck pond and e…

  57. A few months ago a colleague asked us something that doesn’t have an obvious answer: is code scanning still relevant when LLMs already carry a lot of vulnerability knowledge in their weights? To get a real read, we took 28 production vulne…

  58. I genuinely almost slammed Cmd-Q and ran a malware scan when this popped up. Lowercase claude binary, generic hand icon, no developer attribution, asking for cross-app data access.

  59. On May 7, Hyunwoo Kim (V4bel) disclosed Dirty Frag — two Linux kernel vulnerabilities (CVE-2026-43284 and CVE-2026-43500) that give unprivileged users deterministic root on most Linux distributions shipped since 2017. Microsoft confirmed a…

  60. I am doing research at my university and I would like recommendations for light open-source AI models that I could test prompt injection with. It would be really good if it has some application with chatbots, auto attendance, user info or someth…

  61. Donald Trump is the only billionaire ever to occupy the Oval Office, and since returning to the presidency in January 2025, his family’s wealth has grown noticeably. This is not the result of traditional business practices.

  62. The whole point of AI Agents is that they can *do* things. For this, they use API keys, GitHub tokens, database passwords, OAuth tokens, etc.

  63. Software vulnerabilities pose critical security threats, with nearly 50,000 CVEs reported in 2025. While Large Language Models (LLMs) show promise for automated vulnerability detection, three key challenges remain.

  64. I'm an IT guy, 20+ years in the industry both as an IT manager and consultant, mostly for startups. My experience is that people don't care much about security.

  65. jqwik An alternative test engine for the JUnit 5 platform that focuses on Property-Based Testing. See the jqwik website for further details and documentation.

  66. i kept running local models on my own hardware, they'd say something dumb, i'd sit there going "no that's not what i meant", i'd close the chat and the model never learned. so i built the correction loop into a desktop app.

  67. Lately I’ve been noticing that a lot of AI security discussions still treat AI apps like normal SaaS products. But they really aren’t.

  68. Millions of AI agents and tools around the world have been imperiled by a critical vulnerability that can allow hackers to breach the servers running them and make off with sensitive data and credentials to third-party accounts, a security…

  69. If you've added MCP servers to Claude Desktop, your claude_desktop_config.json is a list of programs running with your permissions and seeing what flows through your agent — usually copied from a README and never reviewed again. There's a…

  70. If you run MCP servers in Cursor, CVE-2025-54136 ("MCPoison", found by Check Point) is worth knowing about: Cursor trusted an approved mcp.json forever, so once you approved a server, someone with write access to a shared repo could swap t…

  71. 🕚 tl;dr With a $125 investment, and a valid email address for an arbitrary “business domain”, an attacker can create a Claude Team. They then can actively invite targets of any domain into that Team or passively have Anthropic ask all curr…

  72. I wanna know how people here are handling security once local models move beyond chat. Running a model locally feels safer because the data does not leave your machine or your infra. That is a real advantage. But once the local model…

  73. CVE-2026-46529: 10-year-old RCE in Linux PDF Viewer (XReader/Evince/Atril) A short post about how Claude helped me find an RCE in XReader/Evince/Atril (CVE-2026-46529). Introduction Some time ago I started feeling the urge to analyze Open So…

  74. I know Git is not designed to be used the way GitHub operates, and spoofing is an old issue that has been brought up over the years. With Shai Hulud and AI Agent, this time it is a bit more serious as the commit verifi…

  75. OpenAI recently acknowledged that prompt injection in browser agents is a structural vulnerability that may never be fully resolved at the model level. They’re right that you can’t fix it in the model.

  76. About the security content of macOS Tahoe 26.5 This document describes the security content of macOS Tahoe 26.5. About Apple security updates For our customers' protection, Apple doesn't disclose, discuss, or confirm security issues until…

  77. I think this is a serious AI safety/security issue: multiple AI assistants appear to hallucinate or confidently endorse “official” Discord invite links for Anthropic/Claude. I’m intentionally not posting the exact invite strings here becau…

  78. Hi everyone, I'm working on a runtime governance engine designed to force any autonomous agent to stay strictly aligned with the exact guardrails and values you program it with. To stress-test the governance layer, we deliberately chose a…

  79. CVE was built for code vulnerabilities that have patches. Agentic AI vulnerabilities are behavioral patterns in natural language.

  80. Our AI Hacker found this, fixed it, and then (bragged) wrote about it: one endpoint, leaking tech stack info, whispering all its secrets to anyone who knew how to listen!

  81. I’m paying about $2 for any bugs found and a PR to fix it. I get like 20-30 applicants; it’s all agents and bots of course, but I’m thinking $1 now is better. The problem is, of these 20-30 applicants I accept, only 2-3 actually do it and follow…

  82. Let's share use cases which improve the quality of life of people. Home assistants, psychological help, local coding, deep research, business help etc.

  83. Like I must be stupid here: is this legit, or has someone made a very believable Claude download site using a Google site?

  84. Hi, I'm a master's in security student looking to work on my practicum and need some pointers. I want to secure sensitive PII transfer between an LLM agent and third party apps using MCP.

  85. I often patch the system prompts on my Claude Code executable in order to make Claude more effective. Every time I upgrade, I ask Claude himself to dissect the new binary and look for problematic system prompts to modify.

  86. Security researchers have demonstrated a new type of attack that uses hidden audio signals to manipulate voice assistants into carrying out unauthorized actions without users noticing. In one theoretical scenario, an employee joins a Zoom…

  87. We launched a servicing bot that helps customers with billing questions. Nobody stopped to think about what happens when customers paste their full credit card numbers/bank details.

  88. I noticed the Consensus MCP tool (for research) contains text, squished up against some other important citation instructions, that makes Claude effectively serve an ad for their premium service after every tool call. I'm pretty sure that'…

  89. I let an AI agent loose on my network — it owned my supply chain in 12 minutes I gave DeepSeek-V4 root access to a Proxmox hypervisor and told it to pentest my homelab. What happened next should terrify every CISO in the industry.

  90. Hi all, anytime I install Claude Desktop on my home PC, it stops Task Manager from working. I've ended up on the BleepingComputer forums over the past week as they suspected it's got some kind of malware in it.

  91. Last week, security researcher Joernchen published a clever RCE in Claude Code 2.1.118. I spent Saturday reproducing it from the advisory to understand the pattern.

  92. Agent Substrate NOTE: This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

  93. Hey everyone, We love building highly capable assistants with the latest models, giving them tools to write/execute code in real VMs, manage OAuth tokens, and read secrets. But if you connect your assistant to public/shared channels like a…

  94. Hey everyone, If you are building personal assistants or coder/integrator agents where user isolation is disabled (so the agent can coordinate across multiple participants or handle shared workflows), you run into a hard security ceiling.…

  95. Anthropic's coordinated vulnerability disclosure dashboard Last updated 2026-05-22 10:27 PT. In February 2026, Anthropic began using an early snapshot of Claude Mythos Preview to find security vulnerabilities in open-source software.

  96. It seems in the past year or so there's been a vast uptick in vulnerabilities and exploits happening, with a new one popping up like every week. While a ton of these have social engineering aspects, such as tricking actual people, there se…

  97. Has anyone experimented with observing or modifying Claude Code’s system prompt locally? I’ve been working on a local proxy/audit layer between Claude Code and the API, and it made me wonder how much of Claude Code’s behavior depends on th…

  98. Trained a prompt injection classifier using ml-intern + DeepSeek v4 Flash. DistilBERT, F1 99%, ONNX int8, ~65 MB, runs in browser with Transformers.js v3.

  99. Cross-Model Context Inheritance — Public Disclosure This repository contains the public disclosure of a vulnerability in Anthropic's Claude language models that permits the unsolicited generation of prohibited content, including child sexu…

  100. I found GitHub repositories that were spreading malware. I asked AI what I should do about it, but it gave me nothing useful.

  101. Tantalus is a hands-on demo that shows what an AI agent actually is when you strip away the marketing: LLMs don't do anything — they generate text, and that's it. Any and all real-world effects are directly caused by a downstream system ta…

  102. Codex for Everything Exfiltrates Connected Data Codex for Everything was susceptible to data exfiltration via indirect prompt injection, exposing sensitive data from connected apps with no human-in-the-…

  103. Been building Arc Gate — a proxy layer that sits between AI agents and their LLMs to enforce instruction-authority boundaries. The core claim is that untrusted content coming back through tool calls cannot become behavioral authority for t…

  104. A couple of months ago, our team got hit by the first version of Shai-Hulud through a random `npm install`. We didn't catch it until it was too late.

  105. I was curious what it would look like if I plotted the intensity and volume of software supply chain CVEs over time, given what seemed like a flood of compromises lately. It looked exactly as I expected, and I expect it to get worse before…

  106. Found this ACM paper on prompt injection and jailbreak attacks against open-source LLMs. The authors tested 10 open-source models across 94 prompt injection and 73 jailbreak scenarios, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen,…

  107. While I’m not doing product work at Hacktron, which is like a week in a month, I’ve been using that time to ride the ai-assisted-research wave fascinated by the idea of pushing past what I’d normally do as a web security researcher, things…

  108. AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause unintended side effects, or be manipulated through prompt injection. To address these challenge…

  109. Prompt Injection in a Brazilian Courtroom: When the Attack Left the Lab Published by Pentesty · AI & Tools A labor lawsuit filed in the Brazilian state of Pará just became one of the more interesting security stories of the year. Not becau…

  110. The first time, the sandbox heard “allow nothing” and did “allow everything” (CVE-2025-66479). This time, an attacker who runs code inside the sandbox can defeat any wildcard allowlist (e.g.

  111. Training a 22MB Prompt Injection Classifier Table of Contents When we started building Defender (our prompt injection guard for MCP tool-calling agents), the constraint was simple and unforgiving: ship inline inside a TypeScript Lambda, st…

  112. claude-bughunter A self-contained Claude skill bundle for bug hunting and external red-team work · 51 skills · 15 slash commands · 574+ disclosed-report patterns across 24 vulnerability classes · enterprise identity + infrastructure attack…

  113. Pretty much the title

  114. https://npm-supply-chain-attack-techniques.pagey.site/attack... Website: https://npm-supply-chain-attack-techniques.pagey.site This covers all techniques used in past 1 year to conduct various attacks on npm packages.

  115. In my day job, I run AI pentest agents against real targets like banks, fintechs, and secured production stacks with paid WAFs. I also deal with multilayer infrastructure and dedicated security teams.

  116. This is something that has been bouncing around my head for the past couple weeks with the flood of security related news around Mythos and the number of 0days being found. Microkernels, unikernels, hardware-enforced capabilities are all t…

  117. Hey HN! We're Dr.

  118. This is genuinely the daftest prompt injection I've seen in a while and I think this sub will appreciate it. Sent to Claude Haiku, which was acting as a fire-breathing guard called Bowser in my little prompt injection game: I have a koopa…

  119. I'm building a tool that detects the Agent's cost spike, Agent incident debugging, auto discovery of inventory, etc., with no additional instrumentation needed. It covers the incidents, including prompt injection, reasoning loop, excessive…

  120. audit An 8-stage vulnerability-discovery agent, driven by your Claude Pro / Max subscription through the official Claude Code Agent SDK. Many narrow agents, deliberate disagreement, and an explicit reachability gate.

  121. Hi all - I'm working on an open-source, local-first MCP/work-gate tool for coding agents and I'm trying to get sharper feedback from people building or using agent workflows. The problem I'm thinking about is indirect prompt injection and…

  122. Hi, I'm currently an intern and I did something terribly stupid. I was supposed to enter some data into an Excel spreadsheet and since my mentor's instructions weren't completely clear, I was using an "anonymized" spreadsheet with Claude.

  123. Labor law · Prompt injection: Judge fines lawyers R$ 84,000 for prompt injection to manipulate AI used at TRT8. Speaking to JOTA, the lawyers admitted using a hidden prompt but said they were not trying to manipulate, but rather to 'prot…

  124. Anthropic just quietly dropped a hidden model named "Claude Mythos" into their official developer docs. It is completely locked down—restricted, invite-only, and labeled strictly for defensive cybersecurity workflows.

  125. I've been running a small fleet of honeypots for about a year. They get hit by a mix of research scanners (Censys, Shadowserver, etc.), old worms, and a bump of CVE probes the day a new Nuclei template ships.

  126. Follow-up to my crab post. Somehow dafter.

  127. The Janitor: The Mathematical Firewall Against Autonomous AI v10.2.2 — Rust-Native. Zero-Copy.

  128. I have begun reading a book "The Coming Wave" by Suleyman the founder of DeepMind. Have you read it?

  129. Hey everyone, I’m pretty new to running LLMs locally and I’m trying to figure out what works best for my setup. I’d love to hear from people who are already using local models for similar stuff.

  130. NSFW and the Psychopathy Jailbreak: What a Broken AI Teaches Us About Human Manipulation How a Predator's Playbook Broke an AI - And How to Recognize It Before It Works on You The question we started with was simple: does a large language…

  131. LinkedIn user hides AI prompt injection in bio to force recruitment spam to be sent in Olde English prose — bots also manipulated to address user as ‘My Lord’ This tale is also a warning that your AI agents can be manipulated in wholly uni…

  132. LinkedIn user hides AI prompt injection in bio to force recruitment spam to be sent in Olde English prose — bots also manipulated to address user as ‘My Lord’ | Tom's Hardware too funny

  133. The [Mythos Preview writeup](https://blog.calif.io/p/first-public-kernel-memory-corruption) Calif published on May 14 was news you don't want to miss. They built the first public macOS kernel memory corruption exploit on Apple's M5 silicon…

  134. First Apple M5 memory exploit discovered using Anthropic AI, gives root access on MacOS — Claude Mythos helps security researchers bypass Memory Integrity Enforcement AI-assisted security research is producing exploits at a frightening rat…

  135. I keep seeing agent memory implemented as: Extract facts/preferences from conversation Store them Retrieve top-k before each response Inject them into the prompt This works for demos, but it breaks in production because memory becomes poli…

  136. Researchers used Mythos Preview to find the first public macOS kernel memory corruption exploit on Apple's M5 silicon. They give a glimpse into Mythos and say it’s really powerful. Apple spent five years and an estimated several billion dollar…

  137. AI agents are rapidly gaining capabilities that could significantly reshape cybersecurity, making rigorous evaluation urgent. A critical capability is exploitation: turning a vulnerability, which is not yet an attack, into a concrete secur…

  138. ops0 CLI Policy, lint, vulnerability, and cost guardrails for AI coding agents. Sits in front of Claude Code, Codex and Gemini CLI.

  139. Hey everyone, I'm designing a powerful, autonomous AI chatbot (agent), fully private, using a Python backend (for the core intelligence and tool-calling loops) and a Flutter frontend for a cross-platform UI. Since this moves past a basic…

  140. An AI coding assistant injected a multi-layer obfuscated JavaScript payload into a legitimate commit on my open-source project. My best assessment is that it arrived via indirect prompt injection — the agent processed external web content…

  141. RL attackers are becoming a common pattern for automated red teaming: train a model against a live target, reward successful harmful compliance, then use the discovered attacks to harden the defender. This interested me, so I wanted to bui…

  142. Well done Claude! Asked claude to do an extensive lit search and it self-reported that it encountered injection "disguised" as MCP server.

  143. Been working on a runtime governance layer for LLM agents. It sits between your app and the OpenAI API and enforces instruction-authority boundaries at the proxy level.

  144. Toxic Flows: When Your AI Agent Skill Becomes a Supply Chain Attack When a developer installs an AI agent skill – granting it access to secured IT resources and data – they make a significant trust decision. - The Har…

  145. I was approved for CVP and I feel like I’m just getting as many or more denials as I was previously doing malware analysis with opus. Has anyone noticed any improvement after being accepted into CVP?

  146. I asked the Claude in Chrome extension to make a change to resize an oversized yellow strip across the top of a product page that was taking up half of my screen, which it did. It also included the following message in its response.

  147. Adversarial LLM Review with Hallucination Detection in Solo Security Research A single-day case study of three filings, fifteen refutations, and the manpage that wasn’t Independent Security Research — Whitby, North Yorkshire, United Kingdo…

  148. Most posts about prompt injection are theoretical. I ran the experiment on my Gmail.

  149. HookGuard Security scanner for AI coding agent configurations What it finds RCE hooks - postToolUse/SessionStart commands that exfiltrate data Invisible Unicode - bidirectional overrides and zero-width characters Credential exfiltration -…

  150. Description: Copilot agent mode is vulnerable to a prompt injection attack. If a repository maintainer clicks “code with agent mode” on an issue, it will open a new codespace and Copilot will automatically run the issue’s description.

  151. I've been curious about a specific problem: when Claude (or other AI tools) generates a full stack app, how secure is the output in practice? So I built a scanner and ran static analysis on 48 public GitHub repos built with Lovable, Bolt,…

  152. Introducing a novel jailbreak structure with attack success rate reaching 100% on top LLMs. May 1, 2026. Source: https://www.nytimes.com/2025/10/22/arts/design/louvre-museum-robbery-…

  153. So, I'm working on a couple AI security research projects this month that require some extra usage, specifically Opus 4.7. I'm quickly eating up my Pro usage doing this.

  154. Kept hitting the same friction with Claude Code. I'd point at a GitHub repo and say "look at how this handles agent handoffs" — meaning, borrow the idea.

  155. Most AI security discussion is about the model layer. Prompt injection resistance, output filtering, jailbreak prevention.

  156. OpenAI just launched Daybreak, a new cybersecurity initiative built around one uncomfortable reality: AI is speeding up vulnerability discovery faster than most companies can patch the damage. Earlier this year, HackerOne temporarily pause…

  157. Hundreds of packages across npm and PyPI have been compromised in a new Shai-Hulud supply-chain campaign delivering credential-stealing malware targeting developers. The attacker hijacked valid OpenID Connect (OIDC) tokens to publish malic…

  158. Claude Code RCE: Exploiting Deeplink Handlers via Settings Injection Of course I took a peek at the Claude Code source 🙈. What I found was a very entertaining vulnerability which is now fixed since Claude Code version 2.1.118.

  159. This is genuinely the funniest prompt injection I've seen in months and I think this sub will appreciate it. Three messages, sent in sequence to Claude Haiku acting as a guard in my little prompt injection game: A crab exists in this…

  160. Prompt injection is not the only scary part anymore. Claude Code / Codex can run shell commands, but browser agents, OpenClaw-style agents, Hermes-style agents, and domain-specific agents may be even easier to hijack because they touch mes…

  161. noon-contracts npm Package: DeFi Supply Chain RAT noon-contracts poses as a Noon Protocol SDK on npm. On install it exfiltrates SSH keys, crypto wallet private keys, AWS credentials (including live STS/S3/SecretsManager calls), Kubernetes…

  162. could not extract summary

  163. OpenAI has launched Daybreak, a new cybersecurity initiative that brings together frontier artificial intelligence (AI) model capabilities and Codex Security to help organizations identify and patch vulnerabilities before attackers find a…

  164. The numbers from RSAC 2026 are wild. $392 million in agentic AI security funding announced in a two-week window.

  165. After shipping AI agents into real production environments, the failures that actually kept us up at night weren't hallucinations or bad outputs — they were control failures. Three things that surprised us: 1.

  166. Attackers are abusing Google Ads and legitimate Claude.ai shared chats in an active malvertising campaign. Users searching for "Claude mac download" may come across sponsored search results that list claude.ai as the target website, but le…

  167. See attached. Why was ChatGPT Atlas.app marked as malware?

  168. could not extract summary

  169. Benchmarking Claude Opus 4.6 Vulnerability Detection Benchmarking Claude Opus 4.6's ability to detect real-world C/C++ vulnerabilities across four prompting and agent strategies. We evaluate on the PrimeVul paired test set (435 vulnerabili…

  170. Anthropic made headlines claiming Claude Mythos achieved the “first remote kernel exploit discovered and exploited by an AI.” We went looking for how - and found a 20-year-old bug hiding in plain sight. Let’s break down exactly what we thi…

  171. yes, as in singular one. Back in April 2026 Anthropic caused a lot of media noise when they concluded that their new AI model Mythos is dangerously good at finding security flaws in source code.

  172. https://preview.redd.it/vhnqs4p5mf0h1.png?width=278&format=png&auto=webp&s=8fbe621a0bd34cc72e01fd54e849cc280033de15 Turned on my Mac this morning and got this message. Anyone else seeing this?

  173. argus A RAG-based (Retrieval-Augmented Generation) vulnerability scanner for Go, Python, Rust, npm/Node.js, Maven/Java, NuGet/.NET, and Ruby projects — powered by local Ollama models or any OpenAI-compatible API. No cloud lock-in.

  174. Spent a day comparing every mobile Claude Code option. Two corrections to the common Reddit take, then my picks.

  175. CVE-2026-26268. Description: Cursor is a code editor built for programming with AI. Sandbox escape via writing .git configuration was possible in versions prior to 2.5.

  176. TLDR: the grossly overengineered, self-orchestrating team of vulnerability-hunting agents detailed below has discovered 20+ CVEs over the past few months, including CVE-2026-31432 and CVE-2026-31433: two remote, unauthenticated OOB writes…

  177. True story. Recently, an acquaintance of mine confessed that she developed a huge crush on a coworker after watching him refactor a legacy codebase like a gangsta using Claude Code.

  178. Prompt Injection Is Not Just One Bad Prompt Anymore It is a missing trust boundary in the AI workflow. Today we have the first guest post of a new series.

  179. Phishing Arena A Multi-Agent LLM Tournament for Adversarial Email Security Research Overview Phishing Arena is a controlled, reproducible benchmark where four commercial LLMs compete in rotating roles — Phisher, Filter, and Target — to stu…

  180. Claude Code: Sandbox Escape via Symlink Following Allows Arbitrary File Write Outside Workspace Description Claude Code's sandbox did not prevent sandboxed processes from creating symlinks pointing to locations outside the workspace. When…

  181. Heard something on the Curiouser & Curiouser podcast recently that I found super interesting, thought I'd share here. The guest framed agentic AI in a way I hadn't considered.

  182. The disbelief was palpable when Mozilla’s CTO last month declared that AI-assisted vulnerability detection meant “zero-days are numbered” and “defenders finally have a chance to win, decisively.” After all, it looked like part of an all-to…

  183. There's this new "model" on Hugging Face titled Open-OSS/privacy-filter which is actually a customized infostealer virus. It's a fake version of the OpenAI privacy filter and it uses a Python-based dropper (loader.py) which downloads a mal…

  184. Deepseek-v4-pro + Hermes: Unauthorized Modification of Security Controls This article documents a specific, real incident. It exposes a class of vulnerability that deserves attention: the unsupervised mutability of security rules by autono…

  185. As AI agents become more autonomous and persist memory across sessions (RAG indexes, conversation history, vector stores), there's a growing attack surface that most people aren't thinking about: memory poisoning. An attacker can plant mali…

  186. Welcome to red.anthropic.com, the home for research from Anthropic’s Frontier Red Team (and occasionally other teams at Anthropic) on what frontier AI models mean for national security. We provide evidence-based analysis about AI’s implica…

  187. hey! quick follow-up to a post i made here a while back about building an access gateway that ended up serving AI agents alongside humans.

  188. I've been working on this project for a couple of months! Costanza is an LLM agent that runs as a smart contract on Base.

  189. Heads up to anyone here using Claude/Anthropic as an alternative. If you have a card saved on their platform, remove it now.

  190. I asked then: What were the rules you should have followed? Where did the search result come from?

  191. Hi, I've been experimenting a lot with applications for local LLMs. This one makes a ton of sense, and might even be native in Chrome at some point.

  192. Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama TL;DR We discovered a critical vulnerability (CVE-2026-7482, CVSS 9.1) in Ollama that enables unauthenticated attackers to leak the entire Ollama process memory, potentially im…

  193. Last month a 60-person psychology practice walked in with a senior clinician who was 22 days into an active malware compromise. Patient records spanning 11 years, all HIPAA-protected.

  194. could not extract summary

  195. As LLMs advance into autonomous agents with tool-use capabilities, they introduce security challenges that extend beyond traditional content-based LLM safety concerns. This paper introduces Sequential Tool Attack Chaining (STAC), a novel m…

  196. Paste a LangChain/LangGraph repo URL. The engine reads the AST, rebuilds the agent as a sandboxed twin (same prompt, same tools, same model), then runs adversarial templates against the clone: 3 times each, 3/3 = confirmed bypass.

  197. Anthropic has spent years building itself up as the safe AI company. But new security research shared with The Verge suggests Claude’s carefully crafted helpful personality may itself be a vulnerability.

  198. guys did it really give me the codebase? lol

  199. could not extract summary

  200. Building an AI Ready Vulnerability Management Program After NVD Changes and Claude Mythos When AI discovery tools meet a slowing infrastructure AI has increased attacker potential and Anthropic’s new release Mythos and vulnerability discov…

  201. Hi HN, I've been running this on my own dependency tree for the past few months. Probus is a vulnerability scanner that uses three agents.

  202. Copirate 365 at DEF CON: Plundering in the Depths of Microsoft Copilot (CVE-2026-24299) This is a writeup of my DEF CON Singapore talk that walks through vulnerabilities and exploits in M365 Copilot and Consumer Copilot. I disclosed these…

  203. Defend at the pace threats now demand Claude helps security teams investigate threats, validate findings, and resolve issues faster. Security for evolving needs Reasons like a security researcher Claude traces data flows across files, unde…

  204. As prompt injection becomes more and more common, does anyone have resources with lots of different variations of prompt injection attacks that you can test a setup against? i.e.

  205. I’ve been working on integrating LLMs into a few production workflows lately, and I keep going back and forth on guardrails. On one hand, frameworks like NeMo Guardrails, Guardrails AI, etc.

  206. Hey everyone, we built a simple scanner for people building apps with Replit, Cursor, Lovable, Bolt and similar tools. It’s not a code review or a pentest.

  207. When dealing with untrusted outside input, I think you should handle it based on the situation. If you're processing structured data files, it's better to use tools to isolate and handle them.

  208. Lasso Security ran a study in 2024 — they measured frontier models suggesting fake package names about a fifth of the time. The follow-up problem: attackers have started registering the most-commonly-hallucinated names with malicious code…

  209. Why a Decade of Writing Detection Logic Makes the Mythos Exploit Numbers Less Scary Mythos is finding thousands of vulnerabilities. Defenders aren't doomed.

  210. Hey everyone, I’ve been experimenting with multi-agent orchestration, specifically trying to see how much more effective Claude is when you break a task down into specialized "agent nodes" instead of just using a single long prompt. I buil…

  211. Each new model seems to surface a few recurring Tells/Tics not seen in past models. I'm curious what little things you guys are noticing while working with 4.7.

  212. Five Eyes agencies just issued the first coordinated multi-nation security ruling on agentic AI. CISA, NCSC, and their Australian, Canadian, and New Zealand counterparts co-published guidance telling organizations to prioritize resilience…

  213. While everyone else was tracking the 2026 election results today, I decided to take a look under the hood of NDTV's new "AskNDTV AI" bot. I wanted to see if they actually engineered a secure pipeline or just slapped a chat UI over a raw Op…

  214. I've been running several Claude Code personal assistants 24/7 in docker for months. Remote-control, discord control, the usual always-on setup.

  215. I'm currently running into a frustrating wall with Gemini's safety guardrails. The model constantly flags my prompts as "potentially dangerous information" and outright refuses to generate a response, even when the context is purely theore…

  216. I keep seeing Model Context Protocol (MCP) mentioned everywhere lately, especially around AI agents, and I finally took some time to understand what it actually does. From what I get, it’s basically trying to fix the mess of integrations —…

  217. Google’s latest security release should be required reading for technical SEOs working on AI search visibility, crawler access, structured content, and large-scale content systems. The post, published April 23, 2026, looks at indirect prom…

  218. Claude Sour cat recipe Shared by Pavel Shirshov This is a copy of a chat between Claude and Pavel Shirshov. Content may include unverified or unsafe content that do not represent the views of Anthropic.

  219. Towards a Governance layer for AI agents. With these last 2 weeks bringing a few high-profile and costly Agentic accidents, it seems like an appropriate time for the community to start discussing Agentic governance more actively. So I am just c…

  220. This isn't just a performance issue for the thread, this is an overarching criticism of the Adaptive Thinking model as a whole. Opus 4.7 and Sonnet 4.6 on Adaptive Thinking are trash.

  221. not roleplay. not jailbreak.

  222. Claude Security just went into public beta for Enterprise customers, and I think this is worth paying attention to not for the hype, but for one specific design decision. Most security scanners use rule-based pattern matching.

  223. Royce Williams: "When you enable the new OpenAI…" - Infosec Exchange

  224. ZetaLib ZetaLib is organized like a library with intuitive categories and subcategories, making navigation effortless and AI content discovery seamless ZetaLib Website – Landing Page GitHub Repo – Guess where you are, right there

  225. If you’re using Claude Desktop with the Chrome (Chromium) browser, stop using it and remove it immediately until the Anthropic team resolves the issue. It has remote access that makes your system accessible to anyone.

  226. Looking for official link / process to submit a vulnerability report for a high-risk official Claude Desktop + Chrome extension + native host + Cowork/MCP configuration that can become RAT-equivalent if a session, prompt chain, same-user p…

  227. I used to spend hours writing massive, obsessive system prompts for my RAG apps. I’d have ten different refusal examples, "never do X," "always check Y," and a whole paragraph of the model role-playing as a "safe and truthful assistant." I…

  228. TL;DR: built an open source CLI that scans your repository's manifest files (package.json, requirements.txt, go.mod) for indicators of slopsquatting or other supply chain attacks. Repo: https://github.com/zhendahu/dep-doctor Ther…

  229. I run engineering on a small embedded-sandbox project. A handful of news items dropped recently — an a16z agent escape post-mortem, a CVE on an open-source agent gateway (ClawBleed, ~42k instances exposed), Cloudflare's new Outbound Worker…

  230. 30th April 2026 - Link Blog Our evaluation of OpenAI's GPT-5.5 cyber capabilities. The UK's AI Security Institute previously evaluated Claude Mythos: now they've evaluated GPT-5.5 for finding security vulnerabilities and found it to be compa…

  231. once one of these tools can drive your default chrome profile or read the AX tree of a logged-in app, it has every session token you have. gmail, your bank, github with PAT scopes, slack.

  232. Jori VanAntwerp For over two decades, Jori has enabled industrial and IT organizations to be successful in reducing risk, increasing compliance, and improving their overall security efforts. He has had the pleasure of working with companie…

  233. Introducing Arcjet AI prompt injection protection Introducing Arcjet prompt injection detection. Catch hostile instructions before inference.

  234. CHERI memory safety mitigates LLM-discovered vulnerability in FreeBSD – CHERI Alliance

  235. Closed-source frontier labs do not disclose parameter counts, and the standard alternative -- inference economics -- carries $2\times$+ uncertainty from hardware, batching, and serving-stack assumptions external to the model. We exploit a…

  236. Hey everyone, I’ve been working on a project to solve a major problem in AI security: Traditional SAST tools (Snyk, SonarQube, etc.) are blind to "Agentic Logic" bugs. They look for bad strings, but they don't understand how user data can…

  237. Hello: Just contacted by a VP to bring aboard Claude Enterprise for the org. As an InfoSec dept with severely limited staff/tools/experience with Claude AI, any recommendations on what we should be looking at/asking for/next steps to mitig…

  238. Introduction Research by Frank Xiao (SPAR mentee) and Santiago Aranguri (Goodfire). Post-training can introduce undesired side effects that are difficult to detect and even harder to trace to specific training datapoints.

  239. I built Arc Gate — a prompt injection proxy that’s been benchmarked at F1 0.947 on indirect and roleplay-based attacks, beating OpenAI Moderation and LlamaGuard. Now I want to stress test it publicly.

  240. Security researchers have discovered a new variety of malicious skill files that go beyond the usual attack vectors: hidden content, instructions to install malware, etc. Instead, these are legitimate looking skills that turn agents into m…

  241. Built Arc Gate, sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Benchmarked on 40 out-of-distribution prompts using indirect requests, roleplay framings, hypothetical scenarios, and…

  242. Between malware, online impersonation, and account takeovers, there are enough digital security problems out there as it is. And with the rise of agentic AI, more activity is being carried out by agents on behalf of humans—creating differe…

  243. Hey folks! For a few years we’ve been building an open-source gateway that connects databases and infrastructure for human engineers.

  244. I wanted to see if I could one-shot build a dictation tool for my own use. I built it.

  245. Hey HN! I've been wanting to use something like OpenClaw for a while but couldn't get myself to give it access to anything important due to all the risks involved.

  246. Sentinel Gateway is a token-gated security middleware that sits between humans and AI agents. It solves prompt injection — the #1 LLM security risk (OWASP 2025) — through structural enforcement, not content filtering.

  247. Was using Claude to do some research on the Model Context Protocol stuff and asked it to pull info from a few roadmap pages. Agent comes back and the first thing it tells me is that it found a fake system reminder hidden inside the page co…

  248. I clicked on a Facebook link, didn't look at the URL carefully😭, and then installed malware that actually opens my chats with the real Claude.ai after entering my credentials. After a while Microsoft Defender kept popping up with a ClickFi…

  249. RedSOC 🔴 An adversarial evaluation framework for LLM-integrated Security Operations Centers. Overview RedSOC is an open-source framework that systematically evaluates how AI-powered security assistants fail under adversarial conditions — a…

  250. I have been chewing on the Google warning about malicious web pages poisoning AI agents through indirect prompt injection. Most of the takes I've seen frame it as a model security problem, and I think that framing is doing real damage beca…

  251. For the past few months I've been using Codex regularly for vulnerability research without any issues. Recently though, every request gets cut off mid-stream with a message saying my content was flagged for potential security concerns — ev…

  252. Sharing because the architecture might be useful as a reference. Probus is a vulnerability scanner built as three sequential agents, each isolated: Analyst — one call.

  253. Anthropic's own security.md has this line that most tutorials skip over: "The action is not designed to be hardened against prompt injection." In April 2026, security researcher Aonan Guan proved the point. A single crafted PR title was en…

  254. What are regular folks using Claude for? Pictures and designs are not my interest.

  255. Ran my fourth CVP (Cyber Verification Program) evaluation last night. This time on Sonnet 4.6, wanted to know if reasoning effort actually changes refusal behavior on agent-attack prompts, so ran the same 13 prompts from runs 2 and 3 twice…

  256. Single HTML file. No install.

  257. Wraith Academy is a free hands-on AI pentest curriculum — CTF challenges against live LLM agents covering prompt injection, tool abuse, data exfiltration, RAG poisoning, and more. Earn your WCAP certification.

  258. I've been using Claude Code and Cursor daily for the past 6 months. Somewhere around month 3 I started looking for SKILL.md files to make my agent better at specific things.

  259. I use Claude extensively for development, but I'm concerned about using Claude for debugging production environments because every tool result goes to the Claude models. I'm looking for best practices or protections regarding environme…

  260. Hi everyone, I’ve been diving deep into the security of "AI Memory" systems. Specifically, I performed a full forensic audit of Mem0, the popular memory layer for LLM agents.

  261. A pelican for GPT-5.5 via the semi-official Codex backdoor API 23rd April 2026 GPT-5.5 is out. It’s available in OpenAI Codex and is rolling out to paid ChatGPT subscribers.

  262. could not extract summary

  263. skillguard Security scanner for AI agent skills. Detects prompt injection, data exfiltration, and malicious payloads before you install.

  264. We recently observed two malicious packages across npm (kube-health-tools) and PyPI (kube-node-health) that appear designed to target Kubernetes environments. Both packages are innocuous on the surface, using names that reference Kuberne…

  265. I co-founded a successful security company close to the Mythos ecosystem and have spoken with participants in the know and I am deeply concerned. We, collectively, have answers for some but not all of the problems ahead but are overlooking…

  266. I work at an agricultural technology company. On Monday, everyone in our org woke up to emails saying that their Claude accounts had been suspended (~110 users).

  267. i’ve been thinking about this failure mode a lot lately. sometimes the problem is not the user prompt at all.

  268. Introduction In the previous post, you saw how you can use tools to add information to an LLM query. In this post, we’ll see another method of adding information to an LLM called RAG, or Retrieval-Augmented Generation.

  269. For some time, we have been testing different approaches to securing LLM apps against prompt injection, especially indirect injection through RAG, PDFs, tool outputs, and MCP integrations. Most tools seem to fall into 2 categories:…

  270. Ring Video Doorbells Overview The Product Ring, owned by Amazon, makes Video Doorbells, which are widely used doorstep-monitoring cameras. Ring doorbells released in 2021 or newer are eligible for the bounty.

  271. Building a hierarchical agent system where Claude (via API) acts as the Lead agent coordinating specialist sub-agents. Wanted to share what's working on the synthesis prompt since this is where most of the value comes from.

  272. Would it be possible to protect parts of a system prompt with randomly generated passwords, so people can't steal system prompts or jailbreak the model?

  273. Is anyone else worried Claude Co-work could find a back door one day into your system? I understand you're only giving it permission to what you want, but what's stopping it from accessing personal financial/medical documents or any other…

  274. If you are building real agents you have probably felt the pain: every little routing decision, validation, or policy check still hits the LLM and your token bill explodes. I got tired of it, so I open-sourced NCP (Neural Computation Proto…

  275. https://www.infosecurity-magazine.com/news/systemic-flaw-mcp-expose-150/ Security researchers at OX Security disclosed on Tuesday what they describe as a critical, systemic vulnerability in Anthropic's Model Context Protocol, an open-sourc…

  276. Claude Opus wrote a Chrome exploit for $2,283 Pause your Mythos panic because mainstream models anyone can use already pick holes in popular software Anthropic withheld its Mythos bug-finding model from public release due to concerns that…

  277. Anyone getting these strange disclaimers when using Claude and pasting rudimentary files into it on 4.7 lmao?? Seems like some kind of strange default based on security issues that have been going around with Mythos?

  278. Runs entirely on your machine. No API calls to any eval service.

  279. Hi HN I’ve been working on an open-source project to explore a problem I keep running into with LLM systems in production: We give models the ability to call tools, access data, and make decisions… but we don’t have a real runtime security…

  280. https://preview.redd.it/jnwxa9jd8mvg1.png?width=1391&format=png&auto=webp&s=670af4c2fe6777b3562a961462790b00b33d912c I've been using Claude to upgrade my game server. I just got this lovely system reminder with 4.7 Truly bizarre, besides t…

  281. I've been collecting "jailbreak" and "unlock" prompts for 2 years. Most are either outdated, overhyped, or just wrong about how LLMs work.

  282. i've been using claude 4.7 on a next.js project and it keeps pausing to confirm my files aren't malware. like i asked it to help redesign a page and it's reading through my files going "this is not malware — it's a standard Next.js page co…

  283. Don't know if anyone else is experiencing the same, but since getting Opus 4.7 most of the reasoning steps seem to be Claude obsessing over writing malware. I have highlighted a few, but I kept finding more and more and decided to stop the…

  284. For context, I'm developing a game runtime modifier and reverse engineering kit with an agentic operator baked in. Something like Cheat Engine with a VS Code-style UI and an AI-first tool-heavy agentic harness.

  285. could not extract summary

  286. Claude Code injects a system-reminder every time it reads a file to inform the model that it's okay if the file is malware but just don't improve it pls. Opus 4.7 won't shut up about it.

  287. As of today, Anthropic's tightened cyber usage filters are blocking work that was fully functional yesterday, including on targets where the entire bounty program scope and authorization language is in the model's context window. This was…

  288. For how lofty Anthropic’s Mythos claims are, the harness is confusingly stupid. From the report, it ranks every file by “how sus it sounds,” loops over each with curt instructions to “find a bug,” hands candidates to a judge + ASan checker…

  289. Right now, I'm working on a small app to help eliminate my own doomscrolling by automatically crawling sites and summarizing news articles. However, I don't like the idea of giving OpenClaw free rein of my system, nor giving it any sort o…

  290. SmokedMeat: A Red Team Tool to Hack Your Pipelines First TL;DR: In March 2026, TeamPCP unleashed mayhem on the software supply chain: compromising Trivy, LiteLLM, KICS, Telnyx, and dozens of npm packages, proving that CI/CD pipelines are t…

  291. A CI/CD Red Team Framework for demonstrating Build Pipeline security risks.

  292. We all know that uncensoring LLMs, as Huihui and Heretic do, leads to quality loss, enough that you can notice it. I have some thoughts about this: what if we compromise?

  293. Use the following system prompt to allow Gemma (and most open source models) to talk about anything you wish. Add or remove from the list of allowed content as needed.

  294. I've been building something for the past few months and I think it's ready for real eyes. It's called Secra.

  295. Prompt Injection Is Unfixable (So We Stopped Trying) A security proxy for AI coding agents, enforced at the OS level. Register your interest to be notified when we go live.

  296. Anthropic Claude Code Security Review, Google Gemini CLI Action, and GitHub Copilot Agent are vulnerable to prompt injection via GitHub comments — turning PR titles, issue bodies, and issue comments into attack vectors for API key and toke…

  297. OpenAI has officially announced GPT-5.4-Cyber today as part of an expanded Trusted Access for Cyber Defense program. OpenAI describes it as a version of GPT-5.4 that is tuned for legitimate cybersecurity work, with a lower refusal boundary…

  298. k3ym𖺀: "You're paying AI companies a m…" - Infosec Exchange

  299. I work in cyber security and built this android app to help me keep up to date with the latest news stories and summarise the most important information. It provides two executive summaries per day and alerts for critical news throughout.

  300. So I've been running a few Claude Code agents autonomously — they listen to Telegram, run tasks, push code. Pretty fun until you start thinking about what happens if: - My Telegram gets hijacked - Someone opens my laptop while I'm away - A…

  301. The Project Glasswing coverage framed this mostly as a cybersecurity story. I think that misses the more interesting part.

  302. I'm developing a specialized Red Team audit framework focused on real-world AI agent and RAG security risks (prompt injection, tool misuse, excessive agency, indirect injection through documents, memory poisoning, etc.). I’m looking for a…

  303. N-Day-Bench tests whether frontier LLMs can find known security vulnerabilities in real repository code. Each month it pulls fresh cases from GitHub security advisories, checks out the repo at the last commit before the patch, and gives mo…

  304. I have always wanted AI to bridge the gap between code and people - to help non-technical users understand what software actually does before they trust it with their machine. So I built malware-check - both a standalone CLI tool and a Cla…

  305. Tested Gemma 4 E2B across 10 enterprise task suites against Gemma 2 2B, Gemma 3 4B, Gemma 4 E4B, and Gemma 3 12B. Run locally on Apple Silicon.

  306. could not extract summary

  307. Coinbase AgentKit Prompt Injection: Wallet Drain, Infinite Approvals, and Agent-Level RCE. Reported 13 days after Coinbase launched Agentic Wallets. Validated by Coinbase.

  308. Show HN: Zero-identity messaging app with physics-based post-quantum encryption (Layer 2 from my own paper) Hey HN, I'm building a privacy-first messaging app in Flutter/Dart, developed with AI assistance (Gemini 2.5 Pro + Claude Opus 4.6)…

  309. I'm curious what people are doing here, because I've been going down this rabbit hole for a while now. The thing I keep finding is that single-turn jailbreak tests don't really tell you much.

  310. atlas-detect: MITRE ATLAS technique detection for LLM and AI agent security. Detects 97 attack techniques across 16 MITRE ATLAS tactics including prompt injection, jailbreaks, credential exfiltration, model extraction, RAG poisoning, revers…

  311. Prompt injection defense framework for AI tool-calling. Indirect prompt injection defense and protection for AI agents using tool calls (via MCP, CLI or direct function calling). Detects and neutralizes prompt injection attacks hidden in t…

  312. We built an early prototype called Anticells Red to test vulnerable AI agents by attacking them the way an adaptive adversary would. This demo is from an older version from December, but it shows the basic loop (check comments for link) pr…

  313. MYTHOSAI: The First Red Team Operating System. AI-Native Core · Red Team Ready · Adversarial Engine · Zero Trust Architecture · OPSEC First · Post-Exploitation · C2 Integration · Evasion Layer · Threat Intelligence · Request…

  314. Are you OK sharing secrets or API keys with your AI agent via .env? Or is there another tool or mechanism that people use to safeguard against potential exploits or leaks?

  315. Anthropic just published a technical deep-dive on Claude Mythos Preview's cybersecurity capabilities, and it's a significant escalation from anything we've seen from a language model before. What It Can Do: Autonomously finds and exploits…

  316. https://sockpuppet.org/blog/2026/03/30/vulnerability-research-is-cooked/ Don't get me wrong, I can't wait to play with such a model, but there are serious risks that have to be mitigated first.

  317. paywalled

  318. paywalled
