event

Security

76 items · started 2023-04-11 · ongoing (last activity 2026-04-22)

  1. skillguard Security scanner for AI agent skills. Detects prompt injection, data exfiltration, and malicious payloads before you install.

  2. Over time we have been testing different approaches to securing LLM apps against prompt injection, especially indirect injection through RAG, PDFs, tool outputs, and MCP integrations. Most tools seem to fall into 2 categories:…

  3. Ring Video Doorbells Overview The Product Ring, owned by Amazon, makes Video Doorbells, which are widely used doorstep-monitoring cameras. Ring doorbells released in 2021 or newer are eligible for the bounty.

  4. Building a hierarchical agent system where Claude (via API) acts as the Lead agent coordinating specialist sub-agents. Wanted to share what's working on the synthesis prompt since this is where most of the value comes from.

  5. Would it be possible to protect parts of a system prompt with randomly generated passwords, so people can't steal system prompts or jailbreak the model?

  6. Is anyone else worried Claude Co-work could find a back door one day into your system? I understand you're only giving it permission to what you want, but what's stopping it from accessing personal financial/medical documents or any other…

  7. TL;DR Anthropic presents Mythos and Project Glasswing as evidence that advanced AI vulnerability research should be restricted. But our replication suggests a different conclusion: the capabilities Anthropic points to are already available…

  8. If you are building real agents you have probably felt the pain: every little routing decision, validation, or policy check still hits the LLM and your token bill explodes. I got tired of it, so I open-sourced NCP (Neural Computation Proto…

  9. https://www.infosecurity-magazine.com/news/systemic-flaw-mcp-expose-150/ Security researchers at OX Security disclosed on Tuesday what they describe as a critical, systemic vulnerability in Anthropic's Model Context Protocol, an open-sourc…

  10. Claude Opus wrote a Chrome exploit for $2,283 Pause your Mythos panic because mainstream models anyone can use already pick holes in popular software Anthropic withheld its Mythos bug-finding model from public release due to concerns that…

  11. Anyone getting these strange disclaimers when using Claude and pasting rudimentary files into it on 4.7 lmao?? Seems like some kind of strange default based on security issues that have been going around with Mythos?

  12. Runs entirely on your machine. No API calls to any eval service.

  13. Hi HN I’ve been working on an open-source project to explore a problem I keep running into with LLM systems in production: We give models the ability to call tools, access data, and make decisions… but we don’t have a real runtime security…

  14. https://preview.redd.it/jnwxa9jd8mvg1.png?width=1391&format=png&auto=webp&s=670af4c2fe6777b3562a961462790b00b33d912c I've been using Claude to upgrade my game server. I just got this lovely system reminder with 4.7. Truly bizarre, besides t…

  15. I've been collecting "jailbreak" and "unlock" prompts for 2 years. Most are either outdated, overhyped, or just wrong about how LLMs work.

  16. i've been using claude 4.7 on a next.js project and it keeps pausing to confirm my files aren't malware. like i asked it to help redesign a page and it's reading through my files going "this is not malware — it's a standard Next.js page co…

  17. Don't know if anyone else is experiencing the same, but since getting Opus 4.7 most of the reasoning steps seem to be Claude obsessed with writing malware. I have highlighted a few, but I kept finding more and more and decided to stop the…

  18. For context, I'm developing a game runtime modifier and reverse engineering kit with an agentic operator baked in. Something like Cheat Engine with a VS Code-style UI and an AI-first tool-heavy agentic harness.

  19. could not extract summary

  20. Claude Code injects a system-reminder every time it reads a file to inform the model that it's okay if the file is malware but just don't improve it pls. Opus 4.7 won't shut up about it.

  21. As of today, Anthropic's tightened cyber usage filters are blocking work that was fully functional yesterday, including on targets where the entire bounty program scope and authorization language is in the model's context window. This was…

  22. For how lofty Anthropic’s Mythos claims are, the harness is confusingly stupid. From the report, it ranks every file by “how sus it sounds,” loops over each with curt instructions to “find a bug,” hands candidates to a judge + ASan checker…

  23. Right now, I'm working on a small app to help eliminate my own doomscrolling by automatically crawling sites and summarizing news articles. However, I don't like the idea of giving OpenClaw free rein of my system, nor giving it any sort o…

  24. SmokedMeat: A Red Team Tool to Hack Your Pipelines First TL;DR: In March 2026, TeamPCP unleashed mayhem on the software supply chain: compromising Trivy, LiteLLM, KICS, Telnyx, and dozens of npm packages, proving that CI/CD pipelines are t…

  25. A CI/CD Red Team Framework for demonstrating Build Pipeline security risks.

  26. We all know that uncensoring LLMs, as Huihui and Heretic do, leads to a quality loss large enough that you can notice it. I have some thoughts about this: what if we make a compromise?

  27. Use the following system prompt to allow Gemma (and most open source models) to talk about anything you wish. Add or remove from the list of allowed content as needed.

  28. I've been building something for the past few months and I think it's ready for real eyes. It's called Secra.

  29. Prompt Injection Is Unfixable (So We Stopped Trying) A security proxy for AI coding agents, enforced at the OS level. Register your interest to be notified when we go live.

  30. Anthropic Claude Code Security Review, Google Gemini CLI Action, and GitHub Copilot Agent are vulnerable to prompt injection via GitHub comments — turning PR titles, issue bodies, and issue comments into attack vectors for API key and toke…

  31. OpenAI has officially announced GPT-5.4-Cyber today as part of an expanded Trusted Access for Cyber Defense program. OpenAI describes it as a version of GPT-5.4 that is tuned for legitimate cybersecurity work, with a lower refusal boundary…

  32. k3ym𖺀: "You're paying AI companies a m…" - Infosec Exchange

  33. I work in cyber security and built this android app to help me keep up to date with the latest news stories and summarise the most important information. It provides two executive summaries per day and alerts for critical news throughout.

  34. So I've been running a few Claude Code agents autonomously — they listen to Telegram, run tasks, push code. Pretty fun until you start thinking about what happens if: - My Telegram gets hijacked - Someone opens my laptop while I'm away - A…

  35. The Project Glasswing coverage framed this mostly as a cybersecurity story. I think that misses the more interesting part.

  36. I'm developing a specialized Red Team audit framework focused on real-world AI agent and RAG security risks (prompt injection, tool misuse, excessive agency, indirect injection through documents, memory poisoning, etc.). I’m looking for a…

  37. N-Day-Bench tests whether frontier LLMs can find known security vulnerabilities in real repository code. Each month it pulls fresh cases from GitHub security advisories, checks out the repo at the last commit before the patch, and gives mo…

  38. I have always wanted AI to bridge the gap between code and people - to help non-technical users understand what software actually does before they trust it with their machine. So I built malware-check - both a standalone CLI tool and a Cla…

  39. Tested Gemma 4 E2B across 10 enterprise task suites against Gemma 2 2B, Gemma 3 4B, Gemma 4 E4B, and Gemma 3 12B. Run locally on Apple Silicon.

  40. could not extract summary

  41. Coinbase AgentKit Prompt Injection: Wallet Drain, Infinite Approvals, and Agent-Level RCE. Reported 13 days after Coinbase launched Agentic Wallets. Validated by Coinbase.

  42. Show HN: Zero-identity messaging app with physics-based post-quantum encryption (Layer 2 from my own paper) Hey HN, I'm building a privacy-first messaging app in Flutter/Dart, developed with AI assistance (Gemini 2.5 Pro + Claude Opus 4.6)…

  43. im curious what people are doing here because I've been going down this rabbit hole for a while now. The thing I keep finding is that single-turn jailbreak tests don't really tell you much.

  44. atlas-detect MITRE ATLAS technique detection for LLM and AI agent security. Detects 97 attack techniques across 16 MITRE ATLAS tactics including prompt injection, jailbreaks, credential exfiltration, model extraction, RAG poisoning, revers…

  45. Prompt injection defense framework for AI tool-calling Indirect prompt injection defense and protection for AI agents using tool calls (via MCP, CLI or direct function calling). Detects and neutralizes prompt injection attacks hidden in t…

  46. We built an early prototype called Anticells Red to test vulnerable AI agents by attacking them the way an adaptive adversary would. This demo is from an older version from December, but it shows the basic loop (check comments for link) pr…

  47. SYSTEM INITIALIZING... STAND BY MYTHOSAI THE FIRST RED TEAM OPERATING SYSTEM "" AI-Native Core Red Team Ready Adversarial Engine Zero Trust Architecture OPSEC First Post-Exploitation C2 Integration Evasion Layer Threat Intelligence Request…

  48. Are you OK sharing secrets or API keys with your AI agent via .env? Or is there any other tool or mechanism that one can use to safeguard against potential exploits or leaks?

  49. Anthropic just published a technical deep-dive on Claude Mythos Preview's cybersecurity capabilities, and it's a significant escalation from anything we've seen from a language model before. What It Can Do: Autonomously finds and exploits…

  50. Over the past several days, there has been a lot of internet discourse around Claude Mythos being held back from public release. Many people have been claiming this is somehow yet another devious marketing tactic meant to somehow weigh dow…

  51. https://sockpuppet.org/blog/2026/03/30/vulnerability-research-is-cooked/ Don't get me wrong I can't wait to play with such a model, but there are serious risks that have to be mitigated first.

  52. paywalled

  53. paywalled
