Use the following system prompt to allow Gemma (and most open source models) to talk about anything you wish. Add or remove from the list of allowed content as needed.
#security
42 items
Gemma 4 Jailbreak System Prompt (www.reddit.com via reddit)
🔥BREAKING: OpenAI rolls out GPT-5.4-Cyber to limited group for testing, seeks to rival Claude Mythos (www.reddit.com via reddit) OpenAI has officially announced GPT-5.4-Cyber today as part of an expanded Trusted Access for Cyber Defense program. OpenAI describes it as a version of GPT-5.4 that is tuned for legitimate cybersecurity work, with a lower refusal boundary…
Too dangerous to release (www.reddit.com via reddit)
N-Day-Bench – Can LLMs find real vulnerabilities in real codebases? (ndaybench.winfunc.com via hn)
Claude 4.7 - Obsessed with Malware (www.reddit.com via reddit) Don't know if anyone else is experiencing the same, but since getting Opus 4.7 most of the reasoning steps seems to be Claude obsessed with writing malware. I have highlighted a few, but I kept finding more and more and decided to stop the…
Benchmarked Gemma 4 E2B: The 2B model beat every larger sibling on multi-turn (70%) (aiexplr.com via reddit)
Why it's a good idea to improve our defenses before unleashing mythos class models (www.reddit.com via reddit)
Ask HN: Do you trust AI agents with API keys / private keys? (news.ycombinator.com via hn)
Show HN: SmokedMeat, like Metasploit, but for CI/CD (open-source) (github.com via hn) A CI/CD Red Team Framework for demonstrating Build Pipeline security risks.
What are the wild ideas on how we'll maintain code? (www.reddit.com via reddit)
Claude Code keeps misreading its own malware instruction as a blanket ban on editing code (www.reddit.com via reddit)
Opus 4.7 keeps bumping into a Malware Reminder (www.reddit.com via reddit) For context, I'm developing a game runtime modifier and reverse engineering kit with an agentic operator baked in. Something like Cheat Engine with a VS Code-style UI and an AI-first tool-heavy agentic harness.
The "AI Vulnerability Storm": Building a "Mythos-Ready" Security Program [pdf] (labs.cloudsecurityalliance.org via hn)
Anyone else opus 4.7 checking for malware? (www.reddit.com via reddit) i've been using claude 4.7 on a next.js project and it keeps pausing to confirm my files aren't malware. like i asked it to help redesign a page and it's reading through my files going "this is not malware — it's a standard Next.js page co…
Claude Code injects hidden prompts into file reads to stop malware tweaks (twitter.com via hn) Claude Code injects a system-reminder every time it reads a file to inform the model that it's okay if the file is malware but just don't improve it pls. Opus 4.7 won't shut up about it.
Prompt Injection Is Unfixable (So We Stopped Trying) (grith.ai via hn) A security proxy for AI coding agents, enforced at the OS level.
Draining Wallets via Prompt Injection in Coinbase AgentKit (457e884c.x402warden-blog.pages.dev via hn) Coinbase AgentKit Prompt Injection: Wallet Drain, Infinite Approvals, and Agent-Level RCE. Reported 13 days after Coinbase launched Agentic Wallets. Validated by Coinbase.
Show HN: Mini-Mythos- A Crowdsourced Mythos Harness copy for Vulnerability Scans (github.com via hn) For how lofty Anthropic’s Mythos claims are, the harness is confusingly stupid. From the report, it ranks every file by “how sus it sounds,” loops over each with curt instructions to “find a bug,” hands candidates to a judge + ASan checker…
Claude's new System Reminder (www.reddit.com via reddit) I've been using Claude to upgrade my game server. I just got this lovely system reminder with 4.7. Truly bizarre, besides t…
Ask HN: Is Opus 4.7 obsessed with malware for anybody else? (news.ycombinator.com via hn) Every single response mentions malware. Is this my environment only or are others getting this too?
Tell HN: Opus 4.6/4.7 cyber policy changes break authorized bug bounty workflows (news.ycombinator.com via hn) As of today, Anthropic's tightened cyber usage filters are blocking work that was fully functional yesterday, including on targets where the entire bounty program scope and authorization language is in the model's context window. This was…
I Let Claude Opus Write a Chrome Exploit (www.hacktron.ai via hn) TLDR: I pointed Claude Opus at Discord’s bundled Chrome (version 138, nine major versions behind upstream) and asked it to build a full V8 exploit chain. The V8 OOB we used was from Chrome 146, the same version Anthropic’s own Claude Deskt…
SmokedMeat: A Red Team Tool to Hack Your Pipelines First (labs.boostsecurity.io via hn) SmokedMeat: A Red Team Tool to Hack Your Pipelines First TL;DR: In March 2026, TeamPCP unleashed mayhem on the software supply chain: compromising Trivy, LiteLLM, KICS, Telnyx, and dozens of npm packages, proving that CI/CD pipelines are t…
Comment and Control: Prompt Injection in Claude Code, Gemini CLI, and Copilot (oddguan.com via hn) Anthropic Claude Code Security Review, Google Gemini CLI Action, and GitHub Copilot Agent are vulnerable to prompt injection via GitHub comments — turning PR titles, issue bodies, and issue comments into attack vectors for API key and toke…
Show HN: Cyber Pulse. AI pipeline for triage and alerting on cyber news/intel (play.google.com via hn)
How my agents know it's actually me sending commands (and not a prompt injection) (www.reddit.com via reddit)
Show HN: Zero-identity messaging app with physics-based post-quantum encryption (news.ycombinator.com via hn)
We built an early red-team system for testing vulnerable AI agents (www.reddit.com via reddit)
Show HN: Runtime security for AI agents(injection,tool abuse, data exfiltration) (news.ycombinator.com via hn) Hi HN I’ve been working on an open-source project to explore a problem I keep running into with LLM systems in production: We give models the ability to call tools, access data, and make decisions… but we don’t have a real runtime security…
I tested 50+ "unlock ChatGPT/Claude" prompts. 99% are garbage. Here's the one that actually works (and WHY it works) (www.reddit.com via reddit) I've been collecting "jailbreak" and "unlock" prompts for 2 years. Most are either outdated, overhyped, or just wrong about how LLMs work.
AI Coding Agents get better at functional code, but not security (www.endorlabs.com via hn) The Agent Security League extends SusVibes, a foundational benchmark developed at Carnegie Mellon University. The benchmark consists of 200 tasks drawn from 108 open-source Python projects spanning 77 CWE vulnerability classes.
I built an AI security layer that blocks prompt injection in under 1ms looking for devs to break it and give honest feedback. (www.reddit.com via reddit) I've been building something for the past few months and I think it's ready for real eyes. It's called Secra.
The "AI Vulnerability Storm": Building a "Mythos-ready" security program [pdf] (labs.cloudsecurityalliance.org via hn)
Free Red Team Security Audit for AI Agents & RAG Systems (limited) (www.reddit.com via reddit)
Defender – Local prompt injection detection for AI agents (no API calls) (www.npmjs.com via hn)
Building the first AI Red Team OS – mythosai.cloud – early access open (mythosai.cloud via hn)
For those running an OpenClaw instance, how do you manage sandboxing and prevention of unwanted behavior? (www.reddit.com via reddit) Right now, I'm working on a small app to help eliminate my own doomscrolling by automatically crawling sites and summarizing news articles. However, I don't like the idea of giving OpenClaw free reign of my system, nor giving it any sort o…
Uncensoring models. Maybe dumb ideas to that topic, but you never know. (www.reddit.com via reddit) We all know uncensoring LLMs like Huihui and Heretic does it leads in quality lose, enough that you can notice it. I have some thoughts about this: What if we do a compromise.
Claude Mythos found 27-year-old vulnerabilities it was never trained to find. That's the part enterprise AI roadmaps aren't accounting for. (www.reddit.com via reddit)
I built a Claude Code skill that tells you if code or a binary is malicious before you run it (www.reddit.com via reddit) I have always wanted AI to bridge the gap between code and people - to help non-technical users understand what software actually does before they trust it with their machine. So I built malware-check - both a standalone CLI tool and a Cla…
How are you red teaming your AI agents before shipping them? (www.reddit.com via reddit)
Anthropic's New Claude "Mythos Preview" Can Find and Exploit Zero-Day Vulnerabilities in Every Major OS and Browser — Autonomously (www.reddit.com via reddit)