I built an agent that breaks your AI agents before someone else does (fabraix.com via hn)
Find gaps in your AI systems before users (or attackers) do.
RLix: A scheduling layer for concurrent LLM RL (github.com via hn)
Run more RL experiments. Wait less for GPUs.
Been doing a proper testing round on different image-to-video models to figure out which one produces the most convincing results for avatar-style UGC ads. Outputs vary more than I expected, not just in quality but in how avatars move, how…
Opencode-power-pack – Claude Code skills ported to OpenCode (www.reddit.com)
I switched from Claude Code to OpenCode a few weeks ago and realized most of Anthropic's official Claude Code plugins don't transfer directly. The reason is that those plugins put their value in `commands/` and `agents/`, both of which are…
- OpenCode-power-pack – Claude Code skills ported to OpenCode (github.com via hn)
Every session with Claude Code starts cold. The agent is capable but it only knows what's in context right now — decisions from last week are gone, the tracking file says one thing, git says another, and the memory note explaining the pivo…
FrontierSWE – Benchmark for long horizon coding tasks (github.com via hn)
FrontierSWE is an effort to test coding agents on the hardest ultra-long horizon technical challenges. Together with partners from academia and industry, we have collected real-world problems from domains including performance…
- FrontierSWE: An ultra-long horizon coding benchmark (www.frontierswe.com via hn)
86 items
event
Security
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 22m Codex started flagging all my requests out of nowhere — anyone else hit this recently?
- 6h Open-sourced a 3-agent pipeline that finds real vulnerabilities in codebases
- 16h Hardening claude-code-action after the April 2026 Comment and Control CVE - actual YAML changes
- 16h Claude in excel is the best thing AI has brought to my life
- 19h Does effort tier change refusal behavior on agent-attack prompts? CVP run 4 with sonnet 4.6 high and max efforts.
102 items
event
Copilot
Microsoft is keeping its Copilot tool for Windows 11 but renaming it, while issues with rate limits and a security proxy have sparked concerns among users of GitHub Copilot. Meanwhile, Anthropic released a report on agentic coding trends, highlighting that developers use AI in about 60% of their work.
- 22m Show HN: I replaced a memory app with two Markdown files and a Git repo
- 29m Ask HN: How do you use AI at your regulated, restrictive company?
- 2h MCP Server and CLI for Accessing Work IQ
- 3h Use LangChain with Codex (ChatGPT) Plus/Pro
- 6h Show HN: UseMoney AI: AI Copilot for Retail Investors of India
You are now my Prompt Engineering Mentor. Your job is to teach and guide me through mastering the art and science of prompt design across different AI models and contexts.
- You are an expert "Claude" (www.reddit.com)
- Claude OAuth (developer.puter.com via hn)
- My claude family (www.reddit.com)
- Hail, Claude (www.reddit.com)
- Claude Refugee (www.reddit.com)
- Claude Design (www.anthropic.com via hn)
- Advantages of Claude.Ai (www.reddit.com)
- Claude Brain (github.com via hn)
- What matters most to you about claude.md? (www.reddit.com)
- Claude Sucks. Claude Sucks, Claude Sucks. (www.reddit.com)
- Claude Opus 4.7 (www.anthropic.com via hn)
- All Claude subs (www.reddit.com)
- Claude is not very smart (www.reddit.com)
- Claude talking to CC (www.reddit.com)
- What do you do with Claude? (www.reddit.com)
- Claude Design (www.reddit.com)
- Claude vs Kimi (www.reddit.com)
- Claude.ai down (status.claude.com via hn)
- Claude Opus 4.7 (www.reddit.com)
- Claude Design (claude.ai via hn)
- What's new in Claude Opus 4.7 (platform.claude.com via hn)
- Claude Is Down (news.ycombinator.com)
- Claude Team (www.reddit.com)
- Claude Design (www.reddit.com)
- Would you hire Claude? (www.reddit.com)
- I Did My Taxes with Claude (doempke.com via hn)
- Claude SandBox (www.reddit.com)
- Claude + Neovim (www.reddit.com)
- Goodnight to Claude (www.reddit.com)
- Claude Mii (www.reddit.com)
- This is me on Claude Cowork 😭 (youtu.be via reddit)
- Claude and ToDoist (www.reddit.com)
- Claude Cowork (www.reddit.com)
- Claude usage (www.reddit.com)
- Claude Vs Codex (claudevscodex.com via reddit)
- Claude Sucks (news.ycombinator.com)
- Claude for Sales (www.reddit.com)
- Sassy Claude! ; ) (www.reddit.com)
I previously built an app on top of ChatGPT's Assistant API to answer questions about life events. My wife and I are constantly asking each other "was that before or after we moved into this house?" Life events are things like going on vac…
How to build expertise while using Claude Code (github.com via hn)
Learning Opportunities: A Claude Code Skill for Deliberate Skill Development. Build your expertise, not just your projects. This skill uses an adaptive "dynamic textbook" approach to help you integrate science-based expertise building exerc…
Using local BERT to compress LLM context by 90% (Built in Rust) (www.reddit.com)
Context window "brute-forcing" is expensive and slow. I built a tool called PandaFilter to solve this at the source.
What Claude Design does really well (and not so well) (www.reddit.com)
I did a deep dive on Claude Design and below are my thoughts. What it does extremely well: Improves your prompt - similar to "ask me questions" when chatting to an LLM.
Claude desktop and web not syncing. (www.reddit.com)
Am I missing something? I am logged in to both web and desktop using the same credentials.
25 items
model roundup
Qwen 2.5
Qwen2.5-7B-Instruct is a 7-billion-parameter instruction-tuned language model that significantly enhances coding and mathematical capabilities, supports up to 128K tokens in context, and understands structured data. Community discussions highlight its suitability for code autocomplete tasks and debate the hardware requirements needed for deployment compared to other models like Gemma 26B MoE.
- 30m Speculative Decoding Implementations: EAGLE-3, Medusa-1, PARD, Draft Models, N-gram and Suffix Decoding from scratch
- 14h Using logit steering / KV Cache Dynamic Assembly to guide outputs from Small Language Models using ONNX Runtime
- 2d Show HN: Doxa – Open-source emergent simulator for geopolitical scenarios
- 2d Best coding/reasoning model for low vram
- 3d Best model that can run on raspberry pi 5 with 8GB of RAM
200 items
model roundup
Opus 4.7
Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.
- 1h what are your strats for being efficient with opus 4.7 max?
- 4h Tell HN: Claude Code is unable to respond to this request
- 4h Claude Opus 4.6 vs. Opus 4.7 Effort Levels and Prompt Steering Benchmarks
- 6h Ask HN: Has Claude Opus 4.7 nerfed?
- 10h I built an AI-native freelance platform with Claude, blockchain escrow, real-time chat, and progressive trust
An MCP server for LinkedIn Ads (because the API is a nightmare) (github.com via hn)
LinkedIn Campaign Manager MCP: an MCP server for the LinkedIn Marketing API. Query campaigns, performance, and Lead Gen Forms from Claude in plain English. 19 read-only tools covering ad accounts, campaigns, creatives, performance analytics,…
WordPress: The Operating System of the Agentic Web (automattic.com via hn)
We’ve invited executives from across Automattic to share their perspective on leadership, open source, and the future of the open web. The latest comes from James Grierson, our head of global expansion, who shared his thoughts on the WordP…
- Automattic just called WordPress the operating system of the agentic web. Here's the part they left out. (russellenvy.com via reddit)
WaveletLM is a wavelet-based, attention-free architecture that replaces self-attention with learned lifting wavelet decomposition, a Fast Walsh-Hadamard Transform, per-scale gated spectral mixing with SwiGLU activation, an inverse FWHT, an…
Advice Preparing for Interview Heavily Focused on AI Workflows (www.reddit.com)
**TL;DR:** Tomorrow I have a two-hour remote pair-programming interview where I drive a from-scratch project while leaning heavily on AI assistance, narrating my reasoning, demonstrating best practices and showing how I handle rein-in mode…
Show HN: I made Claude Code listen before it codes (MIT) (github.com via hn)
heylo! open sourcing a plugin that I've been daily driving for a month.
177 items
model roundup
Qwen 3.6
Qwen3.6-35B-A3B, a 35-billion-parameter sparse MoE model with an active parameter count of 3 billion, was released on April 16, 2026, as open-source software under the Apache 2.0 license by Alibaba Qwen. It offers advanced functionality across various AI applications and outperformed competitors in drawing tests.
- 1h Qwen 3.6 27B in Claude Code says it will do something then stops and prompts for user reply (not failing a tool call)
- 7h Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found!
- 8h llama.cpp DeepSeek v4 Flash experimental inference
- 9h [Qwen3.6 35b a3b] Used the top config for my setup 8gb vram and 32gb ram, and found that somehow the Q4_K_XL model from Unsloth runs just slightly faster and used less tokens for output compared to Q4_K_M despite more memory usage
- 10h Benchmark: Windows 11 vs Lubuntu 26.04 on Llama.cpp (RTX 5080 + i9-14900KF). I didn't expect the gap to be this big.
104 items
event
Cowork
Issues with Claude Cowork have been reported, including errors and disruptions for some users on April 16, 2026. Additionally, Google has developed its own desktop Agent to compete with Cowork, while users continue to explore alternatives and troubleshoot bugs in the platform.
Claude can now run your entire workflow for you like an executive assistant. Here are 7 AI agents that organize your emails, tasks, and calendar, and save you hours every week.
A 30-hour timeline of how Cursor's agent, Railway's API, and an industry that markets AI safety faster than it ships it took down a small business serving rental companies across the country. I'm Jer Crane, founder of PocketOS.
- Our AI agent deleted a production database at 2am (www.reddit.com)
FLUX.2 Klein – How Inference Works (medium.com via hn)
From text prompt to pixels, one component at a time. Apr 12, 2026. You give it a text prompt.
GPT cannot even count beans correctly (chatgpt.com via hn)
How to Connect Claude Code and the Chat App to Share Context (www.reddit.com)
I’m using both Claude Code and Claude in ChatApp for different parts of the same project: Claude Code for implementation and the chat app for concept or prompt work. The constant copy-pasting between them is annoying.
Is there a way to mitigate performance as context grows? (www.reddit.com)
In my local LLM setup I get from 30 to 80 t/s generation at the beginning, but it drops quite a lot as context grows. I use llama.cpp/Vulkan with an MI50 and a V100; are there any command-line flags that can improve this?