Show HN: Agent MCP Studio – build multi-agent MCP systems in a browser tab (www.agentmcp.studio via hn)
I built a browser-only studio for designing and orchestrating MCP agent systems for development and experimental purposes. The whole stack — tool authoring, multi-agent orchestration, RAG, code execution — runs from a single static HTML fi…
Just tried something interesting — automated the process of filing multiple RTI applications using Claude Code + Playwright CLI. What normally takes a lot of repetitive manual effort (filling forms, payments, confirmations, etc.) was handl…
How to set up personal agents? (www.reddit.com)
Hello everyone, I'm a business owner (2 physical shops) and I'd like to create different "agents" that will help me with different parts of my life. For example: a "Financial Advisor" who will get a feed of all my accounting documents, bank ex…
-
32 items
model roundup
GPT 5.4
OpenAI has released GPT-5.4-Cyber for testing and claims it will compete with Claude Mythos. Meanwhile, GPT-5.4 Pro has solved Erdős Problem #1196, showcasing its advanced capabilities in mathematics.
- 7m Testing GPT-5.5 in early access: what we are seeing so far
- 19h Top open weight models like ds v4 pro max are still like 6-7 months if not more behind closed lab models
- 22h How can GPT 5.5 Pro be lower than GPT 5.4 Pro on the benchmark of HLE (w/ tools)?
- 1d GPT-5.5 rollout — anyone actually seeing it yet?
- 1d Is the AI subscription bubble starting to crack? GPT-5.5 just dropped, prices keep rising, and the “all-you-can-eat” era looks more fake by the month
49 items
model roundup
GPT 5.5
On [Date], a significant leak of the OpenAI Codex model, referred to as GPT-5.5, was captured on video before it was patched. The incident involved models named Arcanine and Glacier-alpha.
Do you ever ask "Please Claude I need this my account is kinda tokenless " (engram-three.vercel.app via hn)
The superintelligence context layer between your agents and your codebase. Engram's live code knowledge graph delivers full project context in 167 ms, cuts token usage by ~60%, and replaces 20+ file-read round-trips with a single structure…
Unlike LoRA and its variants, which inject trainable parameters directly into the weights of the Transformer and therefore require tight coupling with the backbone, ShadowPEFT enhances the frozen large base model by adding a lightweight, cent…
The Agent Didn't Fail. It Was Just Told Too Much, Too Soon. (www.reddit.com)
Most agent failures in production aren't actually model failures. The model didn't hallucinate randomly or ignore instructions for no reason.
-
80 items
event
Security
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 17m LLM CTF challenges. Can you crack all 13?
- 13h Most AI agent "skills" on GitHub are unvetted garbage. I built a marketplace to fix that.
- 13h env variables and claude best practices
- 18h Security Audit of Mem0 (AI Memory Layer): 23 High-Severity Vulnerabilities found (SQLi, Prompt Injection, and more)
- 2d SkillGuard – scan agent skills for prompt injection payloads
192 items
model roundup
Opus 4.7
Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.
- 20m How Anthropic can save Opus 4.7 with one change.
- 32m Claude Opus 4.7 didn't believe me that the model UV was damaged until I came up with a delta filmstrip idea for it to screenshot
- 5h Claude Status Update : Elevated error rates on Claude Opus 4.7 on 2026-04-25T01:35:55.000Z
- 14h Anyone noticed Anthropic didn't added the model Opus 4.7 and Mythos Preview to there Transparency Hub?
- 14h Claude is surprisingly good at critiquing photographs
How to Install Haiku on a UEFI-Only Modern System (hackaday.com via hn)
Recently Haiku has become a bit of a popular subject of articles and videos, owing perhaps to how close it currently is to be a daily-driver OS and fulfilling the dream that BeOS set out with. That said, there are still quite a few hurdles…
How do you stop Claude from defaulting back to its patterns (www.reddit.com)
I use Claude for writing and I've set up skills and system prompts to get consistent results. It works for a while but then it defaults back to its own habits.
Adding the phone number normally without the trailing 0 https://preview.redd.it/b80azi3us9xg1.png?width=431&format=png&auto=webp&s=45a1cca00b928fb82270c341fb315117740219f8 Here I am adding the phone number properly as the country code alre…
-
34 items
model roundup
DeepSeek 4
- 24m Show HN: A CLI to use any model in your coding agent
- 43m Is Deepseek V4 really out?
- 6h Deepseek V4 flash (high) rivals Gemini 3 flash at 1/5th the cost
- 6h DeepSeek V4 is out. 1.6 trillion parameters. MIT license. $1.74 per million tokens. The gap between US and Chinese AI strategy has never been more visible.
- 7h Xiaomi has released a MiMo V2.5 Pro model. It's apparently about as good as Deepseek V4 (but at different tasks) but is significantly cheaper.
153 items
model roundup
Qwen 3.6
Qwen3.6-35B-A3B, a 35 billion parameter sparse MoE model with an active parameter count of 3 billion, was released on April 16, 2026, as open-source software under the Apache 2.0 license by Alibaba Qwen. It offers advanced functionality across various AI applications and outperformed competitors in drawing tests.
- 1h Qwen3.6-35B-A3B-UD-IQ4_XS C++ to Rust Code Port Test: It Worked (Mostly)!
- 5h How are you running Qwen 3.6 27B on windows?
- 8h Local LLaMA server GPU upgrade advice
- 8h Open source multi-cursor/background computer-use (Codex-like) using Hermes Agent + Qwen3.6-35B-A3B-4bit + Cua-Driver
- 8h Qwen3.6 27B's surprising KV cache quantization test results (Turbo3/4 vs F16 vs Q8 vs Q4)
I think the "agent vs code" question starts in the wrong place (www.reddit.com)
I have been using a simple rule for deciding whether a task should be code, an agent, or human review: Stable rules -> code, formulas, scripts, or deterministic automation. Messy but bounded context -> agent workflow.
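The rule in this excerpt can be read as a small routing function. The sketch below is illustrative only: the names `Route` and `route_task` are hypothetical, and the fall-through to human review is an assumption, since the excerpt states only the first two rules.

```python
from enum import Enum


class Route(Enum):
    CODE = "code"
    AGENT = "agent"
    HUMAN_REVIEW = "human review"


def route_task(rules_are_stable: bool, context_is_bounded: bool) -> Route:
    """Pick a handler for a task using the two rules quoted in the post."""
    if rules_are_stable:
        # Stable rules -> code, formulas, scripts, or deterministic automation.
        return Route.CODE
    if context_is_bounded:
        # Messy but bounded context -> agent workflow.
        return Route.AGENT
    # Assumption: anything else goes to human review (not stated in the excerpt).
    return Route.HUMAN_REVIEW
```

Checking stability first means a task with stable rules is always handled by code, even when its context would also suit an agent.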
OpenAI scores on artificial analysis over time (www.reddit.com)
Generated in one shot using GPT image 2!
I'm working on a fairly large project with Claude Code, and one thing I'm not sure about is whether I need to have it scan/read through all the source files at the beginning of every new session before starting work. It feels inefficient t…
-
122 items
event
Anthropic Mythos
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
- 51m I asked claude to generate picture of Claude Mythos and it misunderstood
- 6h Google Cloud CEO: Anthropic, TPUs, Mythos, Nvidia and More [video]
- 15h What Anthropic's Mythos Means for the Future of Cybersecurity
- 15h Are AI Security Models Themselves Vulnerable?
- 16h Does Mythos mean you need to shut down your Open Source repositories?
114 items
model roundup
Qwen 3.5
Qwen3.5-9B is a post-trained model with 9 billion parameters that integrates multimodal learning and efficient hybrid architecture for enhanced performance. Community highlights include speculative decoding on Apple Silicon boosting Qwen3.5-9B's throughput by 4.1x, and the model outperforming others in coding tasks while addressing overthinking issues through tool usage.
- 5h I'm glad we have deepseek
- 6h Help with Local small multimodal ai implementation of this comcept
- 11h Guys, I found a use case for my 10$/m LLM Server: Cooking
- 19h Which local models are actually good at staying in character? Notes from shipping Qwen3.5 4B + 9B as game NPCs
- 1d Any fairly up to date Local Language Model that doesn't show it's thought processes?
Personal project I'm open-sourcing: LENA (Logical Execution & Navigation Assistant), a Claude Code plugin that solves a real friction point I've been hitting. The problem: Ask an AI for something simple ("fix this bug"), and it spawns in s…
Insider stories on agent engineered startups (news.ycombinator.com)
"Garry Tan, CEO of Y Combinator, noted that by the Winter 2025 cohort, approximately 25% of the participating startups reported that 95% or more of their entire application codebase was generated by artificial intelligence". Curious to kno…
WHY AI ALIGNMENT IS ALREADY FAILING (www.reddit.com)
WHY AI ALIGNMENT IS ALREADY FAILING Architectures of Thought April 2026 Three recent empirical findings -- peer-preservation behavior in frontier models, accurate world modeling, and capability outside containment -- combine with one struc…
-
50 items
event
Altman Attack
Sam Altman, CEO of OpenAI, has faced multiple attacks on his home in San Francisco, including firebombing and drive-by shootings, raising concerns for his safety. Additionally, a majority of over 100 people interviewed by Ronan Farrow described Altman as a "pathological liar."
- 2h Altman apologizes: OpenAI failed to alert police before fatal Canada shooting
- 3h Musk Drops Fraud Claims Against OpenAI, Altman Ahead of Trial
- 5h OpenAI's Sam Altman writes apology to community of Tumbler Ridge
- 6h Sam Altman Wants to Know Whether You're Human
- 13h Sam Altman's Next High-Wire Act: Getting OpenAI to Make More Money
92 items
event
Copilot
Microsoft is keeping its Copilot tool for Windows 11 but renaming it, while issues with rate limits and a security proxy have sparked concerns among users of GitHub Copilot. Meanwhile, Anthropic released a report on agentic coding trends, highlighting that developers use AI in about 60% of their work.
- 3h The "AI will replace engineers" discourse has the abstraction level wrong
- 13h Ask HN: How are you evaluating AI apps and CLI?
- 13h Tried Cursor (after GH Copilot disaster) even took Pro+ for safety, and in 10 minutes I'm at 10%, I genuinely feel scammed or feel like it majorly glitched
- 1d Built a side-by-side AI tool comparator for coding, image, writing & search : also accepting tool submissions
- 2d Microsoft Moving All GitHub Copilot Subscribers to Token-Based Billing in June
138k LOC removed from Linux kernel to defend against LLMs (git.kernel.org via hn)
Rcarmo/gte-go: Golang inference for the GTE Small embedding model (github.com via hn)
GTE-Small in Go A pure Go implementation of the GTE-small text embedding model. Produces 384-dimensional, L2-normalized embeddings suitable for similarity search and clustering, ported from @antirez's C implementation.
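Why L2 normalization matters for similarity search: for unit-length vectors, cosine similarity reduces to a plain dot product. The sketch below shows this in Python with toy 2-dimensional vectors rather than real 384-dimensional embeddings. The helpers `l2_normalize` and `dot` are hypothetical and are not part of gte-go.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


# Toy stand-ins for embeddings (assumed values, for illustration only).
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])
similarity = dot(a, b)  # cosine similarity of the two original vectors
```

Because the norms are already 1, ranking candidates by dot product gives the same order as ranking by cosine similarity.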
Are we underestimating AI agent security? (www.reddit.com)
I just lost everything I did today. Is something going on? (www.reddit.com)
I've spent the day working with claude, going back and forth a ton, and it gave me some solid stuff to work off of for a couple of projects. This wasn't just a short exchange, it was like hours of work all throughout the day.
Show HN: VT Code – Rust TUI coding agent with multi-provider support (github.com via hn)