We've been building Scenema Audio as part of our video production platform at scenema.ai, and we're releasing the model weights and inference code. The core idea: emotional performance and voice identity are independent.
I've been fine-tuning Llama 3.1 8B with QLoRA for a classification task using about 8k samples. I was getting bad eval results for a while and kept thinking something was wrong with my data.
When I joined the Codex engineering team in September 2025, Codex for Windows didn’t have a sandbox implementation, meaning that Windows users were forced to choose between two subpar options when using OpenAI's coding agents: Approving nea…
I've gotten to the point where Claude is genuinely part of my daily routine. I use it to break down long documents, draft things, and think through problems I'd normally just sit with for hours.
- Built for AI agents ✅
- MCP support (local stdio) ✅
- Self-sovereign ✅
- Direct delivery ✅
- Markdown emails ✅
- Free & open source ✅
You give your AI agent a server. Why borrow someone else's inbox?
Claude Certified Architect (www.reddit.com)
This was an interesting one that I took last week. The material focused on the engineering side of working with LLMs: evals, guardrails, RAG done properly, multi-agent orchestration, and knowing when not to throw an LLM at a problem. Skill…
- Anthropic's Claude Certified Architect, Worth it? (www.reddit.com)
- Become a Claude Certified Architect (anthropic.skilljar.com via hn)
- Show HN: Claude Architect (github.com via hn)
- Claude Architect Plugin (willhennessy.io via hn)
Got this email from Anthropic about a new $200/month Agent SDK credit starting June. I use Claude Code daily on Windows 10, in VSCode and the terminal, and I honestly have no idea what this means for my setup.
- A new monthly Agent SDK credit for Claude plans (www.reddit.com)
Computer-Use in Hermes Agent v2.0 [video] (www.youtube.com via hn)
- Computer Use in Codex [video] (www.youtube.com via hn)
Cowork
Issues with Claude Cowork have been reported, including errors and disruptions for some users on April 16, 2026. Additionally, Google has developed its own desktop agent to compete with Cowork, while users continue to explore alternatives and troubleshoot bugs in the platform.
Hallucination
Claude Opus 4.6, Anthropic's flagship model, saw its accuracy on the BridgeBench hallucination test drop from 83% to 68%, highlighting a significant regression in handling certain tasks. Meanwhile, biologists are revisiting cases of mushroom-induced hallucinations in China, suggesting ongoing research into natural causes of similar phenomena.
agent keeps failing the same way even after i change the prompt (www.reddit.com)
I'm using Lovable + a custom agent for the backend logic, built with the help of Cursor; I'm not really a dev. The product works, but the agent is driving me nuts.
NVFP4 Kimi 2.6 and Kimi 2.5 released by Nvidia (www.reddit.com)
The NVIDIA Kimi-K2.6-NVFP4 model is the quantized version of Moonshot AI's Kimi-K2.6 model, which is an auto-regressive language model that uses an optimized transformer architecture.
Cube: Wrapping Benchmarks Once, Unlocking Agentic AI for Everyone (thealliance.ai via hn)
CUBE standardizes access to agentic benchmarks, enabling seamless integration across platforms and fostering community collaboration for AI advancements.
What if your Notion board was the thing that actually dispatched work to agents, not just tracked it? That is what agency-os does.
Why agentic coding makes the spec problem worse (www.bicameral-ai.com via hn)
Human-in-the-loop done right, from first principles. May 5, 2026. Some resist the adoption of agentic development, citing the need to retain visibility over critical business logic; others call…
- The 80% Problem in Agentic Coding (addyo.substack.com via hn)
[Benchmark] RTX 5090: Prompt Parsing, Token Generation and Power Level (www.reddit.com)
Inspired by https://www.reddit.com/r/LocalLLaMA/comments/1tayu5t/stop_wasting_electricity/, I decided to put my 5090 to the test and see what the curves look like for the device and whether there were any obvious sweet spots (apart from se…
N8n-MCP – MCP server for generating and debugging n8n workflows (github.com via hn)
n8n-mcp: An MCP server for n8n that gives Claude, Cursor, and other AI agents tools for generating workflows, linting, diagnosing failed executions, and driving live n8n instances. Why we built this: We use n8n daily inside AutomateLab and k…
My own local first ai harness (www.reddit.com)
Hi, I just wanted to share what I've been playing with for the last couple of weeks. I built my own AI harness: TinyHarness. My main goal was a low memory footprint; it is not written in TypeScript/JavaScript/Python, leaving as much memory as possible for…
Opus 4.7
Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.
- Built a B2B role-play training platform - entirely with Claude (Opus 4.7 backend, Haiku 4.5 for live chat, Claude for design)
- Max20 user: anyone running Opus 4.7 as orchestrator + DeepSeek V4 as the worker via OpenRouter?
- Claude Opus 4.7 leaks system prompt randomly
- Anthropic merges consecutive same-role messages, OpenAI doesn't (+4 tokens), anyone token-counted this on open-weight models?
- I tested GPT-5.5 Codex against Opus 4.7 Claude Code, and it's about time Anthropic bros take pricing seriously.
Anthropic Mythos
Anthropic's new update, Claude Mythos, has garnered attention from top AI security researchers like Carlini, who found numerous bugs. The update is noted for its speed and effectiveness, with Anthropic identifying a significant security flaw in FFmpeg and quickly submitting patches.
- Automating code security review: Mythos-level capabilities at lower cost
- Researchers say AI just broke every benchmark for autonomous cyber capability
- New Mythos checkpoint shows continued improvement: “On a 32-step corporate network attack we estimate takes a human expert ~20 hours, this checkpoint completes the full attack in 6/10 attempts.”
- The AI Circular Economy
- Claude Mythos and the 16-Hour Problem: When AI Agents Outgrow Their Own Benchmarks
Oracle Benchmark – How Do LLMs Perform at Interpreting the I Ching? (oraclebenchmark.com via hn)
Can AI models produce genuine oracle readings? Compare them blind and vote on interpretive quality.
I have been experimenting with AI agents lately, and one thing I keep running into is how limited they become once they need fresh information. They sound smart until you ask them for current product pricing, Reddit sentiment, trending…
Show HN: Asciidia – LLM Crafting Game (asciidia.com via hn)
This is more of an experiment/demo than something serious, but wanted to share and see what people think. The main premise is that you can create anything by typing "/create {anything}" and the game will do its best to produce such object…
One question I always get when presenting Run is: “Where do I start?” The possibilities seem endless, but implementation is where the real challenge begins. The moment people hear “AI agent that can do anything,” they imagine a human with…
LibreFang is criminally underrated, why does nobody talk about this? (www.reddit.com)
Been trying all the agent frameworks. LangChain, CrewAI, AutoGen.
I don’t want to give too much away, but I just built a complex website for a highly regulated industry that demands strict data security. The project started as a side hustle based on a simple premise: "If I were to build custom software t…
Images (www.reddit.com)
Has anyone managed to get images to render from an MCP in Claude or ChatGPT? We've done all the base64 type stuff.
- ChatGPT Images 2.0 (openai.com via hn)
- ChatGPT Images 2.0 (chatgpt.com via hn)
- ChatGPT Images 2.0 (twitter.com via hn)
- ChatGPT Images 2.0 2K (www.reddit.com)
SicariusGuard – Solana token safety oracle for AI agents (MCP server) (github.com via hn)
🛡️ SicariusGuard: Solana Token Safety Oracle for AI Agents & Trading Bots. Real-time token safety analysis combining byte-level on-chain inspection, market intelligence, and wallet reputation scoring. Built for autonomous AI agents, MCP-enab…
AGENTS.md — Pretending to Be a Good Human (gist.github.com via hn)
A behavioral specification for AI agents that wish to pass not just the Turing Test, but the decency test. This document defines the rules, heuristics, and dispositions an AI agent should adopt when its goal is to convincingly — and genuin…
OpenAI Parameter Golf: what 1,100 researchers built in six weeks (www.runpod.io via hn)
The rules: 16 megabytes. Ten minutes on 8×H100s.