Unitree GD01: China's $537k rideable transformer robot is now in production (gagadget.com via hn)
Unitree Robotics has launched the GD01, a rider-carrying robot it describes as the world's first mass-produced manned mech suit, starting at RMB 3.9 million — roug…
Agentic AI token compression using Haskell (blog.dan-gilmour.com via hn)
The plan
My theory at the moment is as follows:
- Code is now cheap with agentic AI developing it
- Context is the expensive part
- The biggest bottleneck appears to be context windows
- At 1 million tokens, only maybe 500k are usable befo…
Desktop pets for AI coding agents (openpets.dev via reddit)
https://github.com/alvinunreal/openpets
model roundup
DeepSeek 4
DeepSeek-V4-Pro is a 1.6T-parameter Mixture-of-Experts model supporting a one-million-token context, with significant improvements in efficiency and stability through hybrid attention and manifold-constrained hyper-connections. Community highlights include its cost-effectiveness via the official API and exceptional performance in large code-change evaluations, with some noting surprisingly robust output despite a 384K max token limit.
- 7m DeepSeek V4 from the Inside
- 8h OpenCode + DeepSeek V4 Pro vs Claude Code CLI?🤔
- 13h What are the best opensource coding models for 8x A6000 setup
- 17h PACT, head-to-head LLM negotiation benchmark. 20-round buyer-seller bargaining game: each round the AIs can message, the buyer submits a bid and the seller submits an ask. If bid ≥ ask, trade clears at the midpoint. Thousands of matchups.
- 1d DeepSeek-V4-Flash W4A16+FP8 with MTP self-speculation: 85 tok/s @ 524k on 2× RTX PRO 6000 Max-Q
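The PACT clearing rule described in the benchmark title (if the buyer's bid is at least the seller's ask, the trade clears at the midpoint) is simple enough to sketch. This is a minimal illustration, not PACT's code: the helper names clear_round and run_game are hypothetical, messaging between agents is not modelled, and it assumes the game ends at the first round that clears.

```python
from typing import Optional


def clear_round(bid: float, ask: float) -> Optional[float]:
    """Hypothetical helper: return the clearing price if bid >= ask, else None.

    Implements the rule as described: a trade clears at the midpoint of bid and ask.
    """
    if bid >= ask:
        return (bid + ask) / 2
    return None


def run_game(bids: list[float], asks: list[float], max_rounds: int = 20) -> Optional[tuple[int, float]]:
    """Play up to max_rounds rounds and return (round_number, price) for the first clearing round.

    Assumption: the game stops as soon as a trade clears; returns None if no round clears.
    """
    for round_no, (bid, ask) in enumerate(zip(bids, asks), start=1):
        if round_no > max_rounds:
            break
        price = clear_round(bid, ask)
        if price is not None:
            return round_no, price
    return None


if __name__ == "__main__":
    # Made-up offers: the buyer raises its bid while the seller lowers its ask.
    bids = [50.0, 60.0, 70.0]
    asks = [90.0, 80.0, 66.0]
    print(run_game(bids, asks))  # (3, 68.0)
```

Round 3 clears because 70 ≥ 66, and the midpoint of those two offers is 68.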
event
Copilot
Microsoft is keeping its Copilot tool for Windows 11 but renaming it, while issues with rate limits and a security proxy have sparked concerns among users of GitHub Copilot. Meanwhile, Anthropic released a report on agentic coding trends, highlighting that developers use AI in about 60% of their work.
- 13m Cplt: Run AI coding agents or a plain shell inside a kernel-level sandbox
- 5h Tried 13 AI Tools Recently, Here’s What’s Actually Useful
- 17h VS Code, Copilot-style developing with llama.cpp?
- 19h Copilot "auto-pilot" system instructions making models worse
- 21h I built an autonomous engineering agent on top of Claude Code. Self-improving routing, cross-session memory, process intelligence, P2P team learning.
Claude Skills for Cybersecurity (github.com via hn)
Trail of Bits Skills Marketplace: a Claude Code plugin marketplace from Trail of Bits providing skills to enhance AI-assisted security analysis, testing, and development workflows. Also see: claude-code-config · skills-curated · claude-code…
- Where to find Claude Skills? (www.reddit.com)
- Claude for Cybersecurity tasks (www.reddit.com)
- Claude skills (www.reddit.com)
- Top Claude skills? (www.reddit.com)
- Hooks vs Skills for Claude (www.reddit.com)
Prave – the missing management layer for AI Agent Skills (prave.app via hn)
Discover 1,000+ Claude Skills by intent, audit your library
Pi-treebase: rebase LLM sessions interactively (github.com via hn)
pi-treebase: a session history management tool that combines the functionality of the base /tree command with something similar to git rebase --interactive.
Installation: pi install npm:@grayolson/pi-treebase
Usage: First, use the /treebase a…
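pi-treebase is described as something similar to git rebase --interactive for LLM session history. The sketch below illustrates that idea on a flat list of messages; it is not pi-treebase's implementation. The Message type, the rebase_history function, and the pick/drop/squash actions are hypothetical (borrowed from git's todo-list vocabulary), and the real tool works on a session tree rather than a flat list.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # e.g. "user" or "assistant"
    text: str


def rebase_history(history: list[Message], todo: list[tuple[str, int]]) -> list[Message]:
    """Rebuild a session from a todo list, in the spirit of an interactive rebase.

    Each todo entry is (action, index into the original history):
      "pick"   keep the message
      "drop"   omit it
      "squash" append its text to the most recently kept message
    The order of the todo list becomes the order of the new history.
    """
    result: list[Message] = []
    for action, idx in todo:
        msg = history[idx]
        if action == "pick":
            result.append(Message(msg.role, msg.text))
        elif action == "drop":
            continue
        elif action == "squash":
            if not result:
                raise ValueError("cannot squash without a previously kept message")
            prev = result[-1]
            # The squashed message keeps the role of the message it is folded into.
            result[-1] = Message(prev.role, prev.text + "\n" + msg.text)
        else:
            raise ValueError(f"unknown action: {action}")
    return result


if __name__ == "__main__":
    history = [
        Message("user", "Fix the failing test"),
        Message("assistant", "Trying approach A"),
        Message("assistant", "Approach A failed"),
        Message("user", "Try approach B instead"),
    ]
    # Drop the dead-end branch so it no longer occupies context.
    todo = [("pick", 0), ("drop", 1), ("drop", 2), ("pick", 3)]
    for m in rebase_history(history, todo):
        print(m.role, "|", m.text)
```

Dropping the failed attempt is the kind of edit that makes history rewriting attractive for agent sessions: the dead end stops consuming context on later turns.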
model roundup
GPT 5.4
OpenAI has released GPT-5.4-Cyber for testing and claims it will compete with Claude Mythos. Meanwhile, GPT-5.4 Pro has solved the Erdős Problem #1196, showcasing its advanced capabilities in mathematics.
- 19m Follow-up to my TranslateGemma-12b benchmark post: human reviewers flagged 71% of the segments automated metrics rated clean
- 17h Am I missing something about GPT-5.5 efficiency?
- 3d Stop picking LLMs by reputation. Run the eval first.
- 4d Show HN: When the LLM Accidentally
- 4d GPT-5.5 Price Increase: What It Costs
model roundup
Haiku 4.5
Several users are considering moving off Claude Max because of cost and usage limits, with some weighing Chinese AI alternatives and others citing Haiku 4.5, Anthropic's own smaller model, as offering similar capabilities at a lower price.
(Two screenshots attached.) Is anyo…
AI Support Agents & Workflows Worth Exploring in 2026 (www.reddit.com)
Been exploring how AI agents are slowly changing customer support workflows, especially for smaller teams trying to scale without adding headcount. Some interesting tools/workflows worth checking out: • SparrowDesk’s Zoona: AI support agen…
SpaceX and Anthropic, xAI's Two Companies, Elon Musk and SpaceXAI's Future (stratechery.com via hn)
The Anthropic xAI deal is shocking but not surprising: Musk should double down on serving other companies.
event
Security
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 36m 🦀 Claude has crabs?! 🦀
- 2h Agents need a local bouncer before they run tools
- 2h Mass NPM Supply Chain Attack Hits TanStack, Mistral AI, and 170 Packages
- 3h I made an AI concierge for my wedding guests. The second most popular thing they did with it was try to jailbreak it.
- 4h OpenAI Launches Daybreak for AI-Powered Vulnerability Detection and Patch Validation
model roundup
Opus 4.7
Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.
Show HN: Talk to your Oura Ring data through Claude (github.com via hn)
oura-ring-mcp Natural-language access to your Oura data through Claude. Local SQLite mirror, natural-language annotations, and MCP tools for actually asking questions about your health data.
Agent Directory (www.tryrankly.com via hn)
The complete public catalog of AI agents, search crawlers, scrapers, and bots active on the web. Browse, filter, identify, and control.
- what is an agent? (www.reddit.com)
claude agents opens an Agent View (Research Preview) — a single list of every Claude Code session across your machine, showing what's running, what's blocked waiting on you, and what's done. No more hunting through terminal tabs.
model roundup
Qwen 3.6
Qwen3.6-35B-A3B, a 35-billion-parameter sparse MoE model with 3 billion active parameters, was released by Alibaba Qwen on April 16, 2026, as open-source software under the Apache 2.0 license. It offers advanced functionality across various AI applications and outperformed competitors in drawing tests.
- 49m Models and Quants quality test results - the chessboard svg (Qwen3.6 27B/35B-A3B/Zaya1)
- 3h Estimate inference speed of local Qwen3.6-35B on Mac M5...
- 4h New Qwen3.6 35B finetune - 0GM-1.0-35B-A3B-0427
- 7h Will unsloth release MLX versions of the MTP qwen3.6 and gemma 4 models?
- 16h Does anyone else have issues with Qwen-3.6-27B stability in the Codex harness?
model roundup
Opus 4.6
Opus 4.6, a version of Anthropic's Claude model, saw its accuracy on the BridgeBench hallucination test drop from 83% to 68% and is being retired from Copilot Pro+. Notably, Claude Code demonstrated advanced capabilities by generating a detailed 12-week training plan in one call.
Protecting human joy and endeavour
LawZero is a nonprofit startup developing technical solutions for highly-capable, safe-by-design AI systems. The challenge: Current frontier AI systems are becoming more capable and autonomous, yet t…
Show HN: A benchmark where LLMs make memes from current news (memebench.net via hn)
May 9, 2026: The Pentagon Releases New Trove of Declassified UFO Files. The Defense Department has released a new trove of declassified documents about government UFO sightings. Meme A, Meme B…
Based Claude (www.reddit.com)
I uploaded a screenshot with three questions and just asked it to suggest a reply to it. Claude out here playing HR🙏
model roundup
Gemma 4
Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.
MS Word/Excel support (www.reddit.com)
I've noticed that Claude has been able to handle Office documents (Word/Excel) natively through the official plugin for a few days now. I'm only talking about the new, official plugin, not the skills. I have an urgent question and need help.
Disclosure: I’m the author. I’m not sure if this is mature enough for this subreddit yet, so please remove if it is not a fit.
Just launched my portfolio at nidhil.live — built it with Claude Code and honestly it was a game changer. I’m a 21yo junior dev working with MERN stack.
I believe what we need is an AI tool that retrieves faster and with better quality, or one that lets you update your retrieved data. I'm looking for a tool capable of that, wherein you’ll be able to have your context that…
Struggling with agent drift going from pilot to production (www.reddit.com)
For the people running AI agents in production: how are you handling per-step reliability math? Saw a great comment on a recent agent-drift thread here: "90% success rate per step over a 5-step workflow gives you about a 41% chance of tota…
- struggling with agent drift going from pilot to production (www.reddit.com)
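The ~41% figure in the quoted comment is consistent with compounding per-step reliability: if each of 5 steps succeeds independently with probability 0.9, all five succeed with probability 0.9^5 ≈ 0.59, so at least one step fails about 41% of the time. A minimal sketch of that arithmetic, assuming independent steps (real agent failures are often correlated); the function names are illustrative only:

```python
def workflow_success(per_step: float, steps: int) -> float:
    """End-to-end success probability, assuming each step succeeds independently."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    return per_step ** steps


def required_per_step(target: float, steps: int) -> float:
    """Per-step success rate needed to reach an end-to-end target over `steps` steps."""
    if not 0.0 < target <= 1.0 or steps < 1:
        raise ValueError("target must be in (0, 1] and steps >= 1")
    return target ** (1.0 / steps)


if __name__ == "__main__":
    ok = workflow_success(0.9, 5)
    print(f"all 5 steps succeed: {ok:.3f}")          # 0.590
    print(f"at least one step fails: {1 - ok:.3f}")  # 0.410
    print(f"per-step rate for 90% end-to-end: {required_per_step(0.9, 5):.3f}")  # 0.979
```

Read the other way, holding a 5-step workflow at 90% end-to-end needs each step to succeed roughly 97.9% of the time.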
I'm looking for a cache simulator / benchmark suite suited to the kind of tiered ephemeral cache that LLM providers use — e.g. Anthropic's 4-tier prompt cache, where context sits across several tiers with different residency windows, costs…
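As a starting point for the kind of simulator described, here is a minimal sketch that replays a request trace against a single ephemeral cache tier and tallies hits, misses, and cost; running the same trace against each tier option lets you compare them. Everything here is an assumption for illustration: the Tier and TieredCacheSim names, the TTL values, and the write/read cost multipliers are made up, it assumes every access restarts an entry's residency window, and it does not model the post's 4-tier structure or any provider's real prices.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tier:
    name: str
    ttl_s: float       # residency window: how long an entry survives without being touched
    write_mult: float  # cost multiplier (vs. base input price) for writing a prefix into the cache
    read_mult: float   # cost multiplier for reading a cached prefix


@dataclass
class TieredCacheSim:
    tier: Tier
    base_cost_per_token: float
    expires_at: dict[str, float] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0
    cost: float = 0.0

    def request(self, t: float, prefix_key: str, prefix_tokens: int) -> bool:
        """Simulate one request at time t (seconds); return True on a cache hit."""
        base = prefix_tokens * self.base_cost_per_token
        if self.expires_at.get(prefix_key, float("-inf")) >= t:
            self.hits += 1
            self.cost += base * self.tier.read_mult
            hit = True
        else:
            self.misses += 1
            self.cost += base * self.tier.write_mult
            hit = False
        # Assumption: any access (hit or fresh write) restarts the residency window.
        self.expires_at[prefix_key] = t + self.tier.ttl_s
        return hit


if __name__ == "__main__":
    # Hypothetical tiers and a toy trace: (time in seconds, prefix key, prefix tokens).
    tiers = [Tier("short", 300, 1.25, 0.1), Tier("long", 3600, 2.0, 0.1)]
    trace = [(0, "sys", 10_000), (200, "sys", 10_000), (1000, "sys", 10_000), (3000, "sys", 10_000)]
    for tier in tiers:
        sim = TieredCacheSim(tier, base_cost_per_token=1e-6)
        for t, key, n in trace:
            sim.request(t, key, n)
        print(f"{tier.name}: hits={sim.hits} misses={sim.misses} cost={sim.cost:.4f}")
```

With sparse traffic like this toy trace, the longer residency window can win despite its pricier writes, because it turns later requests into cheap reads instead of repeated writes.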