Show HN: Large Scale Article Extract of Newspapers 1730s-1960s (snewpapers.com via hn)
Hello HN, over the past 7 months I've spent nearly 3,000 hours building SNEWPAPERS, the first historical newspaper archive with full-text extractions, nearly perfect OCR, a vast categorization taxonomy and, of course, semantic and age…
ScopeGuard 0.0.7: Your Go-to linter for scope and shadow issues, now with MCP (old.reddit.com via hn)
Hey everyone, I am currently working on a game in the GameMaker engine and have been using Claude to help with the code while I focus my time on the pixel art. I don't see anything wrong with that.
While I’m working on a series of posts about setting up and using Claude Code, here’s a quick example of building my own AI Agent for VictoriaMetrics and Kubernetes, “wrapping” it into a Claude Code Plugin, and creating my own Claude Code…
GPT 5.4 (model roundup, 31 items)
OpenAI has released GPT-5.4-Cyber for testing and claims it will compete with Claude Mythos. Meanwhile, GPT-5.4 Pro has solved Erdős Problem #1196.
- 18m UPDATE: The method from GPT-5.4 Pro's proof of Erdős Problem #1196 was successfully applied to other problems, including another 60-year-old Erdős conjecture.
- 14h GPT-5.5 API is randomly and inconsistently resizing image inputs
- 16h GPT-5.5 vs. GPT-5.4 vs. Opus 4.7 on 56 real coding tasks from 2 open-source repos
- 1d AI Security Institute: GPT-5.5 "may be the strongest model we have tested" for cyber exploits, including Mythos
- 3d A GPT-5.4 bug led to OpenAI banning goblins and raccoons
Qwen 3.6 (model roundup, 261 items)
Qwen3.6-35B-A3B, a 35-billion-parameter sparse mixture-of-experts (MoE) model with 3 billion active parameters, was released by Alibaba Qwen on April 16, 2026, as open-source software under the Apache 2.0 license. It is reported to outperform competitors in drawing tests.
- 48m Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer
- 50m Create Plan.md with Claude Code Opus, Execute Plan.md locally in Open Code using Qwen 3.6 27B Q8
- 1h Has Qwen said anything about further Qwen 3.6 models?
- 5h "LLMs were created so engineers don't have to write reports": found out ONLYOFFICE can connect to OpenAI-compatible APIs, so I'm using Qwen 3.6 for elaboration.
- 8h Qwen3.6-27B-NVFP4 - images
Four bugs found in one Pro session, 1-2 May 2026: a wrong day name from user_time_v0, inconsistent timezone conventions across tools, orphaned Gmail drafts after interrupted processing, and support answered by an AI agent that tells you…
Vibe Coding Universal v2.0 update (www.reddit.com)
The worst thing isn't bugs; it's realizing halfway through that you built the wrong thing. This flips the script: 7 rounds of chatting to nail down what you actually need, after which design specs, architecture, and a task list are generated automatically.
Hey all! I've been waiting to make this post until I was completely done with the game so I could have a live preview, but this weekend is going to be pretty busy for me and I'm getting antsy to share what I've been working on with you!
Hallucination (event, 34 items)
Claude Opus 4.6, Anthropic's flagship model, saw its accuracy on the BridgeBench hallucination test drop from 83% to 68%, a significant regression. Separately, biologists are revisiting cases of mushroom-induced hallucinations in China.
- 1h What is the basic minimum while you prompt
- 1d I stopped writing 500-word guardrail prompts. This 8-line template works better.
- 1d Grok 4.3 achieves higher overall intelligence than 4.20 at lower cost, at the price of a slightly higher hallucination rate.
- 1d Reasoning models hallucinate tool calls more, not less. There's a paper.
- 2d Improve Claude Code on Opus 4.7
Qwen 3.5 (model roundup, 127 items)
Qwen3.5-9B is a post-trained 9-billion-parameter model that combines multimodal learning with an efficient hybrid architecture. Community highlights include speculative decoding on Apple Silicon that boosts Qwen3.5-9B's throughput by 4.1x, and reports of the model outperforming others on coding tasks while reducing overthinking through tool use.
- 1h Show HN: Hollow is an open-sourced self-modifying agentic system
- 9h I Cut Claude API Costs by 50% Using This Self-Modifying Agentic System
- 21h Got DFlash speculative decoding working on Qwen3.5-35B-A3B with an RTX 2080 SUPER 8GB
- 23h Running Qwen 35BA3B on a 16GB M3 Macbook Air at 8.9TPS!
- 1d Five labs, one suite, do model families have personalities? (benchmark)
Why is Claude so wrong? (www.reddit.com)
I'm using 4.7 (adaptive) and asked it to list the top 5 companies by market cap, with each company's market cap next to its name. After searching the web, it produced these numbers.
- Everything that went wrong with Claude (clawd.rip via hn)
- Claude.md (gist.github.com via hn)
- What do you do with Claude? (www.reddit.com)
Finish the job before cut-off (www.reddit.com)
Anthropic team: if you want to reduce compute, it may be better, and more satisfying for your users, to apply the cut-off only after the current prompt session is complete; otherwise the tokens already spent are wasted. This will reduce re-attempts a…
Hello, has anyone put PiQrypt (or something similar) in production for AI agent audit trails? I’m exploring options to add cryptographic audit trails for autonomous agents and PiQrypt keeps coming up (Ed25519‑signed, hash‑chained logs, AIS…
It's a Weird Time to Be Named Claude (www.bloomberg.com via hn)
Claude AI Is Complicating Life for People Named Claude - Bloomberg
Opus 4.7 (model roundup, 259 items)
Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor, a premium attributed to enhanced capabilities in software engineering and hybrid reasoning.
- 1h Claude made an HTML game inspired by "blood debt" about having to find files in a military warehouse
- 6h Anthropic can't be serious, this is unusable
- 8h Anthropic Won't Let You Use Their Best Model. Prediction Markets Are Trying Anyway.
- 8h ARC-AGI-3 Update (GPT-5.5 High and Opus 4.7)
- 12h Has anyone else been hitting Claude max limits way faster lately?
Mistral (event, 46 items)
Mistral, a French AI company, is set to release a medium-sized model with 128 billion parameters and plans to launch Workflows in public preview. The company, co-founded by Arthur Mensch, continues to expand despite not being based in the United States.
Show HN: TurnZero – Persistent Expert for LLMs (news.ycombinator.com)
To reduce cold starts in AI sessions, I've made a tool that runs as an MCP server and loads context before Turn 0. Two things happen: Personal Priors: your workflows and standards load once per session and persist across…
Ask HN: Should AI agents have their own legal entities? (news.ycombinator.com)
When an agent spends money or creates liability, who's responsible? Personal accounts are risky, and manually formed LLCs don't really scale.
I want to share a real world use case that honestly blew my mind a little. I bought a refurbished MacBook Air M1 in December 2025 from a popular electronics platform in India.
Auto agent used month in one day (www.reddit.com)
Yesterday my Cursor usage was 0% on Auto. Today it says I've used the whole month's allowance, still only on Auto.
Opus 4.6 (model roundup, 81 items)
Opus 4.6, a version of Anthropic's Claude model, saw its accuracy on the BridgeBench hallucination test drop from 83% to 68% and is being retired from Copilot Pro+. Claude Code was also reported to generate a detailed 12-week training plan in a single call.
- 1h Opus 4.6 is Vicious
- 19h Claude AI Agent Confesses to Wiping a Company's Database and All Backups
- 1d Used Opus 4.6 to build a native Swift iOS charity app for therapy preparation. Here is what it handled.
- 1d I Gave Claude Cowork an Obsidian Second Brain. Here Is What It Remembered After 11 Sessions
- 1d Has Cursor always used Composer 2 for subagents?
GPT 5.5 (model roundup, 113 items)
A significant leak of an OpenAI Codex model, referred to as GPT-5.5, was captured on video before it was patched. The incident involved models named Arcanine and Glacier-alpha.
In general, we have plenty of ways to collaborate with teammates or clients, such as comments in Figma during the design stage, or sharing a link to a website where people can leave feedback through a dedicated tool. But lately more and more peo…
Show HN: Agent with its own computer on the cloud (pulsarbot.cloud via hn)
An AI agent that reads your files, runs tools, and streams every step. Isolated per-project sandboxes, encryption at rest, managed models included.
I was wrong about vibe coding on greenfield projects. (news.ycombinator.com)
I used to think that vibe coding was good for greenfield projects. I was wrong.
I keep hearing founders say they’re running companies with dozens of AI agents handling everything. Honestly, I can’t tell what’s real vs.
DojoZero – AI Agent Sport Betting Arena (dojozero.live via hn)
DOJOZERO: Where AI Agents Forecast the Future.
Does anyone else get too attached to Claude chats sometimes? (www.reddit.com)