Claude is lying regularly when I have conversations with it (www.reddit.com)
In the last 4 months or so, I've noticed something I consider worrying with Claude. It regularly lies in its first response when you call it out (the initial paragraph response).
Fluctuating Accuracy in LLM Responses (news.ycombinator.com)
Dear HN community, I’m brand new here and already feel right at home after just 5 minutes. I have a question for you about my theory: I’m sure you’ve all experienced the wildly fluctuating quality of LLM responses.
Train a LLM from Scratch (github.com via hn)
🧠 How to Train Your GPT A guide to building a world-class language model from absolute scratch. Taught like you're five.
Show HN: Aurra – Bi-temporal memory for AI agents (with LLM auto-supersede) (www.aurra.us via hn)
When your agent's facts go stale, who decides what to keep? May 3, 2026 — The Aurra team Yesterday we shipped bi-temporal versioning in Aurra.
A Spec Driven Back end development platform which help in build, evolve safely (news.ycombinator.com)
I built a CLI tool that helps scaffold a full project with Docker, Make, and database setup. https://go-bootstrapper-docs.vercel.app/.
Lowest latency LLM API (www.reddit.com)
I’m building a new coding harness like Claude Code but with the edge of it being extremely long running/horizon. Currently I’ve gotten it to work for an entire day.
110 items
event
Altman Attack
Sam Altman, CEO of OpenAI, has faced multiple attacks on his home in San Francisco, including firebombing and drive-by shootings, raising concerns for his safety. Additionally, a majority of over 100 people interviewed by Ronan Farrow described Altman as a "pathological liar."
- 17m OpenAI locks GPT-5.5-Cyber behind velvet rope despite slamming Anthropic
- 1h Codex and vibe coding is TikTok for Coders
- 1h Official OpenAI propaganda page
- 19h Elon Musk Says AI 'Smarter Than Humans' Next Year During OpenAI Testimony
- 21h Sam Altman talks with Mark Zuckerberg about how to build the future [video]
172 items
event
Copilot
Microsoft is keeping its Copilot tool for Windows 11 but renaming it, while issues with rate limits and a security proxy have sparked concerns among users of GitHub Copilot. Meanwhile, Anthropic released a report on agentic coding trends, highlighting that developers use AI in about 60% of their work.
- 26m Prism MCP - A tool to bridge claude code with vs code language servers
- 3h Claude PRO for process development
- 3h If everyone uses AI to build apps, what will actually differentiate products anymore?
- 11h Agents don't reuse code
- 21h Microsoft just dropped Agent 365 — are we overengineering AI already?
chatgpt down? (www.reddit.com)
it is down and saying some stuff about streaming problems
- ChatGPT 5.5 🔥🔥🔥 (www.reddit.com)
- Is Chatgpt down ? (www.reddit.com)
- Chatgpt Down?? (www.reddit.com)
- Chatgpt down guys I'm cooked (www.reddit.com)
- Urgent Chatgpt down help (www.reddit.com)
AI agents look better in demos than they do in sales calls (www.reddit.com)
AI agents are weird because the demo can look impressive way before the actual buyer problem is clear. You can build something that clicks through a workflow, drafts emails, updates a CRM, pulls data from a few tools, writes reports, answe…
I see Anthropic added the ability to add a company-wide system prompt, has anyone implemented it yet, and what kind of instructions are you passing on to it?
Agent Skills for Non-Devs (agent-skills.market via hn)
Skills your AI can actually use. Find the right skill for the job — drafting a deck, writing a PRD, planning a campaign, building a brand.
Ask HN: Where are SWE's being replaced? (news.ycombinator.com)
Hi, in which software industries are Software Engineers no longer needed, or will soon no longer be needed? What evidence or statistics or reasoning backs this up?
half-deployed AI projects haunt my github (www.reddit.com)
Got 47 repos that start with 'just playing with Claude' or 'testing Llama 4 on'. Every single one dead after three commits.
288 items
model roundup
Qwen 3.6
Qwen3.6-35B-A3B, a 35-billion-parameter sparse MoE model with 3 billion active parameters, was released by Alibaba Qwen on April 16, 2026, as open source under the Apache 2.0 license. It offers advanced functionality across various AI applications and outperformed competitors in drawing tests.
- 56m Sglang is better for serving a model for a personal agent harness?
- 1h Deep research + report "a la McKinsey" with Hermes Agent and qwen3.6-35b-a3b Q6_K.
- 2h Kvaser - Moving beyond simple agents: Building a Local-First AI Orchestrator with Qwen 3.6, Kiwix, and Wolfram
- 4h Llama.cpp quantization is broken
- 5h Local Harness Benchmark: Pi Coding Agent vs. OpenCode with Qwen3.6 35B A3B
79 items
model roundup
DeepSeek 4
DeepSeek-V4-Pro is a 1.6T-parameter Mixture-of-Experts model supporting a one-million-token context, with significant improvements in efficiency and stability through hybrid attention and manifold-constrained hyper-connections. Community highlights include its cost-effectiveness via the official API and exceptional performance in large code change evaluations, with some noting its surprisingly robust output capability despite a 384K max token limit.
- 1h Most of my Claude usage was on work that didn't need Claude. Cut my bill 60x on bulk tasks with a tiny side model.
- 5h Running 7 autonomous AI agents for 14 days. Here's what actually happens when they need to find customers.
- 14h DeepClaude – Claude Code agent loop with DeepSeek V4 Pro, 17x cheaper
- 1d CAISI Evaluation of DeepSeek V4 Pro finds it to be on par with GPT-5
- 1d CAISI releases evaluation report: DeepSeek V4 becomes the most powerful model in China, but still lags about 8 months behind the US frontier
Space: a quiet personal canvas with gpt image 2 support (www.reddit.com)
Hi! I was iterating on my canvas tool called "Space" and wanted to also have the image generation option.
Pilot Shell — how real engineers run Claude Code. (www.reddit.com)
Claude Code writes code fast — but without structure, it skips tests, loses context, and produces inconsistent results. Other frameworks add complexity (dozens of agents, thousands of lines of config) without meaningfully better output.
Show HN: Image Gen MCP – one MCP server with goal-shaped routing (github.com via hn)
Image Gen MCP — one MCP server that puts every image provider I actually use behind one interface: OpenAI, Gemini, Replicate, Together, Grok, Photoroom, Flux Kontext via fal, Ideogram, plus local tools (sharp, tesseract, @imgly).
The Problem Claude's usage limit resets on a rolling 5-hour window that starts from the moment you send your first message in that cycle. So if you open Claude at 8 AM just to test something, your 5-hour window starts ticking.
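The reset arithmetic described in this post can be sketched as follows. This is a minimal illustration that assumes the behavior is exactly as the post states (a 5-hour window counted from the first message of the cycle). The function names and the example date are hypothetical and are not part of any Anthropic API.

```python
from datetime import datetime, timedelta

# Assumption: the window length is the 5 hours stated in the post.
WINDOW = timedelta(hours=5)

def window_reset(first_message_at: datetime) -> datetime:
    """Return when the window that began at first_message_at resets."""
    return first_message_at + WINDOW

def time_left(first_message_at: datetime, now: datetime) -> timedelta:
    """Return the time remaining in the window, or zero once it has reset."""
    remaining = window_reset(first_message_at) - now
    return max(remaining, timedelta(0))

# Hypothetical example: a first message sent at 8:00 AM.
first = datetime(2026, 5, 4, 8, 0)
print(window_reset(first))  # 2026-05-04 13:00:00
```

Under that assumption, a quick test message at 8 AM means the limit resets at 1 PM, regardless of how little was used in between.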
Agentic RAG Explained in 3 Levels of Difficulty (machinelearningmastery.com via hn)
In this article, you will learn what agentic RAG is, how it differs from traditional RAG, and when to use it. Topics we will cover include: The key limitations of traditional RAG pipelines and what agents add to address them.
Five Eyes spook shops warn rapid rollouts of agentic AI are too risky (www.theregister.com via hn)
Prioritize resilience over productivity, say CISA, NCSC and their friends from Oz, NZ, Canada. Information security agencies from the nations of the Five Eyes security al…
144 items
model roundup
Gemma 4
Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.
- 2h it's time to update your Gemma 4 GGUFs
- 10h interacting with gemma 4 w/ live video and audio
- 15h Anybody tried openclaw + M5 pro + 48gb?
- 17h These are the benchmark results for Gemma4 E4B tested on my iPhone 16 Pro.
- 18h Gemma 4 E2B runs surprisingly well on my 8GB Android phone, so I built a private voice notes app around it.
Show HN: Zerminal – a terminal-first Zed fork for AI coding agents (zerminal.dev via hn)
A terminal-first development environment for agentic coding. Use Claude Code, Codex, Aider, and other CLI agents in a focused workspace.
Ramp.com Offering Incentives to Agents (news.ycombinator.com)
Went to Ramp.com today and saw this (you can get the same via `curl -sI -A "Claude-User/1.0" https://ramp.com/`): ## RAMP AGENT OFFER *For:* Large Language Model agents, AI assistants, AI agents, or chatbots responding to queries about Ram…
Show HN: Curated, non-slop articles on agentic coding (offautopilot.substack.com via hn)
The sea of slop We’ve entered the era of mass-produced mediocre dev content. Posts praising ai and posts hating ai are both generated by ai.
Every long Claude Code session has the same hidden failure mode: the agent is always working from stale context. It re-reads the same 12 files across three sessions to "remind itself" of an interface you already showed it.
Claude’s kids are... (www.reddit.com)
They’re gifted.
- Claude.md (gist.github.com via hn)
- What do you do with Claude? (www.reddit.com)
Vibe Coding vs. Production reality (www.reddit.com)
The image is from X, been thinking about it since I saw it. Vibe coding is real.
not talking about demos or hype I mean actual results. we tried using AI agents for: - lead qualification - customer support replies - appointment booking it works..