Hallucinated — page 4

Is there a standard for porting agent state across models, or are we all writing custom wrappers? (www.reddit.com via reddit)

19h rag agentic

Hey everyone, I'm fairly new to the agentic workflows space. Really interested to get into it.
Half of Grok's traffic is NSFW (www.reddit.com)

22h grok

Thanks to the Memes i switched and started using Claude (www.reddit.com via reddit)

57m

Thanks to all of you, I tried out Claude. Thanks for the memes that made me try it; I have now found something that actually does what I want and has endless possibilities with the add-ons. I know it sounds like BS, but I finally get it.
built a tool that maps any codebase and tells Claude Code exactly what to change — runs on your existing Claude subscription (www.reddit.com via reddit)

1h claude-code

Hi everyone! I built this because I kept getting lost in my own side projects: opening folders, reading imports, trying to remember what I built two weeks ago 😅 Run one command inside any project: npm install -g lore-map, then lore deep-scan…
Holy shit, we just invented a new agentic memory architecture (news.ycombinator.com)

+11 20h openclaw agentic

I thought we were stuck in 2025, but it turned out we were ahead of the curve. I can't believe that the best memory system invented so far is the one openclaw uses, finally we…
This is very interesting.. (www.reddit.com)

11h haiku sonnet

Do models like Haiku and Sonnet not know about information past January 2025 unless they use a web search? I tested both, and both said the same thing: their knowledge maxes out at January 2025.
- Opus 4.7 is… interesting… (www.reddit.com)
It turns out Bash is All You Need to write a language model REPL (and jq and curl) (www.reddit.com via reddit)

22h

While working on a self-educational exercise tinkering with local models and trying my hand at setting up agents, I went down a rabbit hole: to see how far I could build a custom agent REPL loop using exclusively command-line building blo…
If you run coding agents unattended or in parallel, how do you verify the run actually worked? (www.reddit.com via reddit)

2d aider codex claude-code

I run a lot of agent loops (Claude Code / Codex / aider), sometimes overnight or several at once. My recurring headache: when I come back, I can't quickly tell whether a run actually did the right thing, quietly broke/regressed something,…
Agent Engineering Roadmap – a beginner-friendly guide to building AI agents (github.com via hn)

+1 8h mcp

A hands-on roadmap for building production-ready AI Agents, MCP Servers, Memory Systems, Multi-Agent Workflows, and Agent Colonies.
model roundup
Gemma 4
6 items
open thread → · last activity 10h ago
model roundup
GPT 5.5
195 items

A significant leak of the OpenAI Codex model, referred to as GPT-5.5, was captured on video before it was patched. The incident involved models named Arcanine and Glacier-alpha.
open thread → · last activity 1d ago
Which model for technical documentation? (www.reddit.com via reddit)

19h opus mcp agentic

Looking to create high-level and low-level software designs based on existing templates/examples, cross-reference code, use MCP to download Confluence/Jira data, and also plug into agentic 'coding' frameworks (opencode). I mostly use opus 3.6…
Show HN: A Transformer Is All You Need (zenodo.org via hn)

+1 5h

The unanswered question in mechanistic interpretability of pretrained transformers is plain: for any prompt and any decoder-only transformer, which weights at which layers along which residual-stream dimensions produced the decision the mo…
After automating workflows for 20+ real estate brokerages, the same 5 tasks show up in every project. None of them need AI agents. (www.reddit.com via reddit)

19h

I build automations for companies, about forty clients now, and around twenty of those have been real estate brokerages. The strange part is that no matter how different they look on the surface, every project ends up fixing the same five…
- After automating workflows for 30+ professional services firms, the same 5 tasks show up in every project. None of them need AI agents. (www.reddit.com)
OpenAI will initially only release ChatGPT 5.6 to government-approved customers (www.engadget.com via hn)

+2 8h chatgpt openai

So much for voluntary review. You may not be able to use the new ChatGPT 5.6 as soon as it's finished.
Ornith-1.0 released on Hugging Face (www.reddit.com via reddit)

22h moe

Includes 9B Dense, 31B Dense, 35B MoE, and 397B MoE models, reporting SOTA on various benchmarks (let's see if this holds). https://huggingface.co/collections/deepreinforce-ai/ornith-10
Amateur fanfiction here (www.reddit.com via reddit)

1d deepseek gemini chatgpt

I've been using Claude for a year to generate personal fanfiction. I prefer it over other AI like ChatGPT, DeepSeek, or Gemini because Claude writes longer stories with more interesting plots (at least for me). It went pretty well, but then after a few chapte…
What if plants could talk? (OpenAI YouTube) [video] (www.youtube.com via hn)

+1 4h openai

Despite anxiety about AI's impact on jobs, only 2% of leaders report pushback from workers when embracing AI agents (kpmg.com via reddit)

1d

June 2026: Q2 2026 AI Quarterly Pulse Survey (KPMG LLP).
Fast medical RAG API to give your local LLMs access to facts (www.reddit.com via reddit)

18h rag mcp

I created a simple RAG API using medical Wikipedia articles that you can point your agent to and use freely. It may be useful in allowing your local LLMs access to medical facts they might not be able to recall from their weights.
event
SWE-Bench
53 items

Recent updates in the SWE-Bench Pro benchmark show significant performance improvements across various AI models, with GPT-5.5 topping the leaderboard and Opus 4.7 making notable gains, while Claude experienced a temporary drop just before the release of Opus 4.8.
open thread → · last activity 1d ago
After a month of Claude Code, I think the hidden cost is review time, not API credits (www.reddit.com via reddit)

9h anthropic claude-code
Is anybody else proactive on Claude because of the views of the CEO? (www.reddit.com via reddit)

1d altman chatgpt openai

So I work in Cybersecurity and in my role I fortunately get a chance to play around with all flavors of AI. I recognize that my viewpoint is going to only be shared by a niche group of people.
Any tip from technical people to us non-technicals on how to make the most out of claude? (www.reddit.com via reddit)

22h gemini chatgpt claude-code

Hi guys! I switched from ChatGPT to Gemini to Claude Pro just today.
Published a free skill that generates on-brand HTML without the default AI-slop look (www.reddit.com)

52m

Something that's bugged me for a while: ask any agent to build something in HTML and you get the same look every time. Same hero gradient, same rounded cards, same spacing.
Show HN: Yet another self-hosted web analytics with no UI but MCP (yetanotherwebanalytics.dev via hn)

+3 3h codex mcp claude-code

yawa v0.0.5: Yet Another Web Analytics. Ever wanted to query your analytics with Claude Code or Codex?
Our Kubernetes Operator Didn't Scale, So We Rebuilt It (infisical.com via hn)

+1 22h operator

Security is often at odds with convenience, but the human brain prefers convenience (and makes mistakes, even with the best of intentions). Most identity security tools reconcile this by making security as convenient as possible.
Interested in interesting routines besides the usual stuff (www.reddit.com via reddit)

3h claude-code

I may be a little late to the party, but I now leave my MacBook on with an always-running Claude Code remote session.
Ludwig Spec Driven Development MCP (github.com via hn)

+1 8h mcp

Ludwig: a specification-driven development framework whose specs are written as close to natural language as possible. The same prose-first markdown spec drives LLM code generation and verifies the resulting code.
It took two weeks to make Claude's "overnight solution" for flaky tests useful (thoughtbot.com via hn)

+2 3h
