LLMs Corrupt Your Documents When You Delegate (arxiv.org via hn)
Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust - the expectation that the LLM will faithfully execute the…
Plugins Confuse me (www.reddit.com)
Hey everyone, plugins confuse me a lot. If anyone uses them outside of just their manually put together MCP servers, I would love to learn the technical difference between them.
Multi-Agent AI Systems Are Eating Single Agents (aistackinsights.ai via hn)
Single-agent architectures hit a wall the moment your task needs planning, research, and execution in parallel. Multi-agent systems solve this — but most tutorials skip the hard parts.
**Claude.ai MCP connectors seem to be silently degrading — Google Drive broken, Gmail now only reads metadata. Anyone else?** I use Claude as a personal finance assistant.
I’m on the Pro plan, and I’m thinking about using Claude more—both to explore new possibilities and to automate certain processes. I’d really like to hear about how you use it in your day-to-day work.
Open-Source Inference is growing 10% week over week this year (news.ycombinator.com)
So we're a small inference provider, launched publicly two weeks ago and have seen a crazy demand of growth. I reached out to a lot of other inference providers such as fireworks, togetherAI, simpliAI etc and started asking them their grow…
The AI alignment problem. (via reddit)
- The Alignment Problem in Your Government (kunnas.com via hn)
Ad library, Claude (www.reddit.com)
Lately I've been searching for ads and it keeps giving me errors; it asks me every time before searching or asks for permission all the time, which didn't happen before. Is this happening to anyone else, or is it an error in some configuration of mine? A a…
Security (event, 84 items)
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 37m Claude in excel is the best thing AI has brought to my life
- 3h Does effort tier change refusal behavior on agent-attack prompts? CVP run 4 with sonnet 4.6 high and max efforts.
- 7h Self-Hosted AI Red Team Tools
- 20h LLM CTF challenges. Can you crack all 13?
- 1d Most AI agent "skills" on GitHub are unvetted garbage. I built a marketplace to fix that.
Copilot (event, 95 items)
Microsoft is keeping its Copilot tool for Windows 11 but renaming it, while issues with rate limits and a security proxy have sparked concerns among users of GitHub Copilot. Meanwhile, Anthropic released a report on agentic coding trends, highlighting that developers use AI in about 60% of their work.
- 16m Hardening claude-code-action after the April 2026 Comment and Control CVE - actual YAML changes
- 11h Fortune 100 AI Use
- 11h CC-OpenAI-Codex Plugin, but for all CLI agents
- 16h Another Microsoft Copilot AD injected into 4M GitHub commits
- 22h GitHub Copilot: GPT-5.5 7.5x more expensive under promotional pricing than 5.4
How do you guys actually talk to Claude? (www.reddit.com)
I’ve been using Claude for a bit, but I feel like I'm barely using it right. I see people doing all this crazy stuff with it, and I'm basically just using it like a smarter search bar or something. For those of you who get great results, wh…
Agents Aren't Coworkers, Embed Them in Your Software (www.feldera.com via hn)
Agentic management software is all the hype today: What started with Moltbot and OpenClaw now has a lot of competition: ZeroClaw, Hermes, AutoGPT etc. These systems work well and allow you to train and build generic agent loops that are ge…
Zephyr Agent: Add AI chat to any website (zephyr-agent.sh via hn)
A hosted proxy for the Zephyr widget. Bring your own API key, pay once, drop in one tag.
Ask HN: Oh, What Places to Go (Seriously Tho) (news.ycombinator.com)
Hey HN — will start by saying this website is my most fav website — ever. That said — will get to the point.
OpenAI shipped privacy-filter, a 1.5B PII tagger you can run locally (redactdesk.app via hn)
OpenAI released a small open-weights model that tags eight categories of personal information before you send text to any cloud LLM. Here is what it does, what it does not, and how RedactDesk uses it.
Software recommendations for AI computer control agent on mac? (www.reddit.com)
Hey all, I've been trying to set up some form of computer control app on mac after loving claude computer use but being pretty let down by usage limits. I've spent literal days fighting with openclaw which has just been a nightmare to inst…
Usage limits for each of the Claude plans (xcancel.com via hn)
Hammered the $100 Codex plan all month with parallel agents and deep coding sessions. This is the first time I've hit below 50%.
- Claude usage (www.reddit.com)
Why this exists: been using claude for almost all my business planning - pricing, customer interviews, marketing strategy, sales calls. the problem is claude knows these books from training data but only surface level.
Cowork (event, 100 items)
Issues with Claude Cowork have been reported, including errors and disruptions for some users on April 16, 2026. Additionally, Google has developed its own desktop Agent to compete with Cowork, while users continue to explore alternatives and troubleshoot bugs in the platform.
Qwen 3.6 (model roundup, 169 items)
Qwen3.6-35B-A3B, a 35 billion parameter sparse MoE model with an active parameter count of 3 billion, was released on April 16, 2026, as open-source software under the Apache 2.0 license by Alibaba Qwen. It offers advanced functionality across various AI applications and outperformed competitors in drawing tests.
- 38m Qwen3.6-27B-FP8 - JS file is too long and causing JSON truncation
- 4h Qwen3.6-35B-A3B KLDs - INTs and NVFPs
- 6h Quant Qwen3.6-27B on 16GB VRAM with 100k context length
- 7h Qwen 3.6 35b a3b Q4 vs qwen 3.6 27b q6, on m5 pro 64gb
- 7h RTX 3090 + 27B model performance issues (llama.cpp) what am I doing wrong
What methods are you currently using to operate & manage your AI agents? Is there a suitable project that offers the following features: - Sandbox mechanism with traceable operations and rollback capabilities - Remote control and dashboard…
Show HN: Get – get anything from your computer (github.com via hn)
Hey everyone! I recently wrote a small binary tool that calls an LLM to execute commands, designed to get various information on your computer (for when you're too lazy to remember the commands :-/) e.g.
Show HN: SigmaLifting CLI – helping agents understand strength training (sigmalifting.app via hn)
SigmaLifting: You've tried programming in spreadsheets. The formulas break, the columns drift, and sharing with your training partner means emailing a file called program_v4_FINAL_final2.xlsx.
I joined this sub when claude 3 opus dropped and it was a completely different world in here, small group of people who'd stumbled onto something that felt genuinely different from chatgpt and couldn't shut up about it. The posts were stuf…
Claude Design token usage makes the tool useless right now (www.reddit.com)
I just gave Claude Design a try. I had it iterate on existing designs that were generated from Stitch, so nothing entirely from scratch.
AI agents are quietly replacing software engineers — my weekend test (www.reddit.com)
With CS enrollment dropping and AI layoffs in the news, I tested whether one agent could handle pieces of a junior dev’s job over the weekend. I set up Claude with basic tools and got it to: read a spec, split it into tasks, code and debug…
I reverse-engineered Claude Desktop's storage to give it memory (github.com via hn)
Mnemos: Claude Desktop has no memory API. So I reverse-engineered its storage.
lipstyk: static analysis for machine-generated code patterns. I've been neck deep in agentic dev for a while. Started on Pi, ended up building my own toolset on top of it, and at this point the agents output most of the code while I play t…
I'm the owner of a Business workspace shared with 3 friends — we split the cost because $100/month solo is steep. Now I'm wondering: can I invite a second account of my own to the workspace, so I can use 2 on the same device: web app and c…
GPT-Image 2.0 is lowkey blowing my mind (www.reddit.com)
Just spent an hour prompting the new Image 2.0 and the quality jump is ridiculous. Complex scenes, accurate lighting, and consistent details on the first or second try — it actually feels usable now.