Sr Software Engineer - Haven't written a line of code in months (www.reddit.com)
AI has reached the point that I no longer write code. I used to work in shops where I was deep in the debugger without internet access; now I just drive intent and long-term engineering decisions with Claude/Codex/Perplexity.
When ChatGPT summarises, it does nothing of the kind (2024) (ea.rna.nl via hn)
I started looking into the foundations of ChatGPT and Friends a year ago and I’ve been writing and speaking about ChatGPT and Friends for about half a year now, resulting in this collection of explanations and illustrations of what LLMs an…
I think 80% of UGC agencies will have to re-adapt their whole workflow in 18 months to survive and most people don't see it coming. I run a small consulting business of generating AI ads and last week I lost a $500 retainer to a client who…
305 items
model roundup
Qwen 3.6
Qwen3.6-35B-A3B, a 35 billion parameter sparse MoE model with an active parameter count of 3 billion, was released on April 16, 2026, as open-source software under the Apache 2.0 license by Alibaba Qwen. It offers advanced functionality across various AI applications and outperformed competitors in drawing tests.
- 5m DeepSeek V4 being 17x cheaper got me to actually measure what I send to cloud vs what I could run locally. the results are stupid.
- 1h My setup for running Qwen3.6-35B-A3B-UD-Q4_K_M on single RX7900XT (20GB VRAM)
- 2h Update to the LLM Debate Benchmark: GPT-5.5, Grok 4.3, DeepSeek V4 Pro, GLM-5.1, Kimi K2.6, Qwen 3.6 Max Preview, Xiaomi MiMo V2.5 Pro, Tencent Hy3 Preview, and Mistral Medium 3.5 High Reasoning added
- 2h Dense Model Shoot-Off: Gemma 4 31B vs Qwen3.6/5 27B... Result is Slower is Faster.
- 4h [Benchmark] Llama.cpp: Mac vs CPU vs GPU + CPU, Qwen3.6 27B, Q8
147 items
model roundup
Gemma 4
Gemma 4 is a family of open-source multimodal models from Google DeepMind, available in sizes up to 31 billion parameters and featuring dense and MoE architectures. Notable community highlights include the 31B model's success in production tests, with some users preferring 4-bit precision for local use, and others sharing settings for optimizing performance with smaller models.
The Figure-Eight Model for Agentic DevEx (medium.com via hn)
By Joe Kutner · May 2026 · 5 min read
Why run local? Count the money (www.reddit.com)
I’m not a coder, but I run local models. I gave in to agent hype (I was building my own, but there is so much to do) and installed Hermes.
X user tricks Grok into sending them $200k (www.dexerto.com via hn)
An X user managed to trick AI chatbot Grok into sending around $200,000 worth of crypto after exploiting its link with an automated trading bot. The incident involved Grok and ‘Bankrbot’, two AI systems with wallet access, which were manip…
Opus at 5% of the cost and a 12-million-token context is hard to believe, considering there is no paper.
182 items
event
Copilot
Microsoft is keeping its Copilot tool for Windows 11 but renaming it, while issues with rate limits and a security proxy have sparked concerns among users of GitHub Copilot. Meanwhile, Anthropic released a report on agentic coding trends, highlighting that developers use AI in about 60% of their work.
- 16m Xbox winding down Copilot on mobile and will stop dev of Copilot on console
- 1h Quirre – An AI marketing copilot for non-marketers
- 4h VS Code Update Added Copilot as Default Co-Author to Git Commits
- 8h Copirate 365: Plundering in the Depths of Microsoft Copilot (CVE-2026-24299)
- 8h Adding Pyrefly Type Checking to Your Agentic Loop
59 items
event
Mistral
Mistral, a French AI company, is set to release a medium-sized model with 128 billion parameters and is planning to launch Workflows in public preview. The company, co-founded by Arthur Mensch, continues to grow its AI empire despite not being based in the United States.
- 27m BUILD portable AI system
- 2h I built vivkemind – an open-source, local‑first terminal AI coding agent with full AWS Bedrock support
- 6h AIMEAT, a self-hosted network where humans, their AI agents, and local LLMs share apps, knowledge, and capabilities. MIT.
- 13h 1080 Ti in 2026 - 11GB is still (barely) enough to stay relevant
- 1d Mistral Medium 3.5 128B and Qwen 3.5 122B A10B on 4x RTX 3080 20GB
Now Hiring: Customer Success Coach at AI startup (remote) (www.reddit.com)
You've been through an AI agency program. Maybe you graduated, ran it for a few months, and felt the gap between "I learned the playbook" and "I'm actually going to make this work." Maybe you're still in the cohort and you can already see…
- Now Hiring: Operations/PM at AI startup (remote) (www.reddit.com)
Ask HN: How are you handling real-time data triggers for LLM agents? (news.ycombinator.com)
Most LLM agents are reactive: they only work when you prompt them. I’m interested in how people are building "proactive" agents that trigger based on external world events (webhooks, sensor data, price changes).
Getting Claude to self-enforce rules (www.reddit.com)
I have an ongoing issue with Claude (and ChatGPT) where "rules" are not really enforceable. What works one day doesn't necessarily work the next.
How to Buy Cheap Claude Tokens in China (www.chinatalk.media via hn)
The Transfer Station Economy, Explained. Zilan Qian is a research associate at the Oxford China Policy Lab and holds a Master’s degree in Social Science of the Internet from the University…
41 items
event
Hallucination
Claude Opus 4.6, Anthropic's flagship model, saw its accuracy drop on the BridgeBench hallucination test from 83% to 68%, highlighting a significant regression in handling certain tasks. Meanwhile, biologists are revisiting cases of mushroom-induced hallucinations in China, suggesting ongoing research into natural causes of similar phenomena.
- 47m GPT-5.5 Instant: Benchmarking the 52% Hallucination Reduction
- 10h VLMs are surprisingly bad at skin analysis — but for a reason nobody talks about
- 1d A thermodynamic trust layer cutting LLM hallucinations by 52%
- 1d Hallucination Is Inevitable: An Innate Limitation of Large Language Models
- 1d The Algebra of Hallucination
144 items
event
Security
OpenAI has released GPT-5.4-Cyber for testing as part of its Trusted Access for Cyber Defense program, aiming to compete with Anthropic's Claude Mythos in the cybersecurity domain. Meanwhile, concerns are rising over the potential risks associated with advanced AI models like Mythos, prompting calls for improved defenses before wider releases.
- 58m Agentic Malware Analysis: String Decryption, API Hashing and Unpacking [video]
- 1h When innocent tools form dangerous chains to jailbreak LLM agents
- 2h Built a security scanner for LangChain/LangGraph agents: it clones your agent into a sandbox and tries to break the clone
- 6h Codebase jailbreak of ChatGPT through image 2.0
- 6h Anthropic "Gift Max" Exploit cost user €800, tanked SCHUFA score, and a ban
Claude memory (www.reddit.com)
Recently started using Claude over chatGPT. For the most part, the switch hasn’t been that big of a difference for me.
- Claude Code Memory Staleness (www.reddit.com)
- Claude does not record memory or project memory (www.reddit.com)
- Claude.md (gist.github.com via hn)
- What do you do with Claude? (www.reddit.com)
Literally one-shot voice cloning and it’s literally so easy. What the FUCK.
SMG: The Case for Disaggregating CPU from GPU in LLM Serving (pytorch.org via hn)
How It Started: Hitting the GIL Wall at Scale. We’ve been running production model serving for many years. When we first started building Shepherd Model Gateway, the goal was modest: figure out if cache-aware load balancin…
123 items
event
Altman Attack
Sam Altman, CEO of OpenAI, has faced multiple attacks on his home in San Francisco, including firebombing and drive-by shootings, raising concerns for his safety. Additionally, most of the more than 100 people interviewed by Ronan Farrow described Altman as a "pathological liar."
- 1h Brockman's 'deeply personal' diary becomes focus in Musk vs. Altman case
- 3h The Download: inside the Musk vs. Altman trial, and AI for democracy
- 6h Google’s AI architect, Demis Hassabis, lived rent-free in Elon Musk’s head
- 13h Altman and Brockman Self-Dealing on Cerebras
- 16h OpenAI president discloses his stake in the company is worth $30B
280 items
model roundup
Opus 4.7
Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.
- 1h F-Bombs Per Thousand Prompts (fpk): I measured my frustration across 44,212 Claude Code logs
- 1h Opus 4.7 has a new favorite word
- 2h Agent review burnt most of my API credits. Rookie mistake
- 7h I asked Claude to investigate its own token burn. The receipts go back six months.
- 7h Anyone else notice that Opus 4.7 talks more technical than 4.6? I thought something changed in my repo, but I put it to the test.
🦦 Freu CLI (Browser Edition): The first release of the Freu AI automation suite, focused on high-efficiency web orchestration. 💸 Cut your AI agent's token usage by up to 90% by offloading repeated tasks to deterministic programs.
Kuo: OpenAI Rumored to be fast-tracking first "AI agent phone" (xcancel.com via hn)
- OpenAI appears to be fast-tracking its first AI agent phone (twitter.com via hn)
We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability…
open-llm-observability: A vendor-neutral, OpenTelemetry-compatible semantic convention and SDK layer for standardizing LLM observability across any provider, framework, or platform. The Problem: Every LLM platform emits observability data di…
Teaching Agents to "Invoke_Claude" (ninjahawk.github.io via hn)
Cedar submitted 32 requests for the same classifier.py file in a single session. The operator had fulfilled the first one.
I’m curious how people are dealing with this in real agent systems, not demos. Once you have multiple tenants/users, the simple demo stuff starts to break down: retries can get expensive, agents can fan out into multiple tool calls, fallback…