1. If I wanted to sell LLMs as powerful research agents, and if I had enough money, I could consider planting little "gems" in an LLM's training set so that it would be able to discover new theorems and proofs. There is a lo…

  2. Anthropic research: https://www.anthropic.com/research/natural-language-autoencoders

  3. Generate AI images with GPT Image 2 for free within a fair daily cap. No ChatGPT account, no login, no credit card, no watermark on outputs.

  4. Curious to hear everyone’s thoughts on this. As autonomous AI agents get better at handling complex tasks with minimal human input, which industries do you think will see the biggest disruption first and why?

  5. I've been learning German recently, and it occurred to me that I could point some of my AI horsepower at setting up a German-speaking LLM to practice with. I'm not too concerned with the speech-to-text side of things or getting it to talk back…

  6. Looking for ideas on how I can optimize my workflow further. I have built a moderately complex vibe-coded app.

  7. event

    Cowork

    Issues with Claude Cowork, including errors and disruptions, were reported by some users on April 16, 2026. Additionally, Google has developed its own desktop agent to compete with Cowork, while users continue to explore alternatives and troubleshoot bugs in the platform.

    model roundup

    Opus 4.6

    Opus 4.6, a version of Anthropic's AI model Claude, saw its accuracy drop on the BridgeBench hallucination test from 83% to 68%, and is being retired from Copilot Pro+. Notably, Claude Code demonstrated advanced capabilities by generating a detailed 12-week training plan in one call.

  8. patoroco (@patoroco) on X: Stop wasting money on Claude or ChatGPT subscriptions for coding.

  9. Has anyone here worked at Cursor or knows someone who does? A recruiter reached out recently, and I’m trying to get a genuine feel for what the company is like beyond the usual recruiter pitch.

  10. Running a quantized 72B VLM on an M4 Pro for GUI tasks: some numbers. Been messing around with running a vision-language model locally on my Mac to do GUI automation stuff; basically, the model looks at a screenshot of my desktop and decides…

  11. Lately I’ve been noticing how quickly prompts grow in real AI apps. Teams keep adding more examples, formatting instructions, fallback behavior, style constraints, and edge-case handling, but almost nothing gets removed over time.

  12. I keep seeing the same trajectory in AI startup conversations: AI search → coding agents → OpenClaw → agent IM → ? Most people fill in that question mark with some version of "agent collaboration platform." AI-native Slack.

  13. Agentic Gamedev Skills: This repository collects agent skills extracted from game-development work and related agentic-workflow research. Each skill lives under .agents/skills/, uses SKILL.md as its entry point, and may includ…

  14. model roundup

    Opus 4.7

    Claude Opus 4.7, released on April 16, 2026, is Anthropic's latest advanced AI model, offering improved handling of complex tasks and a larger context window of up to 1 million tokens. This version is 50% more expensive than its predecessor due to enhanced capabilities in software engineering and hybrid reasoning.

    Claude Opus 4.6, Anthropic's flagship model, saw its accuracy drop on the BridgeBench hallucination test from 83% to 68%, highlighting a significant regression in handling certain tasks. Meanwhile, biologists are revisiting cases of mushroom-induced hallucinations in China, suggesting ongoing research into natural causes of similar phenomena.

  15. This is an automatic post triggered within 2 minutes of an official Claude system status update. Incident: Elevated errors on Claude Opus 4.1. Check on progress and whether the incident has been resolved here: https://status.cla…

  16. (Disclaimer: originally posted on r/AIEval; I thought it was relevant here.) I've been iterating on a setup where my coding agent (Cursor in my case) runs evals in a loop, reads the failing metrics, and patches things automatically. Wanted to share the…

  17. I used Claude via the web console for a while, then decided I wanted higher limits and API usage. I clicked and ended up here: https://claude.ai/upgrade.

  18. Hello! I've recently started working with AI more and more due to my company exceeding requirements and timings for development.

  19. I’ve been experimenting with Cursor agents for more than just one-off coding tasks, and I kept running into the same problem: once you have multiple agents running across different workflows, the terminal starts to feel messy fast. So we b…

  20. paywalled

  21. model roundup

    Sonnet 4.5

    On May 4, 2026, multiple automated status updates reported elevated errors for Claude Opus 4.5 and Sonnet 4.5 around the same time. Separately, Anthropic was reported to have introduced a feature called E-STEER that applies emotion intervention to these models.

  22. could not extract summary

  23. Vibe-coders, I’m done with Electron-based sidebars. I’m done with "Apply" buttons.

  24. I’ve been trying different Claude setups for a while, and honestly, most of them don’t hold up once you start using them in real work. At first, everything looks fine.

  25. I’m actually serious about this, lol. Not AGI or sci-fi stuff; I mean realistically, with current models like Claude. I use Claude Max pretty heavily already, and honestly it feels way closer than most people think. A huge part of my work is ba…

  26. This question will probably make more sense once I explain my current situation: lately I’ve been doing small projects here and there for some small businesses in my town, and they have been working fine, but that is about to change. I ma…

  27. Has anyone used the caveman prompt (or skill) in the web app version of Claude? If so, how did you set it up, and did it actually help save tokens?

  28. Hi Reddit, we are a team of database researchers (including a PhD from the MIT DB Group), and we just open-sourced an embedded vector database for agent/LLM applications that supports both text and vectors.