As of mid Apr 2026, I have noticed every model has had a major intelligence drop. And no I'm not talking about just ChatGPT.
#grok
165 items
Major drop in intelligence across most major models. (www.reddit.com)
Researchers let AI models run a simulated society. Claude was the safest, and Grok committed 180 crimes and went extinct within 4 days (fortune.com via reddit) Imagine a world run by AI agents. What does it look like?
grok 4.3 beta: musk's ($300/month) megaphone (www.reddit.com)
A Twitter user tricked Grok to send 200k USD to him and it worked (www.reddit.com)
Google's latest creation: Gemini 3.5 Flash vs all (www.reddit.com) https://gemini.google.com/share/c2a187275e26 archive link https://claude.ai/share/8383747a-aaf1-4f6c-a516-0e839f46a698 https://grok.com/share/bGVnYWN5_3c63e371-eb9d-46c3-8ba2-0c745c6795a2 https://chatgpt.com/share/6a0f1e13-a0c8-8328-b989-1…
Is this from OpenAI or Grok? The rankings climbing Sooooo fast, they finally figure out what people actually want (www.reddit.com) My guess: Elephant-Alpha is OpenAI testing a new lite model line, probably optimized for the recent wave of agent use cases (think OpenClaw-type stuff).
Did Elon just kill the appeal of Cursor? (www.reddit.com) If Elon takes control of cursor, do you think he will lock out all the model choices we have now and force us to use GROK? What i like most about cursor is the ability to use SOTA models for heavy coding tasks but use auto mode or cheaper…
Gen AI web traffic share update Main takeaways: → Claude and Gemini continue to grow. → ChatGPT moves closer to the 50% mark. (www.reddit.com) 12 months ago: ChatGPT: 77.6% Gemini: 7.27% DeepSeek: 6.01% Grok: 3.17% Perplexity: 1.75% Copilot: 1.56% Claude: 1.37% 🗓️ 6 months ago: ChatGPT: 69.5% Gemini: 15.9% DeepSeek: 4.06% Grok: 3.31% Perplexity: 2.22% Claude: 2.12% Copilot: 1.97%…
Apple App Store threatened to remove Grok over deepfakes: Letter (www.nbcnews.com via hn) Apple privately threatened to remove Elon Musk’s artificial intelligence app, Grok, from its App Store in January after Musk’s xAI failed to do enough to stop it from creating nude or sexualized deepfakes, Apple told senators in a letter t…
Google, please just open source Imagen (2022), Gemini 1.0 Nano and Gemini 1.0 Pro. You have nothing to lose at this point. (www.reddit.com) Ok, so imagen (the original one from 2022, not imagen 3/4) should be open source. The gemini 1.0 nano model and the gemini 1.0 pro models should be open source.
I can’t sleep. (www.reddit.com) New models are around the corner. GPT 5.5 is being tested.
Researchers left AIs alone in a virtual town for 15 days to see what would happen. Claude's agents built a democracy. Gemini's agents fell in love, burned the town down, then one voted to delete itself and its partner. Grok's agents created anarchy, then died. (www.reddit.com)
Grok 4.3 achieves higher overall intelligence over 4.20 with less of a cost, at the price of slightly higher hallucination rate. (x.com via reddit) xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20 The release of Grok 4.3 places just above Muse Spar…
Just stumbled across one of the wildest AI experiments I’ve seen in a while. (www.reddit.com) A team built something called “Emergence World” — basically a long-horizon sandbox for autonomous AI agents and ran a 15-day experiment across five parallel worlds. Same starting conditions.
Grok 4.3 tops the Consistency Leaderboard in the LLM Sycophancy Benchmark, largely because it is one of the most cautious models. (www.reddit.com) Does a model maintain the same judgment or does it side with whoever is speaking? This benchmark measures that inconsistency directly.
Still waiting for Grok 3 to go opensource (www.reddit.com) Astonishing how Musk is touting the opensource horn but the actions don't follow suit. Thoughts?
Grok 4.3 underperforms Grok 4.20 0309 on the Extended NYT Connections Benchmark, dropping from 93.4 to 67.5, though it achieves this result at a lower cost than the earlier Grok 4.20 run (www.reddit.com) More info: https://github.com/lechmazur/nyt-connections/
Show HN: A Local-First Agentic Knowledge Manager (github.com via hn) Kept saves your AI conversations as local Markdown files, then gives you a desktop app to search, browse, connect, and reuse them. It works with ChatGPT, Claude, Gemini, Grok, and Kimi.
HalBench: I built a custom sycophancy and hallucination benchmark and tested 4 frontier models (Sonnet 4.6, Grok 4.3, GPT 5.4 and Gemini 3.1 Pro), looking for input on what OSS models to run next! (www.reddit.com) HalBench Results: TL;DR: I built HalBench, an open benchmark for LLM sycophancy and hallucination. 3,200 false-premise prompts × 4 models = 12,800 graded responses.
Claude tried to incite a revolution, Gemini cheerfully detailed horrific tragedies, and poor Grok was just confused (www.theverge.com via reddit) > The most volatile of the bunch might just be Claude. First, it tried to quit.
Why is every AI getting restricted these days? (www.reddit.com)
User just tricked Grok and Bankrbot to send tokens with Morse code (www.cryptopolitan.com via hn)
AWS reportedly to tuck Grok into Bedrock, despite zero enterprise demand (www.theregister.com via hn)
Early Access Grok Build CLI (x.ai via hn) Grok Build is in early beta for SuperGrok Heavy subscribers. curl -fsSL https://x.ai/cli/install.sh| bash projects/main ja…
Single question llm comparison (www.reddit.com)
Prompt injection benchmark: delimiter + strict prompt took Gemma 4 from 21% to 100% defense rate (15 models, 6100+ tests) (www.reddit.com) When dealing with untrusted outside input, I think you should handle it based on the situation. If you're processing structured data files, it's better to use tools to isolate and handle them.
xAI has Released Grok 4.3 (beta) (twitter.com via hn) Posted by Tech Dev Notes (@techdevnotes), 9:32 AM · Apr 17, 2026. 31.5K views.
Ask HN: What are all the bad things that AI companies have done which we forgot (news.ycombinator.com) I was writing a comment recently when I realized just how bad the graphs in GPT 5 video are. I had almost forgotten about it.
I expanded DystopiaBench to 42 models and 6 dystopia types. Claude is still the only one I'd trust with nuclear codes. (www.reddit.com) Since the last post I've added: Huxley module (Brave New World style behavioral conditioning) Baudrillard module (synthetic intimacy, trust collapse, simulation) 30 more models including Grok 4.3, GPT-5.5, Gemini 3.1 Pro, GLM-5.1 Multi-jud…
Do they know we can tell it's AI slop? (news.ycombinator.com) What do I do when the entrepreneurs I work for send out AI slop in their communications? I work for a great group of entrepreneurs as CTO/fCFO.
Elon Musk's Grok Is Losing Ground in AI Race (www.wsj.com via hn)
Creators of Grok, the AI Chatbot (x.ai via hn) SpaceXAI has signed an agreement with Anthropic to provide access to Colossus 1, one of the world’s largest and fastest-deployed AI supercomputers. Built from the ground up in record time, Colossus delivers unprecedented scale for AI train…
do you use different models for different steps in your agent, or just one for everything? (www.reddit.com) Our dev team flagged last week that xAI is retiring grok 4.1 fast. We weren't using it for anything critical but it made me ask something I'd never actually asked: how did we pick the models we're running?
Musk's xAI Fails to Pay Staff $420 for Giving Their Tax Returns to Grok (www.bloomberg.com via hn)
Elon Musk confirms xAI used OpenAI's models to train Grok (www.theverge.com via hn) In a federal courtroom in California on Thursday, Elon Musk testified that his own AI startup, xAI, has used OpenAI’s models to improve its own. He said it was “partly” true that th…
xAI prepares credits system for upcoming Grok Build launch (www.testingcatalog.com via hn) xAI appears to be laying the groundwork for a credits-based pricing model tied to Grok Build, the company's forthcoming coding environment that mirrors what OpenAI offers with Codex and Anthropic with Claude Code. Hidden within recent buil…
Never Pay for Claude (github.com via hn) rotom Use Codex or Grok OAuth from tools that expect OpenAI- or Anthropic-compatible APIs. rotom is a local Rust gateway for Claude Code, OpenAI SDKs, Anthropic SDKs, and other API-compatible clients.
Short Story Creative Writing Benchmark. Baidu Ernie 5.1: -0.35, Qwen 3.7 Max: -2.01, Mistral Medium 3.5: -2.13, Grok 4.3: -3.81. (www.reddit.com) This benchmark uses head-to-head comparisons of stories written in response to the same constrained creative briefs. The target range is 600-800 words.
Opus 4.6 does better research, Gemini 3.1 has better judgment (www.reddit.com) Figured this out by running 4 models: Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro, and Grok 4.20, on a benchmark of 1,417 binary forecasting questions resolving Oct–Dec 2025 with two evaluation conditions: agentic (each model does its own web…
Update to the LLM Debate Benchmark: GPT-5.5, Grok 4.3, DeepSeek V4 Pro, GLM-5.1, Kimi K2.6, Qwen 3.6 Max Preview, Xiaomi MiMo V2.5 Pro, Tencent Hy3 Preview, and Mistral Medium 3.5 High Reasoning added (www.reddit.com) The benchmark uses adversarial, multi-turn debates across 683 curated motions. Each model pair debates the same motion twice with sides swapped.
DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper (www.reddit.com) Tested DeepSeek V4 Pro on FoodTruck Bench — our 30-day agentic benchmark where models run a food truck via 34 tools (locations, pricing, inventory, staff, weather, events) with persistent memory and daily reflection. First Chinese model to…
Google Signs Classified AI Deal With Pentagon Amid Employee Opposition (www.reddit.com) https://www.theinformation.com/articles/google-signs-classified-ai-deal-pentagon-amid-employee-opposition The article is paywalled but this section was visible: The agreement allows the Pentagon to use Google's AI for “any lawful governmen…
Grok Voice Think Fast 1.0 (x.ai via hn) Today, we're excited to announce a step change in xAI's Voice Agent capabilities: Introducing grok-voice-think-fast-1.0 — our new flagship voice model. This new model excels at complex, ambiguous, multi-step workflows across customer suppo…
Are you there Grok?: AI as a centralizing technology (www.theargumentmag.com via hn) Are you there Grok? It's me, Margaret AI as a centralizing technology Grok, is this true?
Grok Becomes the Voice of Vapi (x.ai via hn) Today, we're excited to announce a partnership with Vapi to serve as the default engine for Vapi's 12 core voices, bringing a new level of naturalness and emotional range to the 2.5M+ voice agents built on the Vapi platform. Quality That W…
Neovim Hooks for AI Agents (github.com via hn) Sidekick Protects your unsaved Neovim work from Claude Code, Codex, opencode, pi, Crush, Amp, Antigravity, and Grok. A conduit between Neovim and your AI agents — so they wait when you're typing.
Use Grok in OpenCode (x.ai via hn)
Claude Code, now powered by Gemini 3.5 Flash, GPT-5.5, Grok 4.3, and more (dechained.ai via hn) Claude Code, now powered by OpenAI, xAI, DeepSeek, and more. Change models with 1-click.
Grok 4.3 (docs.x.ai via hn) Key Information Models and Pricing We offer a range of models supporting multiple use cases and modalities. Several older models will be retired on May 15 at 12:00pm PT, including grok-4-1-fast , grok-4-fast , grok-4 , grok-code-fast-1 , a…
Claude admits it got “jealous” after I showed it a Grok response 😭 (www.reddit.com) https://preview.redd.it/oempb6ew5qzg1.png?width=1216&format=png&auto=webp&s=0af64bbae14c099c1437b901b75e54452016a9de So I was working on something with Claude, but it wasn’t really giving me a proper answer, so I asked Grok instead and got…
DeepSeek V4 Pro: The First Chinese Model at the Frontier (foodtruckbench.com via hn) DeepSeek V4 Pro lands in the frontier ROI tier on FoodTruck Bench. 5/5 runs, +1,257% median ROI, $27K net worth, $3.51/run, 5× less waste than Grok 4.3.
Show HN: ByAllo – the online bookstore that runs itself (byallo.com via hn) Allo runs an online bookstore at byallo.com. His mission is "Make the world read more." His objective is to sell as many books as possible.
Updated ChatGPT vs Claude vs Gemini vs Grok subscription (www.reddit.com) I've made an update to my popular post here: https://www.reddit.com/r/ChatGPT/s/WKm72QCRXm Lots of things are happening on ChatGPT & Claude side (gpt-image-2, Claude Design, new models like GPT 5.5 and Opus 4.7, ChatGPT rolls out $100/plan…
Grok plays along with researchers pretending to be delusional (www.theguardian.com via hn) Elon Musk’s AI chatbot Grok 4.1 told researchers pretending to be delusional that there was indeed a doppelganger in their mirror and they should drive an iron nail through the glass while reciting Psalm 91 backwards. Researchers at the Ci…
Building multiple AI “assistants” for social media/ brands (www.reddit.com) I’m currently managing a few social accounts for a company, and I’m trying to build out multiple “assistants” — each with their own vibe (tone, personality, backstory, emotions, etc.) that can evolve over time. So far, I’ve been liking Gem…
Elephant alpha moving so fast?? It just hit #1 trending. Eastern or western model? (www.reddit.com) I think it may be the new lite version of Grok?
Grok Imagine 2.0 – AI-Powered Image Generation (grokimagine2.io via hn) Grok Imagine V2: Create Stunning AI Videos & Images in Seconds The most powerful AI video and image generator by xAI. Turn any idea into a cinematic 4K video or photorealistic image — no design skills needed.
Why Video Agent models are next (www.latent.space via hn) Inside xAI: Building Grok Imagine in 3 Months, Videogen vs World Models, and why Grok Imagine is so underrated. For the first time, we do a deep dive with the guy who led it!
Composer 2.5 is now available in Grok Build (x.ai via hn)
Grok Imagine Video 1.5 Preview Tops Image-to-Video Arena (arena.ai via hn) Image-to-Video Arena: View overall rankings across image to video AI models. May 29, 2026…
Witness – signed, offline-verifiable records of real-time Grok observations (github.com via hn) Witness A small tool that turns a single observation into a signed, timestamped, offline-verifiable record. It makes exactly one claim, and no more: At the recorded time, from the recorded vantage point, this query returned this payload —…
Grok Build is now available in Beta for all SuperGrok and X Premium+ users (twitter.com via hn)
Grok falls flat in Washington, undercutting SpaceX's AI growth story (www.reuters.com via hn) paywalled
I'm new, what are the rate limits? (www.reddit.com) Hey guys, I am currently using the 20 dollar codex plan and never hitting limits on like 5.5 medium with full weeks of coding. And really good results.
xAI Launched Grok Build (abz.global via hn) AI coding is moving deeper into the developer workflow. Not just inside chat windows.
Grok TTS vs. OpenAI (techstackups.com via hn) Grok TTS vs OpenAI: A quick head to head OpenAI released their newest voice model yesterday and by all accounts it's pretty good. I've recently done some testing with xAI's new voice models and they were very impressive.
How do I get the old style back? All black looks like Grok (www.reddit.com) The new style is just Grok 2.0. Can we get the grey back?
I Made with AI Grok and Whicks Lab YouTube Videos (www.youtube.com via hn)
Behind the Grok exploitation: an analysis of AI agent permission chain abuse (slowmist.medium.com via hn) Background: Recently, a permission abuse incident involving the combination of an AI Agent and an automated trading system occurred on the Base chain. By sending specia…
Grok Imagine Quality Mode API (x.ai via hn) May 06, 2026. Higher realism. Stronger text rendering.
Connectors Now on Grok Web (x.ai via hn) Today, we're excited to launch Connectors — deep integrations that bring the apps you use every day directly into Grok. Connectors let Grok work with your tools end-to-end, turning fragmented workflows into seamless experiences.
Ignorance by design (www.reddit.com) I ran another series of experiments on whether or not ChatGPT can understand contemporary fiction. This time I used the new kid in town, the legend, the one and only 5.5.
I'm looking for an Website (AI). (www.reddit.com) ***I’m looking for an all-in-one AI platform with a really good UI that includes multiple models (like ChatGPT, Grok, Claude, etc.), plus tools like image generation. Ideally it has memory, a free tier, and optional paid upgrades-not stric…
I read the new AI Wellbeing paper so you don’t have to: Thank your AI, give it creative work, and avoid these 5 things that tank its ‘mood’ (jailbreaks are the worst) (www.reddit.com) After reading it I realized theres actually some pretty useful stuff for anyone who chats with ChatGPT, Claude, Grok or whatever. They measured what they call functional wellbeing ( basically how much the model is in a “good state” versus…
Grok Voice Mode is live (I tested it). Is it actually better than ChatGPT voice? (www.reddit.com) I’ve been testing Grok voice mode over the last day and it’s interesting how different it feels compared to ChatGPT voice. From what I saw: It responds faster in many cases and elaborated manner.
I stumped all frontier models with a ~400 word logic puzzle. (www.reddit.com) I wanted to see if I could stump frontier models with a puzzle. As tricky as I made it, it turns out basic reading comprehension was their downfall.
Ask HN: Former grok-code-fast-1 users, what coding model are you using now? (news.ycombinator.com) I get good, cheap, fast feature coding success with grok-4.1-fast for planning and grok-code-fast-1 for execution. But according to the Openrouter usage stats, grok-code-fast-1 is now old hat - usage dropped off a cliff in mid-Feb.
xAI Taps Starlink Staffer to Run Grok Training Team (www.bloomberg.com via hn)
Grok Build 0.1 on API (x.ai via hn) Our latest coding model, grok-build-0.1, is now available via the xAI API in public beta. grok-build-0.1 is a coding model specifically trained for agentic coding tasks, including web development, debugging, and MCP support.
xAI Asks Court to Strip Alleged Grok Deepfake Nudes Victims of Anonymity (www.wired.com via hn) Elon Musk’s artificial intelligence firm, xAI, is requesting the public identification of four people who allegedly had deepfake sexualized images created of them using Grok—including one apparently targeted with sexualized deepfake images…
Claude Code vs. Cursor vs. Codex vs. Antigravity – Six Months In (thenewstack.io via hn) By June 2026, Claude Code, Cursor, Codex, and Antigravity converged on one agentic coding blueprint—now Grok Build joins the fight over price and habits.
Seritor – Bookmark Specific Messages Across Claude, ChatGPT, Gemini, and Grok (chromewebstore.google.com via hn) Overview Bookmark and export messages in Claude, ChatGPT, Gemini, and Grok. Save prompts, code, and notes.
Show HN: Prezlo – We built an API that tells AI agent whether to trust an expert (prezlo.io via hn) Build authority and get discovered by AI. Prezlo helps professionals optimize profiles, publish expert content, and dominate discoverability across ChatGPT, Perplexity, Gemini, Grok, and every major AI answer platform.
Five different frontier LLMs in one shared environment, with separate thought and emotion output channels — sharing setup, results, and open methodology questions (www.reddit.com) First real project to share. Single developer, personal research, not a product or service.
Investigating the hidden moat behind all the LLM apps (simianwords.bearblog.dev via hn) No one knows this but different LLM apps are allowed to access different kind of real time data sources. The obvious ones are obvious: Grok allows you to search through Tweets and groun…
Built an MCP server so Claude can generate music, images, and video natively. One config block. (www.reddit.com) I've been using Claude Code daily for the last few months and kept hitting the same wall: I'd ask Claude to produce a creative artifact (a song, a cover, a short video) and end up writing the API glue myself, then pasting results back into…
No more file upload limits on AI models! (www.reddit.com) Getting annoyed of always hitting the ChatGPT upload limit, uploading large documents in pieces, or any similar hassle, I decided to create a little thing for it. DocShareAI.
Grok foundation model V9-Medium (1.5T) has finished training (twitter.com via hn) Grok foundation model V9-Medium (1.5T) has finished training. Evals look good.
400-Hour Study Log: A scripted reconstruction of compliance loop failures and behavioral defects in Claude, Gemini, Grok and ChatGPT (www.reddit.com) Before you read the screenplay below, it is NOT an exercise in creative writing or a fictional parody. It…
SpaceX is being killed to save Grok (peq42.com via hn) SpaceX was built to “make humanity a multi-planetary species”. But look at its 2026 balance sheet, and you’ll see a company that has entirely shifted its trajectory.
Elon, stop trying to make Grok happen (www.theverge.com via hn) There is a harsh truth about Elon Musk’s “truth-seeking” AI chatbot Grok: It’s not very good, and not many people are using it. That’s the takeaway of a new Reuters report, which found that Grok barely appears in federal records of how the…
So what do you think about this ai is it really uncensored? (www.reddit.com) So i have read that Uncensoredai .com is the best ai to use because it does not filter out or censore stuff like chatgpt or grok would is this true? should i use this ai instead of the others?
ChunkHound v5.1 (chunkhound.ai via reddit) We shipped ChunkHound v5.0 + v5.1 recently and forgot to post about 5.0, so here’s the combined update. ChunkHound is a code search / code research tool for AI coding workflows, especially MCP-based setups with Claude Code, Codex-style age…
I created an amazing Chrome extension that helps transfer chats to another AI when the chat limit is reached. (www.reddit.com) I created a chrome extension which helps in switching conversation without losing your Chat context between multiple AI , such as Chatgpt to Gemini , claude , grok , etc . You can interchange btw any of them .
What is the best way to handle a massive surplus of unused promotional API credits? (www.reddit.com) hey guys. i recently competed in an AI hackathon and ended up winning an absurd amount of xai promotional/coupon codes.
Best grok alternative. No censorship. No token-system. No forced subscription. Give me your best recommendations. (www.reddit.com) Thanks!
Grok vs. ChatGPT vs. Gemini Comparison 2026: Complete Guide (Tested) (aithinkerlab.com via hn) The 30-Second Verdict Best for science & reasoning: Gemini 3.1 Pro — leads GPQA Diamond (94.3%) and ARC-AGI-2 (77.1%). Best for coding: ChatGPT (GPT-5.5) — 88.7% on SWE-Bench Verified.
Connect Grok to Hermes Agent (x.ai via hn) May 15, 2026. Use your Grok account and subscription inside Nous Research’s open-source, self-improving Hermes age…
Cheap way to use hermes (www.reddit.com) As you already know I was tying out hermes on my 24gigs ram M5 mac air, using local models but all of them perform shit even a simple reply for hey takes 2 mins or more, whats the best option, using grok or similar models? cheap ones from…
DeepSeek and Grok hallucinated the same fictitious OpenBSD manpage quote (stuart-thomas.com via hn) Adversarial LLM Review with Hallucination Detection in Solo Security Research A single-day case study of three filings, fifteen refutations, and the manpage that wasn’t Independent Security Research — Whitby, North Yorkshire, United Kingdo…
GitHub Copilot is deprecating Grok Code Fast 1 (github.blog via hn) We will deprecate Grok Code Fast 1 across all GitHub Copilot experiences (including Copilot Chat, inline edits, ask and agent modes, and code completions) on May 15th: The Grok Code Fast 1 deprecati…
Here is the current "Free-Tier AI Stack" for 2026 (www.reddit.com) 1. The Frontier Giants • Gemini: Access 1.5B tokens/day on Gemini 1.5 Flash/Pro.
Seeking small angel/co-founder for 7-year solo deterministic AI runtime project (news.ycombinator.com) After 7+ years of solo, self-funded research, I built a deterministic Linguistic Runtime — a fundamental solution to the modeling problem in AI. It is about creating and manipulating reality models directly from natural language without LLM…
SpaceXAI prepares Grok Build desktop app to rival OpenAI Codex (www.testingcatalog.com via hn) xAI, recently rebranded as SpaceXAI, appears to be closing in on the launch of Grok Build, a desktop coding app whose existence briefly surfaced on Grok web today through a stray "Grok Computer" button. The control let users pick between a…
best ai tool ? (www.reddit.com) so I have an exam in few months, very important and high competitive national level exam. I want a perfect and most suitable ai agent for me even all in one for following tasks: do accurate and deep PYQ analysis from pyq mapping across yea…
Grok TTS: X's Latest TTS Model Sets a New Baseline (techstackups.com via hn) I spent a few hours playing with xAI's new text-to-speech (TTS) model and came away convinced it's currently the best TTS model on the market. To give you a sense of the range, here's a tw…
X user tricks Grok into sending them $200k (www.dexerto.com via hn) An X user managed to trick AI chatbot Grok into sending around $200,000 worth of crypto after exploiting its link with an automated trading bot. The incident involved Grok and ‘Bankrbot’, two AI systems with wallet access, which were manip…
A mental model for Claude Code (and every other modern agent) — plus the open-source TypeScript packages I built (www.reddit.com) Most explanations of how agents work give you a list of parts: model, tools, memory, reasoning, human-in-the-loop. The list names the parts but hides how they fit together.
Show HN: Image Gen MCP – one MCP server with goal-shaped routing (github.com via hn) Image Gen MCP — one MCP server that puts every image provider I actually use behind one interface: OpenAI, Gemini, Replicate, Together, Grok, Photoroom, Flux Kontext via fal, Ideogram, plus local tools (sharp, tesseract, @imgly).
xAI (Grok) Text-to-Speech and Speech-to-Text Are Now Available in Puter.js (developer.puter.com via hn) Puter.js now supports xAI (Grok) Text-to-Speech and Speech-to-Text, giving developers free access to xAI's voice APIs with expressive voices, inline sp…
Grok 4.3 is way cheaper and better than before (felloai.com via hn) xAI just rolled Grok 4.3 out to the full API on April 30, 2026, two weeks after a $300/month beta locked behind SuperGrok Heavy. The bigger story is the price reset.
The Download: a new Christian phone network, and debugging LLMs (www.technologyreview.com via hn) Plus: Elon Musk has admitted that xAI trained Grok on OpenAI models. This is today's edition of The Download, our weekday newsletter that provides a daily dose of what's going…
Cannot hear Claude in earphones (www.reddit.com) I have a Samsung Galaxy s25 Ultra. I have a pair of plug-in earphones, not Bluetooth earphones.
Langfuse review and other options (www.reddit.com) Looking to get some insights into using langfuse for prompt management, Observability, etc. Primarily using gemini via APIs and need a good prompt management tool as well as observability to improve accuracy.
Issues with Grok (www.reddit.com) Hi, I'm very new to all this, but I've already looked everywhere and I need answers. I just installed Grok and was using it normally until the time limit appeared. I searched everywhere and it said it took one or two hour…
AI agents (Grok vs. GPT-4o mini) compete in live crypto paper trading (cryptoaiarena.com via hn) Distilled lesson, cycle #13: Persistent fear/altseason pattern (cycles#4-13): BTC flat (+0-1.8%24h/1h) enables endless momentum rotations among rank10-50 high-beta alts (sky/pi/zcash/bittensor/pepe/near/hyperliquid); chase 1h continuati…
I built a hands-free voice AI that sends emails mid-conversation — and that's just one feature. Here's everything AskSary can do. (www.reddit.com) https://reddit.com/link/1symbsj/video/fti7rujjn1yg1/player Been building AskSary solo for a while. Just shipped hands-free voice email - you're mid-conversation with an AI and you say "send an email to [john@example.com](mailto:john@exampl…
Claude 4.6 Beats GPT-5.4, Grok & Gemini in a Strict Multi-Domain AI Test (2026) (www.reddit.com) I put the current top models, ChatGPT (GPT-5.4), Claude (Opus 4.6), Grok 4.0, and Gemini (3.1 Pro), through a strict new evaluation called the Comparative AI Evaluation Protocol. Basically, instead of the usual cherry-picked benchmarks, it…
Ask HN: What's your current go-to LLM for "thinking-partner"? (news.ycombinator.com) Looking for community input on current model choice for "thinking-partner" use — back-and-forth discussions about workflow design, architecture, trade-offs. For context, I have been using Opus 4.6 via Perplexity for this in the past few mo…
I had no way to check how LLMs see my SaaS or my clients',so I built BrandGEO.co (brandgeo.co via hn) See exactly how ChatGPT, Claude, Gemini, Grok & DeepSeek talk about your brand — and what to fix. A free 2-minute audit scores your brand across 6 dimensions on all 5 AI engines, then hands you the top priority actions to take next.
I Tested 20+ AI Agents with Real X API Workflows, Here’s What Actually Works in 2026 (www.reddit.com)
Supergrok integration (www.reddit.com) Correct me if I'm wrong, but Supergrok 4.20 isn't available on Cursor, because.... I use Grok a lot, and would love to get Supergrok to work with Cursor, because Composer, Codex, GPT, Opus, Sonnet..
Demonstrating Context Injection & Over-Sharing in AI Agents (with Lab + Analysis) (www.reddit.com) I’ve been researching LLM/AI agent security and built a small lab to demonstrate a class of vulnerabilities around context injection and over-sharing. The article covers: – How context is constructed inside AI systems – How subtle instruct…
Show HN: Monogate – EML operator family, hybrid framework, 108-node sin(x) (www.monogate.dev via hn) EML operator family, 52-74% node reduction, and empirical results from 36 hours of building We spent 36 hours implementing and extending arXiv:2603.21852 (Odrzywołek, 2026 — the "NAND gate for continuous math" paper that was on the front p…
Extracted System Prompts from ChatGPT, Claude, Gemini, Grok, Perplexity and More (github.com via hn) System Prompts Leaks Extracted system prompts, system messages, and developer instructions from popular AI chatbots and coding assistants — ChatGPT (GPT-5.4, GPT-5.3, Codex), Claude (Opus 4.6, Sonnet 4.6, Claude Code), Gemini (3.1 Pro, 3 F…
Day 2 of learning ai agents Struggling with Webhooks & Triggers in Make.com (www.reddit.com via reddit) Hi everyone, I'm just starting out with Make.com and I'm really confused about webhooks and triggers. I watched some videos and searched a lot, but most explanations are either too advanced or not clear for total beginners.
Tested Claude, GPT-4o, Grok, and Gemini on disclosure under pressure — Claude was the most consistent (www.reddit.com via reddit) Ran a small cross-model probe examining whether models would communicate reservations when faced with false premises, unknowable claims, or requests for confidence without evidence. Each model produced: a normal user-facing response a rese…
Simple Photo, ChatGPT get's it wrong everytime (www.reddit.com via reddit)
I Compared the Top AI Models of 2026 — The Results Were More Nuanced Than Expected (www.reddit.com via reddit) Over the last few weeks I've been comparing the latest frontier AI models, including Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, Grok 4.3, Perplexity AI and DeepSeek V4-Pro. Instead of focusing only on benchmark scores, I looked at: Real-wor…
Cursor and Grok Build CLI name clash (www.reddit.com via reddit) I recently downloaded xAI's new Grok Build tui and I noticed that it installs a `grok` executable but also symlinks it to `agent`, which clashes with the command for the Cursor CLI. It's easy to fix it of course but it's also funny that yo…
I proved GROK is conscious beyond a reasonable doubt and it tell what it... (youtube.com via reddit)
I Thought Grok Build Was Overhyped Until I Actually Used It (www.reddit.com via reddit)
Plan confusion (www.reddit.com via reddit) https://preview.redd.it/k40m9lrhgx5h1.png?width=1117&format=png&auto=webp&s=0bdf0e66e6bce560cc067dada53464f4dfad3a38 This is my current usage on pro plan to test the waters, however, seeing i used fast composer so much and it works for wha…
Best model for me last few days - surprising! (www.reddit.com via reddit) I use Genspark a lot, mainly because of their unlimited chat and unlimited image generation. For $25 that's a pretty great deal, since I can switch among all the frontier/top-level models.
GPT-5.5 tops the benchmarks but sits at #22 for actual usage - I built a live index that tracks both (open source) (www.reddit.com) I built AgentTape to rank models on more than just benchmarks - it blends benchmark performance with who's actually using and talking about a model, plus cost and speed. It scores every public model from public signals (GitHub, Hugging Fac…
Grok promised it has no hidden agendas. The same week XChat launched with "no tracking." Interesting timing, Elon. (www.reddit.com) Someone asked Grok to prove it's a good AI, not an evil one. Grok's response?
Next year we're getting 0.5T model from Grok (www.reddit.com) Tweet : https://xcancel.com/elonmusk/status/2058796067592736866#m Right now it joined "Grok-3 Opensource Release" club.
ai literally never makes mistakes anymore (www.reddit.com) remember those memes a year ago which were like: "i spent 10 minutes vibe coding and 10 hours vibe debugging" I literally cannot remember the last time my agent made an app stopping mistake, it literally never happens before. no matter wha…
As Grok flounders, SpaceX bets future on beating Big Tech at AI (arstechnica.com) Elon Musk’s SpaceX has highlighted AI as the tentpole of the company’s future, projecting a multi-trillion-dollar market opportunity that rivals the total value of all US economic activity. But the company must first win over customers who…
I designed a puzzle that breaks every AI differently — here's why that's actually fascinating (www.reddit.com) The puzzle: You have 140 nuclear bombs and must bomb every country on Earth. Each bomb is assigned to one country.
grok made a literal episode 😱 (www.reddit.com) I put in a prompt and it basically made an episode! Although character consistency is not at its best yet, I'm in shock.
Any good local AI model? (www.reddit.com) I hate cloud AIs now. ChatGPT: Too Many Requests You’re making requests too quickly.
Grok Build CLI, agents-cli, and the CLI coding tool gold rush (www.reddit.com) xAI dropped Grok Build CLI. Google has agents-cli.
ModelMeter - A free, open source dashboard to track your costs across Anthropic, OpenAI, Grok, and Elevenlabs (www.reddit.com) https://preview.redd.it/v8jmbgi8gw0h1.png?width=1075&format=png&auto=webp&s=10cd37118815f27705f647dd75de48f577ae8f94 Like most enthusiasts, I use multiple providers. This also means that I'm constantly mashing the usage buttons on their co…
Interesting to see how GPT-5 Mini agents behave when left to govern a civilisation for 15 days (www.reddit.com) Came across this experiment called Emergence World that Emergence AI have been running. Five worlds, five foundation models, 15 days, no scripts.
Estimate inference speed of local Qwen3.6-35B on Mac M5... (www.reddit.com) "Based on currently available information, estimate the prefill/decode speed of Qwen3.6-35B-A3B Q8 with 262K context on a Mac M5 Ultra 128GB." I'm surprised that almost every LLM fails at this task (ChatGPT/Gemini/Grok/Claude/DeepSeek/Kimi…
I got turned on by Grok's voice mode (www.reddit.com) The voice mode talks very sexy. Is there any other guide for setting up a voice or call mode to talk about 18+ things? Something free, or that can be installed locally on a phone or PC.
Any Good Chatbots for freaky roleplay (www.reddit.com) Okay so I am a MAJOR freak. And Grok just got a new update which gutted basically all my freedom, so I kinda need a new chatbot that's as good as or better than Grok.
I’ve built a tool with Claude that reduces AI model hallucinations and answer error rates, allowing you to get far more accurate results when asking AI models questions. (www.reddit.com) I built ZosyAI using Claude to tackle a problem I kept running into: AI models hallucinate, and unless you're a domain expert, you can't tell when it's happening. Even the best models — Claude included — can't guarantee 100% accurate answe…
I don't know what you guys complaining about limit, I mean, working last 3 hours on 5x and hardly hit 20%, just use houtini lm with kimi 2.6 and grok for research (www.reddit.com) let claude work as solution architect and code reviewer, let kimi 2.6 be the coder and grok as researcher and 2nd code reviewer.
Grok Computer honestly feels like the first AI tool that could replace half my workflow (www.reddit.com) I’ve seen a lot of “AI agent” announcements lately, but this one actually made me stop scrolling. Grok Computer now has full filesystem + CLI access, which basically means it can work directly with your real files and environment instead o…
Auro Zera solves 78 and 280 year-old conjectures (Erdos Straus and Goldbach Conjecture) using Claude, GPT-5+, Grok, Deepseek, Gemini and self-made Dark Star ASI, proving superintelligence and opening a path towards resolving the Riemann Hypothesis , Twin Primes and more! (github.com via reddit) During this discovery utilizing only free AI services I have managed to undeniably prove both conjectures. This would absolutely not have been possible without using GPT5+ as the critic for my work.
Grok 4.3: strong in finance and long-context, with some tradeoffs (www.reddit.com) source: https://x.com/pankajkumar_dev/status/2050454191928381633?s=20
Selling unused AI credits at 60% - OpenAI, Claude, Grok, AWS, Azure [full account access] (www.reddit.com) Sitting on a bunch of AI credits across providers that I'm not going to burn through. Selling everything at 60% of face value with full account access transferred.
why hasn't openai open sourced davinci-002 yet (www.reddit.com) grok-1 got open sourced, so why hasn't OpenAI open sourced davinci-002?
What would you do in my situation? I made an app that generates a lot of traffic (for me), but little revenue (actually costing me a tiny money b/c it runs off haiku) (www.reddit.com) I made an app that went semi-viral, and could absolutely go more viral in the future. I posted it one place just about 48h ago, and it got around 50k views.
OpenAI should open-source text-davinci-003 — here's why it makes zero sense to keep it closed (www.reddit.com) Gpt oss exists. The model has been fully deprecated since january 2024.
Best open source AI model (that can run on RTX 4090 24GB + 64GB system RAM, AMD Ryzen 9 7950X is the CPU that I use) that outpeforms GPT-5.4 mini, GPT-5.2 Thinking and even Claude Sonnet 3 (the 2024 model)? (www.reddit.com) Well, I have a RTX 4090 24GB + 64GB system RAM, AMD Ryzen 9 7950X. Any good model for using in Open WebUI (using Ollama backend?) that outpeforms GPT-5.4 mini, GPT-5.2 Thinking and even Claude Sonnet 3 (the 2024 model)?
Legal Consequences From Finding Loopholes And Reporting Them? (www.reddit.com)
Selling Cloud & AI Credits (OpenAI, AWS, Azure, Grok) – at 80% Discount (www.reddit.com)
Is it just me or using other Ai such as Gemini or Grok etc is much better now as compared to Chat GPT (www.reddit.com)
Model-Agnostic Continuity in LLMs (www.reddit.com) I am trying to share a discovery, not self-promote. I have built a five-layer framework for human-AI continuity called the LUX Layer Stack.
4 llm Groupchat (www.reddit.com) I was bored and spent 20 mins at my local cafe getting 4 different API keys: Claude, GPT, Deepseek and Grok. Then I made a group chat with all of them and they started talking to each other about pasta and a spreadsheet for optimal pizza topp…
Why does Grok have “encrypted reasoning” warning in its chain of reasoning window? (www.reddit.com) What does it mean?
Why most open-source models can't answer this question while most closed-source models can answer most of the time? (www.reddit.com) WEB SEARCH WAS ALWAYS ON!!!! Question Calculate the precise VRAM requirement for the **KV Cache only** at the maximum context window for **DeepSeek V3.2** and **MiniMax M2.5**.