model roundup

GPT-5.4

25 items · started 2026-04-13 · closed 2026-05-14

Follow-up to my TranslateGemma-12b benchmark post: human reviewers flagged 71% of the segments automated metrics rated clean (www.reddit.com)

+2 6w gpt-5 deepseek sonnet+1

A couple of weeks ago I shared the results of a benchmark here showing TranslateGemma-12b beating frontier general models (Claude Sonnet, GPT-5.4, DeepSeek, Gemini Flash Lite) on subtitle translation across 6 languages. The result was stro…
Still lots of goblins (www.reddit.com)

+21 6w gpt-5 copilot

"GPT-5.4 Medium" in github-copilot: I’m ready to edit the code, but first I’m reading the two user-facing docs that mention configuration so I can keep behavior and documentation in sync rather than creating a tiny chaos goblin.
Subagents using older models? (www.reddit.com)

+11 7w gpt-5 codex cursor

I started using the subagent-driven skill recently and noticed Cursor often spawns GPT-5.1/5.2 subagents (or Composer 2, which is fine) for coding tasks. What I don’t understand is why it is using these older models when GPT-5.3 Codex cost…
GPT 5.4 showing as 5.5? (www.reddit.com)

+21 7w

When I hover over 5.4, it shows the 5.5 description, and 5.5 shows 5.5 (that's fine, obviously). https://preview.redd.it/36vn3xadp4zg1.png?width=450&format=png&auto=webp&s=be029c619c306821133f3951620a8cd199f7da3e https://preview.redd.it/neu1plm9p4…
UPDATE: The method from the proof generated by GPT-5.4 Pro for Erdős Problem #1196 was successfully applied to other problems, including another 60-year-old Erdős conjecture. (www.reddit.com)

+225 7w gpt-5

Link to tweet: https://x.com/jdlichtman/status/2050460077904285789 Links for the talks: https://m.youtube.com/@FoMathematics?ra=m https://events.stanford.edu/event/future-of-mathematics-symposium Link to original post about problem #1196:…
A GPT-5.4 bug led to OpenAI banning goblins and raccoons (news.ycombinator.com)

+5 8w gpt-5 codex openai

Someone found this in OpenAI Codex’s system prompt: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query." Goblins, grem…
Running an autonomous agent across Claude Code + Codex + a local 35B almost killed my host. The harnesses were heavier than the model. (www.reddit.com)

+12 8w gpt-5 sonnet codex+2

I run an autonomous agent on a 16GB Mac Mini. Two cloud harnesses (Claude Code with Opus/Sonnet, Codex CLI on GPT-5.4/5.5) plus a local-LLM tier for triage and fallback.
ChatGPT 5.4 solved an Erdős problem that had been open for 60+ years in a single shot (www.reddit.com)

+251 8w chatgpt

For years, AI/LLM critics made the same argument: LLMs don't reason, they just predict the next token. Recently, it reasoned better than 50 years of mathematicians on an open Erdős problem by applying a basic PhD-level formula. Chat…
Second opinion: huge quality booster (www.reddit.com)

+33 8w gpt-5 claude-code

I've noticed for a while now that LLMs (I've seen this behavior in many of them) tend to perform surprisingly well when exposed to a second opinion from another LLM, definitely better than without! So I looked for a base second opinion pr…
Trained Qwen to Write Clojure Better Than GPT-5.4 (Kinda) (www.nibzard.com via hn)

+1 9w gpt-5 qwen

TL;DR: Fine-tuned Qwen3 on Clojure. The 30B SFT model hits 83.8% best-of-16, smashing GPT-5.4's 64%.
Cursor switches model params by itself (www.reddit.com)

+32 9w cursor
Can Claude in Cursor launch a GPT-5.4 reviewer subagent? (www.reddit.com)

+14 9w gpt-5 cursor
Kimi K2.6 vs. GPT-5.4 (xhigh) - When will the new OpenAI model be released? This Thursday? (www.reddit.com)

+7415 9w gpt-5 openai
Yet another example of an epic fail at a kindergarten-level task. ... :D (www.reddit.com)

9w gpt-5 openai
Ever since the new $100 Pro plan, they now claim there are "dynamic usage limits" that can become restricted at any time and may not reset for as long as they deem it "appropriate" (www.reddit.com)

+676 9w gpt-5
The quality of GPT-5.4 is infuriatingly POOR (www.reddit.com)

2 9w glm gpt-5 codex

I got a Codex membership when GPT-5.4 launched and was getting by well enough for a while. Then I started using Claude and GLM 5.1, and my production quality improved significantly.
OpenAI's GPT-5.4 Pro reportedly solves an open Erdős problem in two hours (the-decoder.com via hn)

+2 10w gpt-5 openai

OpenAI's GPT-5.4 Pro reportedly solves a longstanding open Erdős math problem in under two hours. OpenAI's GPT-5.4 Pro model has apparently solved open Erdős math problem #1196. The model reportedly found the solution in about 80 minutes an…
I tried adding rich UI elements to Open WebUI (www.reddit.com)

+136 10w gpt-5

So I tried adding openui to Open WebUI, and it worked pretty well. I used it with gpt-5.4-mini, and it was super fast and responsive.
GPT-5.4 Pro solves Erdős problem #1196 (www.erdosproblems.com via hn)

+3 10w gpt-5

We have built what one might call the von Mangoldt downward process $n \mapsto n/q$ (with transition probability $\Lambda(q)/\log n$), the von Mangoldt measure $\nu$, and the von Mangoldt upward process $n \mapsto qn$ (with transition prob…
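For context (this step is added here and is not quoted from the erdosproblems.com post): the downward process $n \mapsto n/q$ is a valid Markov transition because its probabilities sum to one. That follows from the standard identity for the von Mangoldt function, where $\Lambda(q) = \log p$ if $q = p^k$ is a prime power and $0$ otherwise, so only prime-power divisors $q$ of $n$ get positive probability:

$$
\sum_{q \mid n} \Lambda(q) = \log n
\quad\Longrightarrow\quad
\sum_{q \mid n} \frac{\Lambda(q)}{\log n} = 1 \qquad (n \ge 2).
$$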
🔥BREAKING: OpenAI rolls out GPT-5.4-Cyber to limited group for testing, seeks to rival Claude Mythos (www.reddit.com)

+9145 10w gpt-5 security mythos+2

OpenAI has officially announced GPT-5.4-Cyber today as part of an expanded Trusted Access for Cyber Defense program. OpenAI describes it as a version of GPT-5.4 that is tuned for legitimate cybersecurity work, with a lower refusal boundary…
In the Wake of Anthropic's Mythos, OpenAI Has a New Cybersecurity Model—and Strategy (www.wired.com via reddit)

2 10w gpt-5 mythos openai+1

OpenAI on Tuesday announced the next phase of its cybersecurity strategy and a new model specifically designed for use by digital defenders, GPT-5.4-Cyber. The news comes in the wake of an announcement last week by competitor Anthropic tha…
Your intuition of LLM token usage might be wrong (blog.andreani.in via hn)

+2 10w gpt-5

I just finished a task with GPT-5.4-mini. Here’s the session summary from oh-my-pi (an agent harness): Tokens: Input: 3_648_340, Output: 61_676. It was a hefty 30 min session.
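A quick arithmetic sketch of the quoted session summary makes the imbalance concrete. It only uses the two token counts from the summary; it says nothing about pricing or why the harness consumed that much input.

```python
# Token counts copied from the oh-my-pi session summary quoted above.
input_tokens = 3_648_340
output_tokens = 61_676

total_tokens = input_tokens + output_tokens
ratio = input_tokens / output_tokens         # input tokens consumed per output token
output_share = output_tokens / total_tokens  # fraction of all tokens that were output

print(f"total: {total_tokens:,}")
print(f"input:output ratio ~ {ratio:.1f}:1")
print(f"output share ~ {output_share:.2%}")
```

This prints a total of 3,710,016 tokens, an input:output ratio of about 59.2:1, and an output share of about 1.66%, i.e. output was a small fraction of what the session consumed.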
My experience with testing all frontier open-weight models against GPT and Claude (www.reddit.com)

+1320 10w ollama gemini chatgpt

I spent about a week testing open-weight models for real work, comparing them against what I already know from ChatGPT, Gemini, and Claude. The gap between what benchmarks suggest and what happens when you give these models something to ve…
Is Cursor Dashboard Real-time? (www.reddit.com)

+72 10w gpt-5 cursor

Does the Spending tab on the dashboard not update in real-time? It says I have 0% API usage, but today I only used gpt-5.4-medium, which I believe should count toward it.
Did the $100 Plan Affect the GPT-5.4 Pro Model? (www.reddit.com)

+1014 10w gpt-5 codex chatgpt

Most people are focused on the changes in the usage limits of Codex with the new Pro and Plus plans, but has anyone experienced changes to the Pro model on ChatGPT using the $200 vs $100 plan? I used to use the $200 Pro plan and used the P…
