model roundup
GPT 5.4
-
A couple of weeks ago I shared the results of a benchmark here showing TranslateGemma-12b beating frontier general models (Claude Sonnet, GPT-5.4, DeepSeek, Gemini Flash Lite) on subtitle translation across 6 languages. The result was stro…
-
Still lots of goblins (www.reddit.com)
"GPT-5.4 Medium" in github-copilot: I’m ready to edit the code, but first I’m reading the two user-facing docs that mention configuration so I can keep behavior and documentation in sync rather than creating a tiny chaos goblin.
-
Subagents using older models? (www.reddit.com)
I started using the subagent-driven skill recently and noticed Cursor often spawns GPT-5.1/5.2 subagents (or Composer 2, which is fine) for coding tasks. What I don't understand is why it is using these older models when GPT-5.3 Codex cost…
-
GPT 5.4 showing as 5.5? (www.reddit.com)
When I hover over 5.4 it shows the 5.5 description, and 5.5 shows 5.5 (that's fine, obviously). https://preview.redd.it/36vn3xadp4zg1.png?width=450&format=png&auto=webp&s=be029c619c306821133f3951620a8cd199f7da3e https://preview.redd.it/neu1plm9p4…
-
Link to tweet: https://x.com/jdlichtman/status/2050460077904285789 Links for the talks: https://m.youtube.com/@FoMathematics?ra=m https://events.stanford.edu/event/future-of-mathematics-symposium Link to original post about problem #1196:…
-
A GPT-5.4 bug led to OpenAI banning goblins and raccoons (news.ycombinator.com)
Someone found this in OpenAI Codex’s system prompt: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query." Goblins, grem…
-
I run an autonomous agent on a 16GB Mac Mini. Two cloud harnesses (Claude Code with Opus/Sonnet, Codex CLI on GPT-5.4/5.5) plus a local-LLM tier for triage and fallback.
-
For years, AI/LLM critics made the same argument: LLMs don't reason, they just predict the next token. Recently, one reasoned better than 50 years of mathematicians on an open Erdős problem by applying a basic PhD-level formula. Chat…
-
Second opinion: huge quality booster (www.reddit.com)
I've noticed for a while now that LLMs (I've seen this behavior in many of them) tend to perform surprisingly well when exposed to a second opinion from another LLM — definitely better than without! So I looked for a base second opinion pr…
-
Trained Qwen to Write Clojure Better Than GPT-5.4 (Kinda) (www.nibzard.com via hn)
TL;DR: Fine-tuned Qwen3 on Clojure. 30B SFT hits 83.8% best-of-16, smashing GPT-5.4's 64%.
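The "best-of-16" figure above is not defined in this excerpt. A common reading, assumed here, is that a problem counts as solved if at least one of its sampled completions passes. The minimal sketch below shows that metric on made-up data; the function name `best_of_n_rate` and the toy results are hypothetical and are not taken from the post.

```python
def best_of_n_rate(results: list[list[bool]]) -> float:
    """Fraction of problems where at least one sample passed.

    `results[i][j]` is True if sample j for problem i passed its check.
    This is an assumed definition of "best-of-n", not the post's own.
    """
    if not results:
        raise ValueError("need at least one problem")
    solved = sum(1 for samples in results if any(samples))
    return solved / len(results)


# Hypothetical toy data: 4 problems, 3 samples each (not from the post).
toy = [
    [False, True, False],   # solved by the second sample
    [False, False, False],  # never solved
    [True, True, True],     # solved by every sample
    [False, False, True],   # solved by the last sample
]
print(best_of_n_rate(toy))  # 3 of 4 problems solved -> 0.75
```

Under this reading the score can only rise as n grows, so a best-of-16 number is only comparable to another figure if both were sampled the same way.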
-
Cursor switches model params by itself (www.reddit.com)
-
Can Claude in Cursor launch a GPT-5.4 reviewer subagent? (www.reddit.com)
-
Okay, the dust has settled now, how's your experience with Composer 2? (www.reddit.com)
-
The quality of GPT-5.4 is infuriatingly POOR (www.reddit.com)
I got a Codex membership when GPT-5.4 launched and was getting by well enough for a while. Then I started using Claude and GLM 5.1, and my production quality improved significantly.
-
OpenAI's GPT-5.4 Pro reportedly solves an open Erdős problem in two hours (the-decoder.com via hn)
OpenAI's GPT-5.4 Pro model has apparently solved open Erdős math problem #1196. The model reportedly found the solution in about 80 minutes an…
-
I tried adding rich UI elements to Open WebUI (www.reddit.com)
So I tried adding OpenUI to Open WebUI and it worked pretty well. I used it with gpt-5.4-mini and it was super fast and responsive.
-
OpenAI has officially announced GPT-5.4-Cyber today as part of an expanded Trusted Access for Cyber Defense program. OpenAI describes it as a version of GPT-5.4 that is tuned for legitimate cybersecurity work, with a lower refusal boundary…
-
OpenAI on Tuesday announced the next phase of its cybersecurity strategy and a new model specifically designed for use by digital defenders, GPT-5.4-Cyber. The news comes in the wake of an announcement last week by competitor Anthropic tha…
-
GPT-5.4 Pro solves Erdős Problem #1196 (www.reddit.com)
-
Your intuition of LLM token usage might be wrong (blog.andreani.in via hn)
I just finished a task with GPT-5.4-mini. Here's the session summary from oh-my-pi (an agent harness): Tokens: Input 3_648_340, Output 61_676. It was a hefty 30 min session.
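To make the imbalance in that session summary concrete, here is a small sketch that works out the input-to-output ratio from the two token counts quoted above. The helper name `token_breakdown` is hypothetical, and the excerpt does not say how the harness counts cached or repeated context, so treat the result as plain arithmetic on the reported totals.

```python
# Token counts copied from the session summary quoted above.
input_tokens = 3_648_340
output_tokens = 61_676


def token_breakdown(input_tokens: int, output_tokens: int) -> dict:
    """Return the total, the input/output ratio, and the output share."""
    total = input_tokens + output_tokens
    return {
        "total": total,
        "input_per_output": input_tokens / output_tokens,
        "output_share": output_tokens / total,
    }


summary = token_breakdown(input_tokens, output_tokens)
print(f"total tokens: {summary['total']:_}")
print(f"input tokens per output token: {summary['input_per_output']:.1f}")
print(f"output share of all tokens: {summary['output_share']:.2%}")
```

On these numbers the session read roughly 59 input tokens for every output token, so output makes up under 2% of the total.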
-
I spent about a week testing open-weight models for real work, comparing them against what I already know from ChatGPT, Gemini, and Claude. The gap between what benchmarks suggest and what happens when you give these models something to ve…
-
Is Cursor Dashboard Real-time? (www.reddit.com)
Does the Spending tab on the dashboard not update in real-time? It says I have 0% API usage, but today I only used gpt-5.4-medium, which I believe should count toward it.
-
Did the $100 Plan Affect the GPT-5.4 Pro Model? (www.reddit.com)
Most people are focused on the changes in the usage limits of Codex with the new Pro and Plus plans, but has anyone experienced changes to the Pro model on ChatGPT using the $200 vs $100 plan? I used to use the $200 Pro plan and used the P…