model roundup

GPT-5.4

26 items · started 2026-04-13 · closed 2026-05-14

  1. A couple of weeks ago I shared the results of a benchmark here showing TranslateGemma-12b beating frontier general models (Claude Sonnet, GPT-5.4, DeepSeek, Gemini Flash Lite) on subtitle translation across 6 languages. The result was stro…

  2. "GPT-5.4 Medium" in github-copilot: I’m ready to edit the code, but first I’m reading the two user-facing docs that mention configuration so I can keep behavior and documentation in sync rather than creating a tiny chaos goblin.

  3. I started using the subagent-driven skill recently and noticed that Cursor often spawns GPT-5.1/5.2 subagents (or Composer 2, which is fine) for coding tasks. What I don't understand is why it is using these older models when GPT-5.3 Codex cost…

  4. When I hover over 5.4, it shows the 5.5 description, and 5.5 shows the 5.5 description (which is obviously fine). https://preview.redd.it/36vn3xadp4zg1.png?width=450&format=png&auto=webp&s=be029c619c306821133f3951620a8cd199f7da3e https://preview.redd.it/neu1plm9p4…

  5. Link to tweet: https://x.com/jdlichtman/status/2050460077904285789 Links for the talks: https://m.youtube.com/@FoMathematics?ra=m https://events.stanford.edu/event/future-of-mathematics-symposium Link to original post about problem #1196:…

  6. Someone found this in OpenAI Codex’s system prompt: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query." Goblins, grem…

  7. I run an autonomous agent on a 16GB Mac Mini. Two cloud harnesses (Claude Code with Opus/Sonnet, Codex CLI on GPT-5.4/5.5) plus a local-LLM tier for triage and fallback.

  8. For years, AI/LLM critics made the same argument: LLMs don't reason, they just predict the next token. Recently, it reasoned better than 50 years of mathematicians on an open Erdős problem by applying a basic PhD-level formula. Chat…

  9. I've noticed for a while now that LLMs (I've seen this behavior in many of them) tend to perform surprisingly well when exposed to a second opinion from another LLM — definitely better than without! So I looked for a base second opinion pr…

  10. Trained Qwen to Write Clojure Better Than GPT-5.4 (Kinda). TL;DR: Fine-tuned Qwen3 on Clojure. The 30B SFT model hits 83.8% best-of-16, smashing GPT-5.4's 64%.

  11. I got a Codex membership when GPT-5.4 launched and was getting by well enough for a while. Then I started using Claude and GLM 5.1, and my production quality improved significantly.

  12. OpenAI's GPT-5.4 Pro reportedly solves a longstanding open Erdős math problem in under two hours. OpenAI's GPT-5.4 Pro model has apparently solved Erdős open math problem #1196. The model reportedly found the solution in about 80 minutes an…

  13. So I tried adding OpenUI to Open WebUI, and it worked pretty well. I used it with gpt-5.4-mini, and it was super fast and responsive.

  14. OpenAI has officially announced GPT-5.4-Cyber today as part of an expanded Trusted Access for Cyber Defense program. OpenAI describes it as a version of GPT-5.4 that is tuned for legitimate cybersecurity work, with a lower refusal boundary…

  15. OpenAI on Tuesday announced the next phase of its cybersecurity strategy and a new model specifically designed for use by digital defenders, GPT-5.4-Cyber. The news comes in the wake of an announcement last week by competitor Anthropic tha…

  16. (Summary unavailable.)

  17. Your intuition about LLM token usage might be wrong. I just finished a task with GPT-5.4-mini. Here's the session summary from oh-my-pi (an agent harness): Tokens: Input: 3_648_340, Output: 61_676. It was a hefty 30-minute session.

  18. I spent about a week testing open-weight models for real work, comparing them against what I already know from ChatGPT, Gemini, and Claude. The gap between what benchmarks suggest and what happens when you give these models something to ve…

  19. Does the Spending tab on the dashboard not update in real-time? It says I have 0% API usage, but today I only used gpt-5.4-medium, which I believe should count toward it.

  20. Most people are focused on the changes in the usage limits of Codex with the new Pro and Plus plans, but has anyone experienced changes to the Pro model on ChatGPT using the $200 vs $100 plan? I used to use the $200 Pro plan and used the P…
