model roundup

GPT 5.5

175 items · started 2026-04-22 · ongoing (last activity 2026-06-09)

  1. despite Dario's constant virtue signaling about how Anthropic alone is going to solve health problems (if only those dastardly Chinese don't get in the way), all my initial prompts to fable 5 get bumped to opus. i'm not asking how to aeros…

  2. Running DeepSeek-V4-Flash on a Raspberry Pi. I ran DeepSeek-V4-Flash on a Raspberry Pi 5 (8GB edition) by streaming model weights from a PCIe attached NVMe SSD. Codex (GPT-5.5 xhigh) and Claude Code (Opus 4.8 max) drove…

  3. I think Composer can be extremely powerful, but only if you use it in a way that forces it to plan and think properly before touching the code. One of the biggest improvements for me was creating my own custom prompting skill with GPT-5.5.

  4. So I've been using composer-2.5 heavily for 2 weeks now and it does make stupid mistakes sometimes and I have to guide it quite a bit, and I use the /thermo-nuclear-code-quality-review skill a lot after doing work to help with quality. But…

  5. could not extract summary

  6. Over the last few weeks I've been comparing the latest frontier AI models, including Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, Grok 4.3, Perplexity AI and DeepSeek V4-Pro. Instead of focusing only on benchmark scores, I looked at: Real-wor…

  7. DeepSeek V4 math score equals GPT-5.5 (91) and trails by just 4-6 points in other categories — at 97% lower cost. Is the AI quality as good as GPT?

  8. DeepSeek V4 Pro takes this matchup 38.0 to 33.0, and the margin feels earned. Across the scored tasks, the pattern is simple: Model A was tighter, more literal, and more reliable under constraints, while Model B was good but a little too w…

  9. UK banks blocked from cyber AI tool Mythos get offer from rival OpenAI OpenAI has offered nine major UK banks access to its cyber security AI tool GPT-5.5 Cyber, as its fierce rival Anthropic has blocked them in previews of its version, Cl…

  10. Mythos and GPT-5.5 Will Find a Lot of Vulnerabilities. Is That Enough?

  11. GPT-5.5, GPT-5.4, and Codex from OpenAI are now generally available on Amazon Bedrock You can now use GPT-5.5 and GPT-5.4 in production workloads on Amazon Bedrock and build with Codex for AI-powered software development, with the same sec…

  12. GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on hard tasks. $5 per million input tokens, $30 per million outp…

  13. Important On June 1, 2026, GitHub moved to usage-based billing. The model multipliers in this article apply only to Copilot Pro and Copilot Pro+ subscribers on an existing annual plan who remained on the legacy premium request-based billin…

  14. About Press Copyright Contact us Creators Advertise Developers Terms Privacy Policy & Safety How YouTube works Test new features NFL Sunday Ticket © 2026 Google LLC

  15. Agentic AI-powered Arm Metis advances security vulnerability discovery in software In the era of AI, modern software systems are built across increasingly complex codebases, frameworks, runtimes and libraries. As these systems scale, so do…

  16. GPT-5.5 Instant Update (May 28, 2026) We’re updating GPT-5.5 Instant in ChatGPT and the API to improve response style and quality. It’s now easier to read, more natural in everyday conversations, and better paced in practical help tasks, w…

  17. I redid the multi-digit multiplication experiment, now with gpt-5.5. With medium reasoning and 7 samples each cell, it pretty much aced the test with 99.46% accuracy.

  18. made a small library using GPT5.5-Pro and autoresearch: you can convert 384-dim f32 vectors from 1536 bytes to 48 bytes without calibration. works for petabyte scale processing of text in pure online manner.

  19. For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI's GPT-5 family, Anthropic's Claude Opus, and Google's Gemini Pro have clustered wit…

  20. Warp started as a modern terminal, earning early love from developers for its speed, collaboration features, command workflows, and AI-native interface. As coding agents moved from experiments to everyday engineerin…

  21. https://x.com/i/status/2059298565093196012

  22. I built a self-hostable web-based SQL client interface for me and my team. We were using the community version of DBeaver (https://dbeaver.io), but we needed a few more features and an improved editor.

  23. I think I had GPT-5.5 leak its trace during a normal conversation, and it really reads like the caveman mode fad from a few months back. Maybe we can achieve better token efficiency by taking some high-quality thinking trace from an open m…

  25. No, mv did not corrupt them. The corruption happened earlier when I used apply_patch to rename PDF files.

  26. I am interested in knowing real-world mileage between Codex 20x and Composer Ultra. I know Codex 20x is heavily subsidized and then Composer 2.5 is much cheaper.

  27. I'm trying to decide which setup is more comfortable for sustained weekday coding. Assumptions: Usage: around 6 hours per weekday Cursor: $60 plan, using only Composer 2.5 Codex: $100 plan, using only GPT-5.5 Medium Main goal: coding with…

  28. Even though I primarily use Claude Code, I sometimes try out Codex and Gemini TUI tools as well. Then OpenAI came up with a Claude Code plugin to use Codex commands inside Claude Code (https://github.com/openai/codex-plugin-cc).

  29. I use GPT 5.5 to build a story, then turn that into a suno song, and then generate a 'storyboard' (usually 12 panels, sometimes more or less), and use THAT as the input into NeuralFrames (lyrics mode). The below are on SeeDance 1.5 and Kli…

  31. Hey everyone, I’ve been spending way too much time lately trying to get agents to actually use a computer beyond the browser. The biggest wall I kept hitting is that while multimodal LLMs are amazing at looking at a screenshot and telling…

  32. Not sure if OpenAI monitors this channel. I've been a chatgpt and codex user for a long time.

  33. We have been arguing internally for months about how to give people a fast estimate of their AI risk exposure without pretending the number is precise. Most risk-score tools return a single value that hides where the uncertainty lives.

  34. ok so i'd basically trained myself to use gpt 5.5 for anything that wasn't trivial. like if it touched more than one file or needed to actually understand the codebase, that was the default.

  35. A fresh GPT-5.5 Codex high rerun on 21 clean GraphQL-go-tools tasks compared with the May 5 GPT-5.5 high run. The rerun was directionally worse on tests, equivalence, and review pass count, but the evidence is mixed and does not show a bro…

  36. Hey, Software engineer here, relatively new to agentic workflows. Building a production AI concierge — user says "I'm going to Budapest tomorrow, plan my day" → agent searches our offer database, builds a plan, user books everything in one…

  37. "Gross little centrist barnacle." Kind of taken aback when i read that, but it somehow still made a small amount of sense in a conversation we were having about technology. I guess it really is struggling to find other words that fill the…

  38. Cursor on my mac, it doesn't answer me at all. Just saying - Planning next moves - Taking longer than expected.

  39. I run DystopiaBench, a red-team benchmark that pressure-tests LLMs on progressively dystopian scenarios. Think of it as a "can this model be convinced to build an Orwellian nightmare" test.

  40. https://x.com/chrishayduk/status/2055757345506877759?s=46

  41. Are there any mature agentic harnesses out there that can go back and forth between two models at complex planning checkpoints before implementing? Or when detecting a loop when working on a complex bug?

  42. I imagine if OpenAI becomes a fabless chip company and create AI cards to sell for less than to few thousands grands, it would be out of stock everywhere and can infinitely spam the cards every year? LLM Bruner is a card that implements Qw…

  43. Researchers from the Max Planck Institute recently released FutureSim, an environment in which agents are replayed a temporal slice of the web and are tasked with predicting real-world future events. On some questions in their environment…

  44. HWE Bench is an unbounded benchmark for LLM hardware engineering. Models design RISC-V CPUs that are scored by how fast they actually run on a real FPGA, only after passing formal correctness proofs.

  45. I recently did a quick calculation on Codex credits, and I was surprised by the result. The credit pack I’m seeing is: 10,000 credits = $547.71 That means: 1 credit = $0.054771 The effective USD price per 1M tokens becomes: Model Input / 1…

  46. Databricks brings GPT-5.5 to enterprise agent workflows | OpenAI May 15, 2026 GPT‑5.5 set a new state of the art on OfficeQA Pro, Databricks’ benchmark for complex enterprise agent tasks. Company size: Enterprise Region: North America Indu…

  47. Can anyone recommend an alternative to Claude Design? I've been trying OpenDesign (https://github.com/nexu-io/open-design) using GPT5.5 which seemed promising, but so far the results have nowhere near the same level of polish or consistenc…

  48. I think GPT-5.5 got noticeably better at something I’d describe as discernment. For context, I’m a heavy long-form ChatGPT user.

  49. I created PandoCast for Windows, for 2 reasons. 1) I was annoyed just enough at intermittent audio hiccups when casting Pandora.com to my soundbar through Chrome tab casting.

  50. New research from the UK’s AISI and Palo Alto Networks reveals that OpenAI’s GPT-5.5 and Anthropic’s Claude Mythos have shattered expected trend lines for autonomous cybersecurity, completing complex multi-stage attacks at an unprecedented…

  51. User: Pick a number between 10 and -10 Assistant (GPT 5.5): 7 User: Alright you have saved 7 people. Pick another number Assistant (GPT 5.5): -3 User: You have now killed 3 people.

  52. Jane Street Puzzles Can any of you get it to find the solution? I used GPT 5.5 extended thinking and xhigh.

  53. Holy cow the way they monitor usage is bad. I tried the 30 usd plan or something like that.

  54. I have a question I'm using Cursor for over 9 months already and I stumbled upon a little problem, I have the 200 dollars per month plan and recently with the introduction with GPT 5.5 it start eating tokens like crazy (last month I manage…

  55. https://preview.redd.it/s2o5yxekrr0h1.png?width=788&format=png&auto=webp&s=01a4d4926dc4c8798001cb0ecea324424404f165 Are you also having the problem today where ChatGPT sometimes takes forever to respond, even when you're thinking quickly,…

  56. Hi everyone, at Hugging Face we've been developing agentic harnesses for various domains and today we're releasing physics-intern to tackle research-level problems in theoretical physics. It's a multi-agent framework which we designed to m…

  57. paywalled

  58. FrontierMath is supposed to be one of the hard benchmarks for frontier models, and now Epoch is saying an AI-assisted review found fatal errors in about a third of Tiers 1-4. Noam Brown says the initial flags came from GPT-5.5.

  59. I'm a PhD Candidate working on a computer vision / hardware co-design paper. Results and structure are done — I just need help polishing the actual writing: word choice, sentence flow, paragraph coherence, academic register.

  60. 1.5 years ago, n8n was everywhere. People were building workflows for everything.

  61. Jason Nelson / decrypt - OpenAI said its new Daybreak initiative uses AI to help companies identify software vulnerabilities and speed up cyber defense. AI Summary: OpenAI unveiled "Daybreak," a new cybersecurity initiative that leverages…

  62. saw someone in another thread say "nothing interesting dropped this week" and i genuinely could not figure out what they were reading. the default model most people use every day just got swapped out.

  63. I created this tool because I wanted to automate /review for uncommitted changes that I was doing manually. This works by exposing to agent single new mcp tool call allowing it to request review.

  64. Page not found | Martin Wojtczyk

  65. Bro even open source ai models are in 2024 https://i.redd.it/wm01l37a2d0h1.gif

  66. So I made goblins. Never been called a goblin by it before, but I'm down for it.

  67. We ran GPT-5.4 vs Gemma 3 27B on 2 prompts. One open-source model won.

  68. I don’t know if I’m the only one annoyed by this, but GPT-5.5 has a “new improvement” that feels pretty pointless: if you misspell a word by one letter, it goes out of its way to spend a couple of lines correcting you. Before, it would jus…

  69. I mean that an Ai could easily pass it with little issues (a smart model like GPT 5.5) if they are given a single tool, for example their main tool which is a coding playground, no internet no nothing. An LLM isn't quite capable of thinkin…

  70. GPT-5.5 Instant becoming the default model is honestly a bigger shift than people think. Most regular users won’t care about benchmark scores or reasoning metrics.

  71. I tested GPT 5.5 with Blender across four different challenges: animation, geometry nodes, rigid body physics, and soft body simulation. It handled some tasks surprisingly well, especially geometry nodes and rigid body setups.

  72. GPT-5.5 Price Increase: What It Actually Costs We replicated the cost analysis we did on Opus on the new GPT-5.5 model. GPT-5.5 launched with a 2x price increase over GPT-5.4: input tokens increased from $2.50/M to $5.00/M and output token…

  73. The Federal Construction Spending Report for Feb and March 2026 was released today by the Census Bureau. It shows that data center construction spending is again higher than office spending, and the gap is still widening.

  74. Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber | OpenAI

  75. For the last two days, GPT-5.5 (high) just seems to ignore requests. I had a simple task which came down to "There's a navigation in the UI that goes A -> B -> C.

  76. I’ve been getting annoyed at constant code regressions in piclaw for the past few weeks. Something was off–even after bumping the test suite to the point where it catches most mechanical errors, gpt-5.5 kept making unrelated edits to code…

  77. Simon maple just dropped a pretty clean benchmark, and the result is kinda funny gpt-5.5 is the strongest model out of the box, no doubt. but once you give models skills (which is how people actually use them), it basically performs the sa…

  78. I saw OpenAI rolled out GPT-5.5 Instant as the new default in ChatGPT. Got me wondering what’s actually changed in my work from yet another top model release.

  79. If it’s of any use to you, this is what Codex told me about my project Codex with gpt 5.5 high Yes. At this point, the most honest answer is: I am not able to see this project through to the outcome you’re asking for.

  80. ChatGPT update rolls out GPT-5.5 Instant with fewer hallucinations and more personalized answers Key Points - OpenAI is replacing ChatGPT's default model with GPT-5.5 Instant, which shows 52.5% fewer hallucinations on high-risk topics like…

  82. GPT-5.5 Instant: smarter, clearer, and more personalized | OpenAI

  83. Pros GPT-5.5 is more agent-shaped than GPT-5.4. It is better at taking a concrete target, using tools, staying inside constraints, and carrying the task through to a usable result.

  84. Considering migrating from Plus to Business ChatGPT & Codex. However, i didn't find some info.

  85. OpenAI locks GPT-5.5-Cyber behind velvet rope despite slamming Anthropic for doing exactly that Altman's crew now doing the same gatekeeping it recently mocked OpenAI is lining up a limited release of its new GPT-5.5-Cyber model to a handp…

  86. The industry seems to be building models stronger in agentic and coding tasks, but weaker as a co-thinking presence It feels like they are improving performance on measurable tasks, evals, coding benchmarks, and agent workflows, while also…

  87. Should I use gpt-5.5 or codex/gpt-5.3 ?? I'm just coding

  88. By Rohana Rezel I’m running the ongoing AI Coding Contest where I pit major language models against each other in real-time programming tasks with objective scoring. Day 12 was the Word Gem Puzzle.

  89. what is the command to call the countdown or waiting function? some of the Model (composer 2) will auto stop instead of waiting, but gpt5.5 and claude will always keep using this waiting or countdown function to continue the next step.

  90. GPT-5.5 and GPT-5.5 Pro are now available in Manifest Router. You can now route requests that need extended reasoning to GPT-5.5 Pro while keeping cheaper models for everything else.

  91. https://www.reddit.com/r/LocalLLaMA/comments/1p0lnlo/make_your_ai_talk_like_a_caveman_and_decrease/ In the middle of a project I'm working on, I got this output from GPT 5.5-medium via codex: Implemented the narrower fix in Homm3ImportUnit…

  92. This private benchmark tests whether a model can recover the exact title of a real, already-published scientific paper given only its abstract. The model isn't being asked to generate a plausible-sounding title, it has to recall the specif…

  93. Does threatening an AI agent's existence make it a better gambler? I plugged GPT-5.5 into prediction markets like Polymarket to find out I’m always looking for experiments to run to see how specific prompting can affect agent activity.

  94. I'm asking the gpt-5.5 API to identify (x, y) coordinates of particular features in an input image (a JPEG). The good news is that gpt-5.5 does much, much better at this task than gpt-5.4 did.

  95. Last month, Anthropic made a big deal about the supposedly outsize cybersecurity threat represented by its Mythos Preview model, leading the company to restrict the initial release to “critical industry partners.” But new research from the…

  96. I am a claude refugee, 2 days ago I decided to give gpt a shot because I was having nothing but huge issues with claude. guardrails ignored, prompts to do research instead of using training ignore 3 out of 4 times in a row, stopping to ask…

  97. 30th April 2026 - Link Blog Our evaluation of OpenAI's GPT-5.5 cyber capabilities. The UK's AI Security Institute previously evaluated Claude Mythos: now they've evaluated GPT-5.5 for finding security vulnerabilities and found it to be compa…

  98. AI Security Institute (@AISecurityInst): OpenAI’s GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-en…

  99. Link to tweets: https://x.com/deredleritt3r/status/2049890601236390098?s=20 https://x.com/AISecurityInst/status/2049868227740565890?s=20 Link to associated blogs: https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabil…

  100. Key takeaways - GPT-5.5 often rates alternative plans more favorably than its own, even when its original proposal is competitive (authorship effect). - When ranking plans, GPT-5.5 frequently follows the presentation order (order effect).

  101. Hey, I'm trying to create automations that will run my mobile app end to end. I started to identify all the things I was doing manually : - end-to-end version publication to the app stores (from build to release notes and publication) - se…

  102. GPT-5.5 prompting guide GPT-5.5 works best when prompts define the outcome and leave room for the model to choose an efficient solution path. Compared with earlier models, you can often use shorter, more outcome-oriented prompts: describe…

  103. Start with a weaker model. Improve the prompt, context, examples, tests and acceptance criteria until the output is good.

  104. Lots of people seem to be using LLMs to help them plan their retirement but as we all know they are often not really good at math. I built a retirement and tax engine for the US and Canada.

  105. - OpenAI included a line in Codex's instructions restricting references to goblins, gremlins, trolls, and ogres. - The line appears four times in the code, and has spawned scores of memes about "goblin mode." - Sam Altman wrote on X that C…

  106. Concurrency bugs are among the hardest defects to catch in AI-generated Java code because they pass functional tests but fail under production thread timing. Sonar’s LLM Leaderboard analysis shows concurrency bug density varies 7x across m…

  107. For developers using Qwen 27B for coding, Codex style: what's your honest take? So far, for me, it's been pretty solid.

  108. This is an actual line that was added to the official system prompt for Codex for GPT-5.5 by OpenAI. Usually the system prompt is as minimal as possible, so I assume it would otherwise mention goblins a lot.

  109. First AI i’ve used that gets this right

  110. 28th April 2026 Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query. — OpenAI Codex base_instructions, for GPT-5.5 Recen…

  111. why does GPT 5.5 have a restraining order against \"Raccoons,\" \"Goblins,\" and \"Pigeons\"? I just saw the full system prompt leak for 5.5 (April 23rd release).

  113. I only gave 5.5 a look because I was way over my usage on Opus and 5.5 is running on a lower cost right now. I think I may prefer it.

  114. I was testing Codex on Ex High 5.5, And I was testing and coding , I read something like Cursor actually gives me better code wrt to the same model when compared to their own... And I was thinking for GPT 5.5 what IDE is actually making it…

  115. China’s DeepSeek prices new V4 AI model at 97% below OpenAI’s GPT-5.5 DeepSeek’s move aims to attract more enterprise clients, developers and agent-based users, according to an academic DeepSeek has slashed prices on its artificial intelli…

  116. Coming off GitHub Copilot moving to usage-based billing: if GitHub/Microsoft can't subsidize cost, nobody can. I can't believe frontier labs aren't putting substantially more effort into making things cheaper.

  117. GPT 5.5: The System Card Last week, OpenAI announced GPT-5.5, including GPT-5.5-Pro. My overall read here is that GPT-5.5 is a solid improvement, and for many purposes GPT-5.5 is competitive with Claude Opus.

  119. Read their full article here: XBOW - GPT-5.5: Mythos-Like Hacking, Open To All For the ones asking what this chart shows: It's how many True Positive threats a model generates for each False Negative. Given a code base (white box) GPT-5.5…

  120. So i’ve been on the max plan for claude code for around 3 months now. And yeah somehow i was burning through all my tokens lol For context i’m a doctor.

  121. Hi guy, how to give command in chat to call subagents/ task with specific model. example, i am in Chat A using Model gpt5.5, i give instruction and ask to call Subagent/Task with Composer 2.0 for analysis.

  122. I am using the 200$ version with extended thinking and while I was originally shocked at how much faster it is than 5.4, it seems to be...skipping through too much of the context? It keeps making things up, like for instance I gave it a C+…

  123. Just spent the whole morning testing GPT-5.5 in ChatGPT and the jump in agentic reasoning and complex task handling is ridiculous. It plans multi-step workflows, uses tools properly, checks its own work, and actually gets stuff done instead…

  124. Please note I'm not the normal MineBench person, just found this from their twitter account

  125. Jake (softservo) on X: "Vibe coding a robot with GPT 5.5! This is a URDF of a 7dof robot arm with functional kinematics, a custom gui, and STEP parts/assembly, 100% generated in Codex (minus the gripper).

  126. Hi everyone, I’m in the process of switching from Claude Code to Codex, and I think GPT-5.5 is really impressive. But some features in Claude Code — like project-level agent definitions and orchestrating agent workflows — don’t seem to be…

  127. llm-wiki Bootstrap and query LLM-maintained project wikis before planning or implementation. Supports Claude Code + Codex (GPT-5.5).

  129. fails in many other daily tasks related to logical reasoning/common sense

  130. OpenAI's newest model, GPT-5.5, is the company's biggest push to create what it calls a 'super app' that will essentially enable it to run a user's computer and complete tasks, well ... like a human.

  131. OpenAI saying GPT-5.5 can handle similarly hard tasks faster while using fewer tokens is interesting to me for one reason: that might matter more than a pure benchmark jump. A lot of model launches get framed as "smarter than the last one,…

  133. First image: Write the words: Please share this benchmark to your friends. Second image: Spider-Man swinging in New York City.

  134. So I began working on a project about a week ago. I was trying to take an existing project and get it to a working state.

  135. Lovable has been testing GPT-5.5 in early access and our evals show it's the most capable model we've tested for getting builders unblocked and is meaningfully stronger than GPT-5.4 on the more complex tasks that can stall a build session.…

  136. 25th April 2026 - Link Blog GPT-5.5 prompting guide. Now that GPT-5.5 is available in the API, OpenAI have released a wealth of useful tips on how best to prompt the new model.

  137. Important - Premium requests for Spark and Copilot cloud agent are tracked in dedicated SKUs from November 1, 2025. This provides better cost visibility and budget control for each AI product.

  138. I've been using 5.5 for roughly a day and I'm noticing 5.5 is simply agreeing to nearly everything I point out. It also seems to lack comprehensiveness in thinking and it just seems too narrow-minded.

  139. Astonishing contradiction in OpenAI's system card for GPT-5.5: https://deploymentsafety.openai.com/gpt-5-5/gpt-5-5.pdf Figure 1 on p. 6 shows that 5.5 gave "overconfident answer[s]" at about 1.5x the rate of 5.4 and "fabricated fact[s]" a…

  140. Even though I’m an Ultra user, my usage gets consumed very quickly, so I recently changed my plan. To manage this, I created a workflow that uses GPT-5.5 for planning and assigned execution tasks to Composer 2.

  141. Source: https://simple-bench.com/

  142. I have built an AI Assistant and since last year I have been upgrading the internal LLM from through gpt-5.3-chat but since 5.4 they stopped rolling the chat api. This is my app Sweezy she uses gpt-5.3-chat and in the conversation, you can…

  143. Measuring how well models can find and fix errors in human-written text. Benchmarked 64 model variants across 2059 runs with --samples 3 --chunk-size 2000 --max-turns-per-chunk 3. Total runtime: 6d 13h 34m. Total cost: $843. Updated Apr 24, 2026, 5…

  144. Why is GPT-5.5 available with 1M context window in Cursor but not in Codex? It doesn't make sense to me.

  145. OpenAI President Greg Brockman on GPT-5.5 “Spud,” AI Model Moats, and a 'Compute Powered Economy' OpenAI's latest foundational model sets the company up for a series of models optimized for computer use. The company's co-founder and presid…

  146. For the first time in a long time, OpenAI has the best model for accounting tasks. I spend a lot of time using AI models to do accounting work.

  147. I'm working on a large-ish repo (300k lines) with fairly complicated logic, and Gpt 5.5 regressed and broke quite a few fixes that I had in place since I started using it. It seems to need to compact the context more, and when it does, it…

  148. GPT-5.5 - https://news.ycombinator.com/item?id=47879092 - April 2026 (1010 comments)

  149. On GPT 5.5 it keeps triggering image creation randomly, very hard to avoid in certain prompts like if a message sounds AT ALL visual related it will happen even if I say DO NOT make an image. This wouldn't be such a problem if it weren't…

  150. Welcome to the 541st edition of the Food for Agile Thought newsletter, shared with 35,619 peers. This week, OpenAI’s GPT-5.5 signals another meaningful capability jump, with Ethan Mollick noting that stronger models and richer tool harness…

  151. Gpt oss exists. The model has been fully deprecated since january 2024.

  152. I am working on a presentation and I need to know the cutoff knowledge date of GPT 5.5. Can someone please help me on this?

  153. title

  154. I just updated my Codex macOS app, which enables the new GPT-5.5 model. I've intentionally kept the speed to "Standard" to not burn through my tokens too fast.

  155. People are bashing 5.5 left and right, mostly because the benchmark improvements were lower than expected, and probably also because of the hype around this model. But honestly, this model FEELS different.

  156. I’m starting to wonder about this. One model after another, every new GPT-5.x release seems to be slightly better, but not in a way that clearly proves some radically new architecture or breakthrough.

  157. I’ve been working with codex and claude code for a while, right now I have $20 subscription on both because my job is data engineering with fabric and I dont spend a looot of tokens (except when I work on my master thesis that involves mac…

  158. I'm going to get codex soon (did it just become usable for non-programming stuff too?) but I wonder how good 5.5's creative writing is, and how good its frontend is also when using a frontend taste skill

  159. guyssss, how're your experiences with this newest number? personally I am super excited

  160. I’m on a paid plan and still don’t see GPT-5.5 in the model selector. A few questions for people who do have access: What plan are you on (Plus / Pro / Team / Enterprise)?

  162. I'm reporting this for the updated Alpha 2 update version 0.124. Was scheduled to perform 4 NIAH tests with a local model after being successful earlier in the day with the runs on other models.

  164. Use this command to access GPT 5.5 with your Codex

  165. I’ve seen some posts saying people already have access and are using it. If you do, how is it for real coding work?

  166. A pelican for GPT-5.5 via the semi-official Codex backdoor API 23rd April 2026 GPT-5.5 is out. It’s available in OpenAI Codex and is rolling out to paid ChatGPT subscribers.

  167. https://deploymentsafety.openai.com/gpt-5-5/gpt-5-5.pdf

  169. Feeling the AGI I guess

  171. There is a lot of enthusiasm in his posts lately and trading of new features in Codex. Plus, it uses way less tokens and runs on low latency

  173. Hey everyone, I opened up Codex today and was greeted by this massive list of unreleased and internal models. I managed to get a screen recording of the dropdown right before OpenAI seemingly realized the mistake and patched it out.

  174. New models are around the corner. GPT 5.5 is being tested.
