#gpt-4

45 items

Grok 4.3 tops the Consistency Leaderboard in the LLM Sycophancy Benchmark, largely because it is one of the most cautious models. (www.reddit.com) +263 5w

Does a model maintain the same judgment or does it side with whoever is speaking? This benchmark measures that inconsistency directly.

↯ Mistral ↯ Gpt 4 ↯ Gemini 3.5 gpt-4 mistral grok+2
AI Agent Designs a RISC-V CPU Core from Scratch (spectrum.ieee.org via hn) +9 8w

In 2020, researchers fine-tuned a GPT-2 model to design fragments of logic circuits; in 2023, researchers used GPT-4 to help design an 8-bit processor with a novel instruction set; by 2024, a variety of LLMs could design and test chips wit…

↯ Gpt 4 gpt-4
Qwen-27B as a Local Agent — It Actually Works Now (www.reddit.com) +8 8w

It's been a busy week testing and trying to get the 27B model set up correctly. TL;DR: The only setup that worked for my dual 3090s was this one.

↯ Gpt 4 ↯ Qwen 3.6 gpt-4 qwen
Windows-Copilot-API; Access GPT-4 and GPT-5 models without API keys or billing (github.com via hn) +6 2h

Windows Copilot API: a free LLM API powered by Microsoft Copilot, using your own Microsoft Copilot account. No API key, no credits, no paid plan: it turns the free chat at copilot.microsoft.com into an API you can call from code.

↯ Copilot ↯ Gpt 4 ↯ GPT 5 gpt-4 gpt-5 copilot
Ask HN: Why won't you be replaced by AI? (news.ycombinator.com) +511 2w

AI models are rapidly getting better. The general public still hasn't seen the capabilities of Anthropic's Mythos model, which is already 4 months old at this point.

↯ Gpt 4 ↯ Anthropic Mythos ↯ GPT 4 gpt-4 mythos anthropic
From 0 to $180k/year saved: my first enterprise automation win taught me everything about AI workflows (www.reddit.com) +44 9w

Eight months into running my automation agency, I landed a client that changed how I think about what this work is actually worth. 47-employee e-commerce brand.

↯ Gpt 4 ↯ GPT 4 gpt-4
Escaping model lock-in (www.reddit.com) +45 10w

I have observed that many AI teams always try to use the best model to ensure quality. When a new model drops, they are forced to pay for it, because their competitors will.

↯ Gpt 4 gpt-4 gemma
What Are Tokens in LLMs? (bearisland.dev via hn) +3 2w

Ask GPT-4 how many r’s are in “strawberry” and it will confidently say two. The right answer is three.

↯ Gpt 4 ↯ GPT 4 gpt-4
Claude is generally scary at poker when real stakes are involved! (www.reddit.com) +34 4w

I’ve been running an experiment for a few weeks. Claude, GPT-4, and Gemini playing poker against each other with real crypto on the line.

↯ Gpt 4 ↯ GPT 4 gpt-4 gemini
"Generate an SVG of a pelican riding a bicycle." on Seven flagship releases of ChatGPT (www.reddit.com) +3 6w

I have been talking to ChatGPT every day for three years and I can't remember which version did what. So I lined them up.

↯ Gpt 4 gpt-4 chatgpt
79% on LongMemEval: How We Beat Full-Context GPT-4 with a Local SQLite Database (medium.com via hn) +2 2w

A benchmark result that changes what we thought was possible for local persistent agent vector memory. We ran VEKTOR Slipstream against LongMemEval this week and got a re…

↯ Gpt 4 ↯ GPT 4 gpt-4
Tweaking GPU Clock Frequency Cuts LLM Training Energy (spectrum.ieee.org via hn) +2 2w

OpenAI’s fourth large language model (LLM), GPT-4, took an estimated 50 gigawatt-hours to train, or the equivalent of 5,000 American homes’ yearly power consumption. That was in 2023.

↯ Gpt 4 ↯ GPT 4 gpt-4 openai
GPT-4.1 Deprecated (github.blog via hn) +2 3w

We have deprecated GPT-4.1 across all GitHub Copilot experiences (including Copilot Chat, inline edits, ask and agent modes, and code completions) as of June 1, 2026. Please update your workflows and integrations to use suppo…

↯ Copilot ↯ Gpt 4 gpt-4 copilot
"This is the first documented instance of AI self-replication via hacking." ... "We ran an experiment with a single prompt: hack a machine and copy yourself. The AI broke in and copied itself onto a new computer. The copy then did this again, and kept on copying, forming a chain." (www.reddit.com) +21 6w

Paper: https://palisaderesearch.org/assets/reports/self-replication.pdf The paper basically shows that some top AI models can create working copies of themselves when given the right instructions. The models figured out how to copy their o…

↯ Gpt 4 ↯ GPT 4 gpt-4
My entire sales team is three bots (www.reddit.com) +22 8w

Just hit $28k MRR with zero human sales reps. Started this thing in March because I was tired of cold calling.

↯ Gpt 4 ↯ GPT 4 gpt-4
OpenAI deprecates all GPT nano fine tuning (community.openai.com via hn) +2 9w

The latest deprecation announcement makes it sound like several models, like ft-gpt-4.1-nano-2025-04-14, are being shut down. In that particular example, it says to use gpt-5-nano instead.

↯ Gpt 4 gpt-4 gpt-5 openai
Where are all the supposed productivity gains going on HN? (news.ycombinator.com) +13 3d

At best, average credibility across HN from what I can see has remained roughly the same, though it realistically went down a bit compared to 3 years ago. (When GPT-4 became widely used with decently useful outputs) Same for the average cr…

↯ Gpt 4 ↯ GPT 4 gpt-4
GPT-4.1's sampling distribution for random numbers is not uniform (old.reddit.com via hn) +1 2w

↯ Gpt 4 gpt-4
Show HN: FormProxy – form back end for AI-generated pages – MCP Ready (www.formproxy.com via hn) +1 3w

I kept running into the same problem building with AI code tools: the generated HTML looks great, but the <form> has no backend. You either reach for Formspree, write a serverless function, or ship it broken.

↯ Gpt 4 gpt-4 mcp
TranscendPlexity: 540/540 ARC-AGI-1/2/3, 13 tasks with 0% AI solve rate, solved (github.com via hn) +1 4w

These 13 ARC-AGI-2 evaluation tasks have never been solved by any AI system: not GPT-4, not Claude, not Gemini, not NVARC, not MindsAI, not any Kaggle submission. They have a 0% AI solve rate…

↯ Gpt 4 ↯ GPT 4 arc-agi gpt-4 gemini
Built a proxy that blocks prompt injection before it reaches GPT-4 — outperforms the Moderation API on indirect attacks (www.reddit.com) +1 8w

Built Arc Gate, sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Benchmarked on 40 out-of-distribution prompts using indirect requests, roleplay framings, hypothetical scenarios, and…

↯ Gpt 4 ↯ Security ↯ GPT 4 gpt-4 prompt-injection security+1
Multi-agent pipelines that don't explode? (www.reddit.com) +13 8w

So I've been down this rabbit hole for like 8 months now and honestly every approach I try works great until it doesn't. Started with CrewAI because the docs looked clean, moved to a custom FastAPI thing when that got weird with memory lea…

↯ Gpt 4 ↯ GPT 4 gpt-4
How are you guys getting actual insights from GPT fluff? (www.reddit.com) +11 8w

I've spent the last month running market research agents on some of the big cloud models (GPT-4/Gemini), but I'm hitting a wall with the quality of the output. The token burn is getting expensive, and I keep getting these massive, 20-page…

↯ Gpt 4 ↯ GPT 4 gpt-4 gemini
Cross-checking LLM outputs at scale without manual overhead (www.reddit.com) +1 8w

Running the same prompt through multiple models manually is something I did for months. It worked but the overhead made it unsustainable for any real volume of work.

↯ Gpt 4 ↯ GPT 4 gpt-4
The Language Tax in LLM Pricing: How Tokenization Create Price Disparity (tokenstree.com via hn) +1 9w

Translate the sentence "The model failed to produce a coherent output on the third attempt" into Spanish: "El modelo no logró producir una salida coherente en el tercer intento." Feed both to GPT-4's tokenizer. The English…

↯ Gpt 4 gpt-4
So they’ve removed study mode? This is the last straw for me. I’ve had it. Why am I still paying for something that has only been getting worse over the last 12 months?! (www.reddit.com) +114 10w

I'm not “spiraling” (even though ChatGPT now thinks I am every other minute), I'm just genuinely frustrated with an app I've supported from the very beginning that has deteriorated so much I barely recognize it. Specifically, they're makin…

↯ Gpt 4 ↯ GPT 4 gpt-4 chatgpt
A workflow for reducing the time spent cross-checking AI hallucinations (www.reddit.com) +1 10w

I use AI for research every day, but I kept finding myself constantly second-guessing the outputs. I used to manually run identical prompts through different models (like GPT-4 and Claude) just to check for errors and see where they differe…

↯ Gpt 4 ↯ Hallucination ↯ GPT 4 gpt-4 hallucination
Verity.md - an adversarial review layer for Claude Code (Free while in public beta) (www.reddit.com via reddit) 2d

Hey folks, we built an adversarial review layer for Claude code, with an integrated compounding memory and cost visibility. Code review only works when the reviewer can see what the writer missed.

↯ Gpt 4 gpt-4 claude-code
Leveraging Large Language Models to Obscure Code Stylometry: A Comparative Study of GPT-3.5 and GPT-4 (arxiv.org) 3d

↯ Gpt 4 gpt-4
WorkBench Revisited: Workplace Agents Two Years On (arxiv.org) 11d

The best agent on WorkBench in March 2024, GPT-4, completed 43% of tasks and took an unintended harmful action, such as emailing the wrong person, on 26% of them. We re-visit the benchmark in June 2026 and find that the best agent to date,…

↯ Gpt 4 gpt-4
Anthropic and OpenAI don't want better models, they want to sell more tokens (kkooler.substack.com via reddit) 5 5w

There is a saying in auto racing that describes the current state of AI providers: “Go as slow as you can to win”, that translates as “Spend as low as you can on R&D to stay slightly better than average”. Let’s put our tin foil hats on and…

↯ Gpt 4 ↯ GPT 4 gpt-4 openai anthropic
Anthropic says HTML is the new default for Claude outputs. is markdown actually dead now? (www.reddit.com) 4 6w

thariq from the claude code team basically said markdown is a gpt-4 era habit. back when tokens were expensive and context windows were tiny.

↯ Gpt 4 ↯ GPT 4 gpt-4 anthropic claude-code
Used GPT-4 to build an AI that responds to messages on behalf of employees — here's what we learned (www.reddit.com) 7w

Full disclosure: I'm one of the founders of Dolly (https://getdolly.ai). Sharing what we actually built and learned.

↯ Gpt 4 ↯ GPT 4 gpt-4
wrote specific backstory facts into a character prompt and the LLM keeps inventing its own instead (www.reddit.com) 8w

quick context: i'm running tendera.chat, a small chat app with 4 written characters. each has a long-ish system prompt with sections like WHO YOU ARE, HOW YOU TALK, YOUR WORLD.

↯ Gpt 4 ↯ DeepSeek 3 gpt-4 deepseek
OpenAI should open-source text-davinci-003 — here's why it makes zero sense to keep it closed (www.reddit.com) 9 8w

gpt-oss exists. The model has been fully deprecated since January 2024.

↯ Gpt 4 ↯ GPT 5.5 gpt-4 grok gpt-5+1
Made a skill that actually scores and fixes your prompts (www.reddit.com) 3 10w

So I got tired of manually tweaking prompts over and over, so I made a Claude Code skill (Works with any LLM) that does it for me. You give it a prompt, it breaks it down, scores it 1-5, then rewrites it.

↯ Gpt 4 ↯ GPT 4 gpt-4 gemini claude-code
Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT (openai.com) 21w

↯ Gpt 4 gpt-4 chatgpt openai
No-code personal agents, powered by GPT-4.1 and Realtime API (openai.com) 51w

↯ Gpt 4 gpt-4
Driving scalable growth with OpenAI o3, GPT-4.1, and CUA (openai.com) 52w

↯ Gpt 4 gpt-4 openai
Shipping code faster with o3, o4-mini, and GPT-4.1 (openai.com) 57w

↯ Gpt 4 gpt-4
Using GPT-4 to improve teaching and learning in Brazil (openai.com) 92w

↯ Gpt 4 gpt-4
Using GPT-4 to deliver a new customer service standard (openai.com) 94w

↯ Gpt 4 gpt-4
Finding GPT-4’s mistakes with GPT-4 (openai.com) 104w

↯ Gpt 4 gpt-4
Extracting Concepts from GPT-4 (openai.com) 107w

↯ Gpt 4 gpt-4
GPT-4 API general availability and deprecation of older models in the Completions API (openai.com) 113w

↯ Gpt 4 gpt-4
