event

GPT-4

45 items · started 2023-03-14 · ongoing (last activity 2026-06-26)

Windows-Copilot-API; Access GPT-4 and GPT-5 models without API keys or billing (github.com via hn)

+6 2h gpt-4 gpt-5 copilot

Windows Copilot API: a free LLM API powered by Microsoft Copilot. It uses your own Microsoft Copilot account: no API key, no credits, no paid plan. It turns the free chat at copilot.microsoft.com into an API you can call from code.
Verity.md - an adversarial review layer for Claude Code (Free while in public beta) (www.reddit.com via reddit)

2d gpt-4 claude-code

Hey folks, we built an adversarial review layer for Claude Code, with an integrated compounding memory and cost visibility. Code review only works when the reviewer can see what the writer missed.
Leveraging Large Language Models to Obscure Code Stylometry: A Comparative Study of GPT-3.5 and GPT-4 (arxiv.org)

3d gpt-4
Where are all the supposed productivity gains going on HN? (news.ycombinator.com)

+13 3d gpt-4

At best, average credibility across HN from what I can see has remained roughly the same, though it realistically went down a bit compared to 3 years ago (when GPT-4 became widely used with decently useful outputs). Same for the average cr…
WorkBench Revisited: Workplace Agents Two Years On (arxiv.org)

11d gpt-4

The best agent on WorkBench in March 2024, GPT-4, completed 43% of tasks and took an unintended harmful action, such as emailing the wrong person, on 26% of them. We re-visit the benchmark in June 2026 and find that the best agent to date,…
79% on LongMemEval: How We Beat Full-Context GPT-4 with a Local SQLite Database (medium.com via hn)

+2 2w gpt-4

A benchmark result that changes what we thought was possible for local persistent agent vector memory. We ran VEKTOR Slipstream against LongMemEval this week and got a re…
GPT-4.1's sampling distribution for random numbers is not uniform (old.reddit.com via hn)

+1 2w gpt-4
Tweaking GPU Clock Frequency Cuts LLM Training Energy (spectrum.ieee.org via hn)

+2 2w gpt-4 openai

OpenAI’s fourth large language model (LLM), GPT-4, took an estimated 50 gigawatt-hours to train, or the equivalent of 5,000 American homes’ yearly power consumption. That was in 2023.
Ask HN: Why won't you be replaced by AI? (news.ycombinator.com)

+511 2w gpt-4 mythos anthropic

AI models are rapidly getting better. The general public still hasn't seen the capabilities of Anthropic's Mythos model, which is already 4 months old at this point.
What Are Tokens in LLMs? (bearisland.dev via hn)

+3 2w gpt-4

Ask GPT-4 how many r’s are in “strawberry” and it will confidently say two. The right answer is three.
GPT-4.1 Deprecated (github.blog via hn)

+2 3w gpt-4 copilot

We have deprecated GPT-4.1 across all GitHub Copilot experiences (including Copilot Chat, inline edits, ask and agent modes, and code completions) on June 1, 2026. Please update your workflows and integrations to use suppo…
Show HN: FormProxy – form back end for AI-generated pages – MCP Ready (www.formproxy.com via hn)

+1 3w gpt-4 mcp

I kept running into the same problem building with AI code tools: the generated HTML looks great, but the <form> has no backend. You either reach for Formspree, write a serverless function, or ship it broken.
Claude is generally scary at poker when real stakes are involved! (www.reddit.com)

+34 4w gpt-4 gemini

I’ve been running an experiment for a few weeks. Claude, GPT-4, and Gemini playing poker against each other with real crypto on the line.
TranscendPlexity: 540/540 ARC-AGI-1/2/3, 13 tasks with 0% AI solve rate, solved (github.com via hn)

+1 4w arc-agi gpt-4 gemini

🔓 13 "Impossible" ARC-AGI-2 Tasks, All Solved. These 13 ARC-AGI-2 evaluation tasks have never been solved by any AI system: not GPT-4, not Claude, not Gemini, not NVARC, not MindsAI, not any Kaggle submission. They have a 0% AI solve rate…
Anthropic and OpenAI don't want better models, they want to sell more tokens (kkooler.substack.com via reddit)

5 5w gpt-4 openai anthropic

There is a saying in auto racing that describes the current state of AI providers: “Go as slow as you can to win”, which translates as “Spend as little as you can on R&D to stay slightly better than average”. Let’s put our tin foil hats on and…
Grok 4.3 tops the Consistency Leaderboard in the LLM Sycophancy Benchmark, largely because it is one of the most cautious models. (www.reddit.com)

+263 5w gpt-4 mistral grok+2

Does a model maintain the same judgment or does it side with whoever is speaking? This benchmark measures that inconsistency directly.
"Generate an SVG of a pelican riding a bicycle." on Seven flagship releases of ChatGPT (www.reddit.com)

+3 6w gpt-4 chatgpt

I have been talking to ChatGPT every day for three years and I can't remember which version did what. So I lined them up.
Anthropic says HTML is the new default for Claude outputs. is markdown actually dead now? (www.reddit.com)

4 6w gpt-4 anthropic claude-code

thariq from the claude code team basically said markdown is a gpt-4 era habit. back when tokens were expensive and context windows were tiny.
"This is the first documented instance of AI self-replication via hacking." ... "We ran an experiment with a single prompt: hack a machine and copy yourself. The AI broke in and copied itself onto a new computer. The copy then did this again, and kept on copying, forming a chain." (www.reddit.com)

+21 6w gpt-4

Paper: https://palisaderesearch.org/assets/reports/self-replication.pdf The paper basically shows that some top AI models can create working copies of themselves when given the right instructions. The models figured out how to copy their o…
Used GPT-4 to build an AI that responds to messages on behalf of employees — here's what we learned (www.reddit.com)

7w gpt-4

Full disclosure: I'm one of the founders of Dolly (https://getdolly.ai). Sharing what we actually built and learned.
Qwen-27B as a Local Agent — It Actually Works Now (www.reddit.com)

+8 8w gpt-4 qwen

It's been a busy week testing and trying to get the 27B model set up correctly. TL;DR: The only setup that worked for my dual 3090s was this one.
wrote specific backstory facts into a character prompt and the LLM keeps inventing its own instead (www.reddit.com)

8w gpt-4 deepseek

quick context: i'm running tendera.chat, a small chat app with 4 written characters. each has a long-ish system prompt with sections like WHO YOU ARE, HOW YOU TALK, YOUR WORLD.
Built a proxy that blocks prompt injection before it reaches GPT-4 — outperforms the Moderation API on indirect attacks (www.reddit.com)

+1 8w gpt-4 prompt-injection security+1

Built Arc Gate, sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Benchmarked on 40 out-of-distribution prompts using indirect requests, roleplay framings, hypothetical scenarios, and…
Multi-agent pipelines that don't explode? (www.reddit.com)

+13 8w gpt-4

So I've been down this rabbit hole for like 8 months now and honestly every approach I try works great until it doesn't. Started with CrewAI because the docs looked clean, moved to a custom FastAPI thing when that got weird with memory lea…
How are you guys getting actual insights from GPT fluff? (www.reddit.com)

+11 8w gpt-4 gemini

I've spent the last month running market research agents on some of the big cloud models (GPT-4/Gemini), but I'm hitting a wall with the quality of the output. The token burn is getting expensive, and I keep getting these massive, 20-page…
My entire sales team is three bots (www.reddit.com)

+22 8w gpt-4

Just hit $28k MRR with zero human sales reps. Started this thing in March because I was tired of cold calling.
Cross-checking LLM outputs at scale without manual overhead (www.reddit.com)

+1 8w gpt-4

Running the same prompt through multiple models manually is something I did for months. It worked but the overhead made it unsustainable for any real volume of work.
AI Agent Designs a RISC-V CPU Core from Scratch (spectrum.ieee.org via hn)

+9 8w gpt-4

In 2020, researchers fine-tuned a GPT-2 model to design fragments of logic circuits; in 2023, researchers used GPT-4 to help design an 8-bit processor with a novel instruction set; by 2024, a variety of LLMs could design and test chips wit…
OpenAI should open-source text-davinci-003 — here's why it makes zero sense to keep it closed (www.reddit.com)

9 8w gpt-4 grok gpt-5+1

Gpt oss exists. The model has been fully deprecated since january 2024.
OpenAI deprecates all GPT nano fine tuning (community.openai.com via hn)

+2 9w gpt-4 gpt-5 openai

The latest deprecation announcement makes it sound like several models, like ft-gpt-4.1-nano-2025-04-14, are being shut down. In that particular example, it says to use gpt-5-nano instead.
The Language Tax in LLM Pricing: How Tokenization Creates Price Disparity (tokenstree.com via hn)

+1 9w gpt-4

The observation: translate the sentence "The model failed to produce a coherent output on the third attempt" into Spanish: "El modelo no logró producir una salida coherente en el tercer intento." Feed both to GPT-4's tokenizer. The English…
From 0 to $180k/year saved: my first enterprise automation win taught me everything about AI workflows (www.reddit.com)

+44 9w gpt-4

Eight months into running my automation agency, I landed a client that changed how I think about what this work is actually worth. 47-employee e-commerce brand.
Escaping model lock-in (www.reddit.com)

+45 10w gpt-4 gemma

I have observed that many ai teams try to always use the best model to ensure quality. When a new model drops out, they are forced to pay for it, because their competitors will.
So they’ve removed study mode? This is the last straw for me. I’ve had it. Why am I still paying for something that has only been getting worse over the last 12 months?! (www.reddit.com)

+114 10w gpt-4 chatgpt

I'm not “spiraling” (even though ChatGPT now thinks I am every other minute), I'm just genuinely frustrated with an app I've supported from the very beginning that has deteriorated so much I barely recognize it. Specifically, they're makin…
Made a skill that actually scores and fixes your prompts (www.reddit.com)

3 10w gpt-4 gemini claude-code

So I got tired of manually tweaking prompts over and over, so I made a Claude Code skill (Works with any LLM) that does it for me. You give it a prompt, it breaks it down, scores it 1-5, then rewrites it.
A workflow for reducing the time spent cross-checking AI hallucinations (www.reddit.com)

+1 10w gpt-4 hallucination

I use AI for research every day, but I kept finding myself constantly second-guessing the outputs. I used to manually run identical prompts through different models (like GPT-4 and Claude) just to check for errors and see where they differe…
Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT (openai.com)

21w gpt-4 chatgpt openai
No-code personal agents, powered by GPT-4.1 and Realtime API (openai.com)

51w gpt-4
Driving scalable growth with OpenAI o3, GPT-4.1, and CUA (openai.com)

52w gpt-4 openai
Shipping code faster with o3, o4-mini, and GPT-4.1 (openai.com)

57w gpt-4
Using GPT-4 to improve teaching and learning in Brazil (openai.com)

92w gpt-4
Using GPT-4 to deliver a new customer service standard (openai.com)

94w gpt-4
Finding GPT-4’s mistakes with GPT-4 (openai.com)

104w gpt-4
Extracting Concepts from GPT-4 (openai.com)

107w gpt-4
GPT-4 API general availability and deprecation of older models in the Completions API (openai.com)

113w gpt-4
