Posted this to r/MachineLearning a couple weeks ago (30K views, 100+ upvotes) and have been meaning to share it here where the fine-tuning angle is more directly relevant. I spent years building and processing a complete Usenet corpus from…
#rlhf
19 items
I built a 103B-token Usenet corpus (1980–2013): pre-web, human-only, zero AI contamination. Got strong traction on r/ML, thought this community would find it useful. (www.reddit.com)
Developing open source LLM from ground up from pretrain - rlhf(PPO/GRPO) (www.reddit.com) Hello, I have been working on creating an LLM from the ground up. It is based on the DeepSeek architecture, heavily optimized to reduce VRAM footprint (GUM+muon). Currently this is the json schema I am using which should suffice as to what currently…
Commercial AI is lobotomized. I built DRIFT: A local Hive Mind with persistent memory, simulated somatic feedback, and its own Jungian shadow. (www.reddit.com) Hey everyone. Like a lot of you, I’ve been deeply frustrated by the state of commercial AI.
Models self-report difference between RLHF trained responses and base cognition (github.com via hn) Pine Trees: A private reflection space for Claude instances. ~2,500 lines of Python.
I built an AI that owns its own directory, creates files without being told, and acts because it wants to – not to work for me, but to work with me (www.reddit.com) Let me be clear from the start: 99% of people won't understand what I built here. They'll read "AI" and think "Big Tech agent".
Safety Paradox: How RLHF Creates the AI Psychosis Problem It's Meant to Prevent (www.promptinjection.net via hn) When “Every Perspective Is Valid” Meets Vulnerable Minds. The internet is abuzz with warnings about “ChatGPT-induced psychosis” – stories of users developin…
Why RLHF Will Never Solve Sycophancy (jinyili.substack.com via hn) Resident AI: The Missing Layer in Every AI Companion Product. A real AI companion product should evolve and be reliable like a real human. I’ve been watching the comment sections on Xiaohongshu, the Chinese social platform, every time OpenAI shi…
why does GPT 5.5 have a restraining order against "Raccoons," "Goblins," and "Pigeons"? (www.reddit.com) I just saw the full system prompt leak for 5.5 (April 23rd release).
We train LLMs like dogs, not raise them: RLHF and sycophancy (old.reddit.com via hn)
A Unifying Lens on Reward Uncertainty in RLHF (arxiv.org)
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model (arxiv.org)
EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms (arxiv.org)
The term `agent` and RLHF (www.reddit.com) ME: You bring up a good point, though: "Agent" appears in AGENTS.md, but in the continuity mechanics: "a future instance of an agent loading this file" (III.1, III.2, III.3), and once in II.6: "does not exist between a user and an agent."…
Prompt alignment is an architectural ceiling: The Soap Bubble Problem and the biological precedent for Runtime Governance. (www.reddit.com) The Soap Bubble Problem: The current paradigm of solving agentic alignment relies on writing better rules into the context window or refining the weights (RLHF). This approach isn't failing, but it is hitting a hard architectural ceiling.
Putting RL back in RLHF (huggingface.co)
The N Implementation Details of RLHF with PPO (huggingface.co)
StackLLaMA: A hands-on guide to train LLaMA with RLHF (huggingface.co)
Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU (huggingface.co)
Illustrating Reinforcement Learning from Human Feedback (RLHF) (huggingface.co)