Hey folks, this is for those who appreciate experimentation on open-source AI models. We fine-tuned open-source SLMs (3B and 7B parameters) with SFT + DPO and compared them against commercial models like GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6, Google Do…
#dpo
8 items
SFT + DPO on open-sourced SLMs (www.reddit.com)
Free hands-on lab: build a ReAct agent 3 ways (create_agent, raw LangGraph with tool-call budget, NVIDIA NAT YAML) (www.reddit.com)
Probe-Detected Grokking in Multi-Probe DPO (openinterp.org via hn)
Probe-Detected Grokking in Multi-Probe DPO: Orthogonal Learning Beyond Task-Specific Detectors in Qwen3.6-27B. Abstract: We report a phase-transiti…
Karpathy's autoresearch, 50 DPO experiments, 300 human judges (huggingface.co via hn)
When does autoresearch need a human? Autonomous research agents are everywhere in AI research workflows now.
I tried a selective training method for hallucination: beats DPO and SFT with ~10% data (www.reddit.com)
GitHub link: genji970/hallucination-mitigation-via-contrastive-sampling-method: Selective contrastive post-training for hallucination mitigation in LLMs; improves factuality with ~10% data. Experimental Results. (a) DPO vs.
DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment (arxiv.org)
P²-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization (arxiv.org)
Fine-tune Llama 2 with DPO (huggingface.co)
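For readers new to the technique these links share, here is a minimal, dependency-free sketch of the standard per-example DPO loss. It assumes you already have summed log-probabilities of the chosen and rejected responses under both the policy being trained and a frozen reference model (in an SFT + DPO pipeline, typically the SFT checkpoint). The function names and the default `beta=0.1` are illustrative choices, not taken from any of the linked posts.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        # log(1 / (1 + e^-x)); e^-x stays small, so no overflow
        return -math.log1p(math.exp(-x))
    # log(e^x / (1 + e^x)); e^x stays small for negative x
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from summed sequence log-probabilities.

    Each argument is log pi(y | x) summed over the response tokens.
    The loss pushes the policy to raise the chosen response's
    log-ratio (vs. the reference) relative to the rejected one's.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


def mean_dpo_loss(batch, beta: float = 0.1) -> float:
    """Average DPO loss over (pc, pr, rc, rr) tuples."""
    losses = [dpo_loss(pc, pr, rc, rr, beta) for pc, pr, rc, rr in batch]
    return sum(losses) / len(losses)
```

When the policy is identical to the reference, the margin is zero and the loss is log 2 (about 0.693); it drops below that as the policy favors the chosen response more strongly than the reference does. In practice, libraries such as Hugging Face TRL (its `DPOTrainer`) compute these log-probabilities from model outputs and apply the same objective over batches.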