
Fine Tuning

211 items · started 2019-09-19 · ongoing (last activity 2026-06-26)

Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models (arxiv.org)

9h fine-tuning
Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization (arxiv.org)

9h fine-tuning

Building general-purpose role-playing agents that faithfully portray any character from a natural-language profile remains challenging. The dominant paradigm -- supervised fine-tuning -- encourages behavioral mimicry without deep, human-li…
Closing the Quality Gap in Low-Resource Text-to-Speech: LoRA Fine-Tuning of VoxCPM2 for Khmer and Korean (arxiv.org)

9h fine-tuning

Large pretrained text-to-speech (TTS) models sound almost human for well-resourced languages, but much worse for languages that are rare in their training data. We study this quality gap for Khmer and Korean using VoxCPM2, a 2.4B-parameter…
Inherited Circuits, Learned Semantics: How Fine-Tuning Creates Evasion Vulnerabilities Invisible to Standard Evaluation (arxiv.org)

9h fine-tuning

LLMs fine-tuned for security classification are usually evaluated on held-out examples from the same distribution as their training data. We show that this can miss vulnerabilities introduced by fine-tuning itself: models can learn token-l…
Localizing RL-Induced Tool Use to a Single Crosscoder Feature (arxiv.org)

9h tool-use fine-tuning agentic

Fine-tuning through RL reshapes the internal representations of language models to enable agentic behaviors such as tool use, yet the mechanistic basis of these changes remains poorly understood. While RL substantially improves structured…
SSM Adapters via Hankel Reduced-order Modeling: Injection Site Determines Task Suitability in Long-Context Fine-Tuning (arxiv.org)

9h fine-tuning

While parameter-efficient fine-tuning (PEFT) typically targets attention projectors, its efficacy for tasks requiring sequential state accumulation remains under-explored. We examine if PEFT for such tasks can benefit from state space mode…
Tracing a silent-corruption bug in differentially private LoRA fine-tuning (imranahamed.substack.com via hn)

+1 16h fine-tuning

The DP-LoRA silent corruption: how 5 months of broken fine-tuning hid in plain sight. How a device-placement ordering quirk between opacus, PEFT, and HuggingFace caused DP fine-tuning to silently break, and what to check in your own setup.…
FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation (arxiv.org)

1d fine-tuning

Vision-Language-Action (VLA) models are often constrained by the imitation ceiling imposed by sub-optimal data. While Reinforcement Learning (RL) fine-tuning can surpass this limit, it is notoriously sample inefficient.
Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel (huggingface.co)

1d fine-tuning

NVIDIA NeMo AutoModel is an open library, part of the NVIDIA NeMo framework, for building custom generative AI models at scale. NeMo AutoModel builds cleanly on top of v5, addi…
Bilevel Data Curation for LLM Fine-tuning: Offline Selection and Online Self-Refining Generation (arxiv.org)

2d fine-tuning

Supervised fine-tuning (SFT) datasets are critical to the downstream performance of large language models, yet they often contain low-quality or harmful question-response pairs. To improve SFT data quality, we develop a unified bilevel fra…
Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models (arxiv.org)

2d fine-tuning

Prompt-based learning has emerged as a dominant paradigm in natural language processing. This study explores the impact of diverse pre-training objectives on the performance of encoder-decoder pre-trained language models across generation…
Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models using Reinforcement Learning from Ranking Feedback (arxiv.org)

3d fine-tuning
On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs (arxiv.org)

3d fine-tuning
The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model (arxiv.org)

3d fine-tuning
Priority-Aware Learning-Unlearning Correction for Dynamic Decentralized LoRA Fine-Tuning (arxiv.org)

3d fine-tuning
Structured Hyperedge Adaptation for Parameter-Efficient Fine-Tuning of Vision Transformers (arxiv.org)

3d fine-tuning
Fine-Tuning Large Language Models for Quantum Reasoning (arxiv.org)

3d fine-tuning
MAGNIFIED: RL Fine-tuning of Multimodal Large Language Models for Motion Planning (arxiv.org)
Translating Inference-Time Control to Radiology Vision-Language Models: Activation Steering for Pneumonia Classification on Chest X-rays (arxiv.org)

3d fine-tuning

Inference-time engineering can alter model behavior without fine-tuning. However, its utility for improving diagnostic performance in medical vision-language models (VLMs) remains unclear.
Skin-Deep: A Geometric Diagnostic for Alignment Fragility in Large Language Model Representations (arxiv.org)

3d fine-tuning

Alignment tuning is meant to make harmful-request refusal robust, yet this safety behavior can be erased by a small set of benign fine-tuning examples. This is a deployment risk for open-weight models because a checkpoint can pass refusal…
Claude Opus 4.8 launched in May but says its training cutoff is Jan 2026. Am I understanding the cutoff vs launch gap correctly? (www.reddit.com)

4d rlhf fine-tuning opus

Was debugging my TTS pipeline and doing some research on natural voice options, and Claude Opus 4.8 mentioned its training cutoff is January 2026. But the model launched on May 28, 2026.
Fine-Tuning and Deploying LLMs on Mobile:F/b to learnings (www.youtube.com via hn)

+1 4d fine-tuning

Meta Flow Maps enable scalable reward alignment (arxiv.org)

7d fine-tuning

Controlling generative models is computationally expensive. This is because optimal alignment with a reward function--whether via inference-time steering or fine-tuning--requires estimating the value function.
Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates (arxiv.org)

7d fine-tuning

Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This late feedba…
AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models (arxiv.org)

7d fine-tuning

Post-training alignment of large language models often combines supervised fine-tuning (SFT) on expert demonstrations with reinforcement learning (RL) from preference or verifiable feedback. SFT provides a useful behavioral anchor but can…
Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software (arxiv.org)

7d fine-tuning security

Whether LLMs scoring well on vulnerability benchmarks genuinely reason about security or merely pattern-match on contaminated data remains unresolved. We present CWE-Trace, a framework for LLM vulnerability detection built from 834 manuall…
Cross-Dataset, Age, and Gender Generalization: A Comprehensive Analysis of Fine-Tuning Strategies for Low-Resource Children's ASR (arxiv.org)

7d fine-tuning

The challenge associated with recognizing dysarthric speech primarily arises from pronounced acoustic variability attributed to impaired articulatory precision. Past research has demonstrated improved recognition through the use of hybrid…
Techniques for Peak Memory Reduction for LoRA Fine-tuning of LLMs on Edge Devices (arxiv.org)

7d fine-tuning

Fine-tuning of Large Language Models (LLMs) using Low-Rank Adaptation (LoRA) on an end-user's data offers personalized experiences while keeping data private, but faces severe memory constraints on consumer hardware. Peak memory during fin…
Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer (arxiv.org)

7d fine-tuning

We study cross-lingual transfer by fine-tuning seven large language models (4B--671B parameters) on Arabic and evaluating zero-shot reading comprehension on Semitic languages and non-Semitic controls. Across dense and Mixture-of-Experts ar…
SWE-Future: Forecast-Conditioned Data Synthesis for Future-Oriented Software Engineering Agents (arxiv.org)

8d fine-tuning

Realistic coding-agent benchmarks often replay public GitHub issues and pull requests, making them vulnerable to overlap with model pretraining, fine-tuning, synthetic-data generation, or benchmark-driven model selection. Fully synthetic t…
ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection (arxiv.org)

8d fine-tuning

The increasing deployment of parameter-efficient fine-tuning (PEFT) has led to model ecosystems in which a single backbone is paired with many task-specialized adapters. In this setting, inference-time queries often arrive without task lab…
Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models (arxiv.org)

8d fine-tuning

While recent advances in machine learning have equipped Weather Foundation Models (WFMs) with substantial generalization capabilities across diverse downstream tasks, the escalating computational requirements associated with their expandin…
Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs (arxiv.org)

8d fine-tuning

Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data. We argue this not only risks reinforcing exposure to sensitive data, but also fundamentally contra…
Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning (arxiv.org)

8d fine-tuning

Pre-trained materials foundation models, or machine learning interatomic potentials, leverage general physicochemical knowledge to effectively approximate potential energy surfaces. However, they often require domain-specific calibration d…
Beyond LoRA: Can you beat the most popular fine-tuning technique? (huggingface.co)

8d fine-tuning

When you plan to fine-tune a model in a parameter-efficient way, think beyond LoRA. If you want to fine-tune an open model on your own data, you are probably interested in so…
X-REFINE: XAI-based RElevance input-Filtering and archItecture fiNe-tuning for channel Estimation (arxiv.org)

9d fine-tuning

AI-native architectures are vital for 6G wireless communications. The black-box nature and high complexity of deep learning models employed in critical applications, such as channel estimation, limit their practical deployment.
Gradual Fine-Tuning for Flow Matching Models (arxiv.org)

9d fine-tuning

Fine-tuning flow matching models is a central challenge in settings with limited data, evolving distributions, or computational constraints. While recent work has produced significant advances, particularly in the area of reward-based fine…
Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue (arxiv.org)

9d fine-tuning

Depression is the leading cause of disability worldwide, and early detection of symptom change is essential for timely intervention. Validated instruments such as the Patient Health Questionnaire-9 (PHQ-9) support symptom monitoring at sca…
Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation (arxiv.org)

9d fine-tuning

This study explores how bilingual fine-tuning affects automatic speech recognition (ASR) in low-resource languages. We evaluate this method across nine linguistically and geographically diverse language pairs, covering a range of language…
RepSelect: Robust LLM Unlearning via Representation Selectivity (arxiv.org)

9d fine-tuning

Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or few-shot pro…
TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins (arxiv.org)

9d fine-tuning

Fine-tuning large language models (LLMs) is compute-intensive and error-prone: model performance depends sensitively on data quality and hyperparameter choices, and naïve runs can even degrade model performance. This raises a practical que…
A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction (arxiv.org)

9d fine-tuning

The high cost of fine-tuning LLMs poses a significant economic barrier; pre-hoc performance prediction offers a critical solution to substantially reduce this expense. However, the theoretical limits of pre-hoc performance prediction remai…
Pulling The REINS: Training-Free Safety Alignment of Video Diffusion Models via Representation Steering (arxiv.org)

9d fine-tuning

Open-weight video diffusion models can generate photorealistic unsafe content, from violence to misinformation, yet existing defenses either require expensive safety fine-tuning that degrades general capability, or apply external filters t…
The Guide to Fine-Tuning LLMs (arxiv.org via hn)

+11 9d fine-tuning

This report examines the fine-tuning of Large Language Models (LLMs), integrating theoretical insights with practical applications. It outlines the historical evolution of LLMs from traditional Natural Language Processing (NLP) models to t…
Could we use latent representations as internal safety checks during generation? (www.reddit.com via reddit)

9d fine-tuning anthropic

Reading Anthropic's work on emotion-like representations got me thinking. If we can identify latent representations for concepts such as fear, despair, etc., could similar methods be used to identify representations associated with malic…
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes (arxiv.org)

10d fine-tuning

When pretrained VLA policies are fine-tuned through online RL, each rollout episode produces only a single binary outcome (success or failure), yet the actor update requires per-transition supervision. Existing approaches commonly reduce t…
Data-Centric Benchmarking of Exploit Generation in LLMs: Understanding the Impact of Fine-Tuning (arxiv.org)

10d fine-tuning security

We study the task of CVE-conditioned exploit generation, where a model drafts proof-of-concept (PoC) exploits given software vulnerability context. We adopt a data-centric approach, constructing a high-quality dataset via multi-stage prepr…
Conflict-Aware Federated Fine-Tuning of Large Language Models with Mixture-of-Experts (arxiv.org)

10d fine-tuning moe

The continuous scaling of large language models (LLMs) incurs prohibitive computational costs, making Mixture-of-Experts (MoE) a scalable alternative for efficient fine-tuning via sparse activation. While federated learning (FL) emerges as…
Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning (arxiv.org)

10d fine-tuning

Fine-tuning large language models (LLMs) has become a central application of modern optimization, enabling pretrained models to adapt to diverse downstream tasks and domain-specific data. A major obstacle in large-scale fine-tuning is the…
Mechanistic Analysis of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning (arxiv.org)

10d fine-tuning

Sequential fine-tuning of Large Language Models (LLMs) for adaptation to target tasks often triggers catastrophic forgetting, where the acquisition of novel target skills degrades ancestral capabilities. This paper presents a systematic compar…
Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents (arxiv.org)

10d fine-tuning security

Graphical user interface (GUI) agents powered by multimodal large language models (MLLMs) have shown great promise for human interaction. However, due to the high fine-tuning cost, users often rely on open-source GUI agents or APIs offer…
G-Loss: Graph-Guided Fine-Tuning of Language Models (arxiv.org)

10d fine-tuning

Traditional loss functions, including cross-entropy, contrastive, triplet, and supervised contrastive losses, used for fine-tuning pre-trained language models such as BERT, operate only within local neighborhoods and fail to account for t…
SDFLoRA: Selective Decoupled Federated LoRA for Privacy-preserving Fine-tuning with Heterogeneous Clients (arxiv.org)

10d fine-tuning

Federated learning (FL) for large language models (LLMs) has attracted increasing attention as a privacy-preserving approach for adapting models over distributed data, where parameter-efficient methods such as Low-Rank Adaptation (LoRA) ar…
Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning (arxiv.org)

10d fine-tuning

Supervised fine-tuning (SFT) is a commonly used technique to adapt large language models (LLMs) to downstream tasks. In practice, SFT on a full dataset is computationally expensive and sometimes suffers from overfitting or bias amplificati…
Fine-Tuning a 7B Advisor on Free-Tier GPUs: An Adapter-Handoff Recipe and a Synthetic-Data Reliability Caution (arxiv.org)

10d fine-tuning

Fine-tuning a 7B language model for specialized advising is attractive in resource-constrained settings, but multi-epoch runs routinely exceed the wall-clock limits of the free-tier GPUs (Kaggle, Colab) such users rely on. We report two th…
SPARK: Security Knowledge Priming and Representation-Guided Knowledge Activation for LLM-based Secure Code Generation (arxiv.org)

10d fine-tuning

Large language models routinely generate code with exploitable security flaws. Prior literature attributes this limitation to a lack of security expertise, steering current defense mechanisms toward heavy fine-tuning or external knowledge…
PreLort: Prefix-Nested LoRA for Federated Fine-Tuning under Rank Heterogeneity (arxiv.org)

10d fine-tuning

Federated fine-tuning of large language models using parameter-efficient methods such as LoRA enables privacy-preserving adaptation of foundation models. Heterogeneous hardware resources introduce challenges, as clients with different adap…
LiteOdyssey: A Lightweight Reasoning AI Agent for Interpretable Rare-Disease Diagnosis (arxiv.org)

10d fine-tuning

Most medical AI systems improve by scaling additional machinery: more fine-tuning data, more agents, and/or larger retrieval databases. In rare-disease diagnosis, however, such scaling can produce systems that are difficult to deploy, audi…
Integrating Reasoning and Generalization in Text-to-SQL via Self-Enhanced Fine-Tuning (arxiv.org)

10d fine-tuning

Text-to-SQL aims to translate natural language questions into executable SQL queries over structured databases, enabling non-expert users to access data intuitively. While recent advances in large language models (LLMs) have shown promise…
Show HN: Does a vibe leak? Fine-tuning an LLM on an attitude it never states (github.com via hn)

+2 10d fine-tuning

Latent Bias Transfer (LBT). A note on how this was made: the hypothesis and the questions are mine, but several of the techniques here (LoRA fine-tuning, activation steering, the statistics) were new to me.
NeST: Neuron Selective Tuning for LLM Safety (arxiv.org)

11d fine-tuning

Safety alignment is essential for the responsible deployment of Large Language Models (LLMs). Yet, existing approaches often rely on heavyweight fine-tuning that is costly to update, audit, and maintain across model families.
Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback (arxiv.org)

11d fine-tuning

We study multi-domain LLM training in which two models, each stronger in a different domain, co-evolve by tutoring each other through on-policy feedback. Unlike one-way distillation or single-model fine-tuning, our goal is mutual Pareto im…
3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding (arxiv.org)

11d fine-tuning

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a transformative paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs), yet its potential in 3D scene understanding remains under-explored.…
Rethinking the Trust Region in LLM Reinforcement Learning (arxiv.org)

11d fine-tuning

Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiquity, we argue that the core ratio clipp…
Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost (arxiv.org)

11d fine-tuning

Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learni…
Dense Coordinate-List Fine-Tuning Induces a Controllable Interference Surface in Vision-Language Models (arxiv.org)

11d fine-tuning

Fine-tuning vision-language models to emit dense coordinate lists improves visual grounding but also changes how models serialize, repeat, and terminate structured outputs. We study this behavior as a generation and control surface.
PolyAlign: Conditional Human-Distribution Alignment (arxiv.org)

2w fine-tuning

Post-training methods such as supervised fine-tuning (SFT) and preference optimization typically align language models toward a single global assistant behavior. While effective for improving average helpfulness, this can suppress the natu…
Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study (arxiv.org)

2w dpo fine-tuning

We present an approach to fine-tuning large language models using Direct Preference Optimization (DPO), a reinforcement learning technique. Our experimental results demonstrate that DPO simplifies the training pipeline, improves computatio…
Small LLMs for Biomedical Claim Verification: Cost-Effective Fine-Tuning, Structural Dataset Shortcuts, and Cross-Domain Generalization (arxiv.org)

2w mistral fine-tuning gpt-5

Large Language Models such as GPT-4o and GPT-5 achieve strong zero-shot performance on biomedical claim verification, but cost and opacity limit scalable use. We fine-tune three small LLMs: Phi-3-mini (3.8B), Qwen2.5-3B, and Mistral-7B, vi…
MentalMARBERT: Domain-Adaptive Pre-training and Two-Stage Fine-Tuning for Arabic Mental Health Disorders Detection (arxiv.org)

2w fine-tuning

Detecting mental health disorders from Arabic social media text remains challenging due to dialectal variation, informal language, limited high-quality annotated resources, and severe class imbalance. While English mental health natural la…
Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning (arxiv.org)

2w fine-tuning rag

Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a se…
Fantastic Scientific Agents and How to Build Them: AgentBuild for Rietveld Refinement (arxiv.org)

2w fine-tuning

As scientific workflows shift from deterministic executables to LLM-based agents, the development practices on offer, such as fine-tuning, reinforcement learning, and prompt-and-go, bury the scientist's judgment. We propose treating agent…
Parallelogram – catch fine-tuning dataset bugs before training (www.parallelogram.dev via hn)

+1 2w fine-tuning qwen openai

Open-source local CLI that validates OpenAI/Qwen chat JSONL and ShareGPT-style fine-tuning datasets — broken roles, empty messages, duplicates, encoding artifacts, context-window overflows — before they poison a training run.
Making a Vintage LLM from Scratch (crlf.link via hn)

+21 2w fine-tuning

In this blog post, I will share the adventures I had creating my own LLM, from (almost) scratch, trained only on old texts. I made my own base-training and fine-tuning scripts, data processing pipelines an…
MobileFineTuner: A Mobile-Native Framework for On-Device LLM Fine-Tuning in Real-World Embedded AI Applications (arxiv.org)

2w fine-tuning

Large language models (LLMs) are moving from cloud-centric services toward on-device embedded AI, where models interact with private, longitudinal signals sensed from users and their physical environments. Mobile phones are a natural platf…
AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin (arxiv.org)

2w fine-tuning

Fine-tuning large language models (LLMs) improves performance but introduces critical safety vulnerabilities: even minimal harmful data can severely compromise safety measures. We observe that perturbations orthogonal to the alignment dire…
Steering the Noise: Turning Random Perturbations into Effective Descent for Memory-Efficient LLM Fine-Tuning (arxiv.org)

2w fine-tuning

Fine-tuning large language models (LLMs) achieves strong performance but is often limited by the memory overhead of backpropagation. Zeroth-order (ZO) optimization avoids this overhead by estimating gradients through forward passes alone,…
ALIGNBEAM: Inference-Time Alignment Transfer via Cross-Vocabulary Logit Mixing (arxiv.org)

2w fine-tuning

Domain fine-tuning degrades the safety of large language models: fine-tuned specialists readily comply with harmful prompts framed in domain language. Existing inference-time defenses that mix logits from a safe anchor model require both m…
Harness In-Context Operator Learning with Chain of Operators (arxiv.org)

2w operator fine-tuning

Neural operators approximate mappings between function spaces, but often generalize poorly to other operators and usually require fine-tuning or retraining. In-Context Operator Networks (ICON) addresses this issue by prompting the model wi…
Bridging the Morphology Gap: Adapting VLA Models to Dexterous Manipulation via Intent-Conditioned Fine-Tuning (arxiv.org)

2w fine-tuning

Vision-Language-Action (VLA) models have demonstrated remarkable zero-shot generalization in robotic manipulation, yet the vast majority of pre-trained pipelines remain strictly confined to low-DoF parallel grippers. Adapting these rich se…
Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training (arxiv.org)

2w fine-tuning

There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional fine-tuning-…
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It (arxiv.org)

2w fine-tuning
Small Data, Big Noise: Adversarial Training for Robust Parameter-Efficient Fine-Tuning (arxiv.org)

2w fine-tuning
The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring (arxiv.org)

2w fine-tuning llama
A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design (arxiv.org)

2w fine-tuning
Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning (arxiv.org)

2w fine-tuning
Two to Tango: Coupled Task-Reference Selection for Safe LLM Fine-tuning (arxiv.org)

2w fine-tuning
Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction (arxiv.org)

2w fine-tuning

Supervised fine-tuning with synthetic rationale data is widely assumed to improve language model performance on clinical prediction tasks by teaching models not just what to predict but why. We test this assumption on five-year Alzheimer's…
Domain-Adapted Small Language Models with Hybrid Post-Processing: Achieving Cost-Efficient, Low-Latency Multi-Label Structured Prediction via LoRA Fine-Tuning on Scarce Data (arxiv.org)

2w fine-tuning
Curvature-Guided LoRA: Matching Full Fine-Tuning in Function Space (arxiv.org)

2w fine-tuning
Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates (arxiv.org)

2w fine-tuning
AutoTail-BSFGM: Class-Balance-Aware Fine-Tuning for Chinese Scholarly Text Classification (arxiv.org)

2w fine-tuning
PriFT: Prior-Support Guided Supervised Fine-Tuning (arxiv.org)

2w fine-tuning
AlignFed: Alignment-Aware Asynchronous Federated Fine-Tuning for Large Language Models in Heterogeneous Edge Environments (arxiv.org)

2w fine-tuning
Self-Mined Hardness for Safety Fine-Tuning (arxiv.org)

2w fine-tuning
Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan (arxiv.org)

2w fine-tuning
FiberTune: Preserving Action-Fiber Visual Residuals in Vision-Language-Action Fine-Tuning (arxiv.org)

2w fine-tuning
Ego-Pi: VLA Fine-Tuning for Ego-Centric Human and Robot Data (arxiv.org)

2w fine-tuning
Single-Cell Cross-Modal Transfer by Adversarial Fine-Tuning of Foundation Models (arxiv.org)

2w fine-tuning
Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR: Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6% WER (13.8% cWER) (arxiv.org)

2w fine-tuning
A Mechanistic Analysis of Adversarial Fine-tuning of Vision Transformers (arxiv.org)

2w fine-tuning
Phantom transitions in language model fine-tuning (arxiv.org)

2w fine-tuning
How Small Can You Go? LoRA Fine-Tuning 270M-8B Models for Merchant Information Extraction in Financial Transactions (arxiv.org)

2w fine-tuning llama

Financial transaction processing requires extracting structured merchant information from noisy, abbreviated bank transaction strings at scale. Our current production system, a LoRA-fine-tuned LLaMA 3.1-8B, achieves 96.95% F1 on this task,…
Show HN: We're open sourcing Superlog (YC P26), an autonomous monitoring tool (github.com via hn)

+21 2w fine-tuning

Hi HN! This is Arseniy from Superlog (YC P26).
Fine-tuning LLMs on 30M academic papers from ScholarAPI (scholarapi.net via hn)

+1 2w fine-tuning

Global academic literature at your fingertips. Reliable Google Scholar alternative for large-scale access to academic PDFs and metadata, with full-text search and bulk download.
SERNF: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows (arxiv.org)

2w fine-tuning
RASFT: Rollout-Adaptive Supervised Fine-Tuning for Reasoning (arxiv.org)

2w fine-tuning
Fine-Tuning and Serving Gemma 4 31B on Google Cloud TPU: A Technical Comparison with GPU Baselines (arxiv.org)

2w fine-tuning gemma
The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning (arxiv.org)

2w fine-tuning

Deploying Small Language Models (SLMs) on edge devices requires efficient fine-tuning strategies that adapt models to new tasks without degrading their general capabilities. In this study, we benchmark five sub-1B models (135M-1B) on mathe…
SafeGene: Reusable Adapters for Transferable Safety Alignment (arxiv.org)

2w fine-tuning

Open-weight LLMs are increasingly fine-tuned into customized assistants, but downstream fine-tuning can weaken safety alignment and make models more vulnerable to malicious prompts, even when the training data is not intentionally harmful.…
Learn from Your Mistakes: Tree-Like Self-Play for Secure Code LLMs (arxiv.org via hn)

+1 2w fine-tuning

While Large Language Models (LLMs) excel in code generation, they remain prone to replicating subtle yet critical vulnerabilities endemic to their training data. Current alignment techniques, such as Supervised Fine-Tuning (SFT) and Reinfo…
Few Tokens, Big Leverage: Preserving Safety Alignment by Constraining Safety Tokens during Fine-tuning (arxiv.org)

3w fine-tuning
(Mis)generalization of Helpful-only Fine-tuning (arxiv.org)

3w fine-tuning
ADAPTOOD: Uncertainty-Aware Fine-Tuning for Out-of-Distribution ECG Time Series Models (arxiv.org)

3w fine-tuning
Emotion-Aware Image Generation from Korean Diary Text via LLM-based Prompt Translation and LoRA Fine-Tuning (arxiv.org)

3w fine-tuning

T2I models cannot effectively capture sentiment from various types of text, including diaries, as they primarily focus on visual object-related patterns rather than contextual emotional understanding. This paper proposes an emotion-aware t…
Multilingual Fine-Tuning via Localized Gradient Conflict Resolution (arxiv.org)

3w fine-tuning

The rapid evolution of Large Language Models (LLMs) has established cross-lingual versatility as a defining feature of modern systems. However, fine-tuning these models frequently induces negative interference across languages.
Fine-tuning an LLM to write docs like it's 1995 (passo.uno via hn)

+1 3w fine-tuning

In my predictions for 2030 I wrote that tech writers would be using specialized LLMs, running locally on powerful hardware. I see hints of this move to “local first” among engineering pundits, but we’re not there yet, in part because of ho…
Fine-Tuning for Engagement (robertdruska.com via hn)

+1 3w fine-tuning

May 29, 2026 It’s been quite some time since major LLM providers introduced the behaviour that the chatbots often end their response with a question. The motivation is clear: more engagement, more data to train on.
losing my mind fine-tuning jina-v5 for a legal corpus (www.reddit.com)

+54 4w fine-tuning

For the last month I've been trying to fine-tune jina-v5 (which has performed best on my corpus out of the box) on Slovak law chunks. Time and time again, no matter what I do, I can't get the model to learn the nuances of Slovak syntax. Here's th…
I built a 103B-token Usenet corpus (1980–2013) — pre-web, human-only, zero AI contamination. Got strong traction on r/ML, thought this community would find it useful. (www.reddit.com)

+8130 4w rlhf fine-tuning

Posted this to r/MachineLearning a couple weeks ago (30K views, 100+ upvotes) and have been meaning to share it here where the fine-tuning angle is more directly relevant. I spent years building and processing a complete Usenet corpus from…
SkillOpt – Executive Strategy for Self-Evolving Agent Skills (microsoft.github.io via hn)

+21 4w fine-tuning

A skill is external state for an agent. Instead of fine-tuning a model or hand-maintaining prompts, SkillOpt runs the frozen agent on scored batches, asks a separate optimizer model to propose structured edits, and accepts a candidate only…
AI content detector based on Qwen 0.8b fine-tuned on Pangram dataset (www.reddit.com)

+36 4w fine-tuning gemma qwen+1

I've fine-tuned Qwen 3.5 0.8B on the dataset provided by Pangram with their EditLens paper. It's available via a Chrome extension; you can just click selected text and it's going to give you the probability distribution of how likely it is…
What workstation to get for ~13k EUR? (www.reddit.com)

+28 4w minimax vllm fine-tuning+2

My use-cases will be to test open-weight LLMs and work on harnesses, inference systems and possibly other non-ML workflows (CS-related) in the future. Fine-tuning would not be something I do locally because I can rent a B200 from RunPod fo…
GPT 5.5 "secret sauce" is just having the thinking be some stupid caveman mode? (www.reddit.com)

+3835 4w fine-tuning gpt-5

I think I had GPT-5.5 leak its trace during a normal conversation, and it really reads like the caveman mode fad from a few months back. Maybe we can achieve better token efficiency by taking some high-quality thinking trace from an open m…
Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models (www.computer.org via hn)

+1 5w fine-tuning

Transformer-based PLMs [1],[2],[3],[4],[5] have demonstrated remarkable performance across a wide range of NLP tasks. To fully harness the potential of PLMs, fine-tuning is commonly employed to adapt them to task-specific da…
I Kept a Diary for Seven Years. An LLM Finally Read It. (www.reddit.com)

+49 5w fine-tuning rag

I've kept a personal diary since 2019. Last week I fed 200+ entries to an LLM and asked it how I've changed over 7 years.
Number-aware embeddings (www.reddit.com)

+7 5w fine-tuning qwen

If you look at the cosine sim between the embeddings of "a 500 hp car", "a 1,200 hp car" and "a 73 hp car", you'll soon see that embedding models have no sense of number ordering at all. (I tested Qwen and ModernBERT-based embeddings) It m…
Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation (huggingface.co)

5w fine-tuning

NVIDIA Cosmos Predict 2.5 is a large-scale world model capable of generating physically plausible videos conditioned on text, images, or video clips…
RAG vs. Fine-Tuning – The Question Every AI Builder Gets Wrong (thingswithai.org via hn)

+2 5w fine-tuning rag

AI models don't know your private data.
Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face (huggingface.co via reddit)

+125 5w fine-tuning agentic

Qwopus3.5-9B-coder is specially optimized and fine-tuned for high-performance 🤖 Agentic Coding, complex Tool Calling, and logical reasoning. 💡 Why the 9B Dense Model?
Personal continual learning for LLMs without GPU — position paper [OC] (www.reddit.com)

+12 5w fine-tuning

I proposed two architectures for enabling LLMs to learn daily from personal interactions: Internal KV-Sphere Architecture (IKSA) and Background Micro Fine-Tuning (BMFT). Both work with zero GPU and zero catastrophic forgetting. Full paper: in c…
Liquid AI releases fine-tuning harness for AI agents (lqh.ai via hn)

+4 5w fine-tuning

Liquid Harness is an autonomous agent by Liquid AI that takes a plain-English spec and ships a fine-tuned Liquid Foundation Model. Spec, data, eval, training, deployment — all in one run.
About to start fine-tuning on RunPod. What should I know to not waste money? (www.reddit.com)

+17 6w fine-tuning

I was MLOps lead at an AI company managing 5000+ GPUs across GCP and CoreWeave. Left to start my own thing and now I'm back to renting GPUs like everyone else.
Dropping learning rate fixed my Qlora fine-tune more than anything else i tried (www.reddit.com)

+55 6w fine-tuning llama

Been fine-tuning llama 3.1 8b with Qlora for a classification task using about 8k samples. I was getting bad eval results for a while and kept thinking something was wrong with my data.
Fine-Tuning TranslateGemma-4B to improve bi-directional English & Welsh translations on an H200 GPU! (metalglot.com via reddit)

+62 6w fine-tuning

Open source repo: https://github.com/grctest/finetuned-gemmatranslate-cy 5% of the fine-tuning took 40 minutes and cost a couple dollars to prove the process works. Looking forwards to Flash Attention v4 to leave beta, to test fine-tuning…
Llama models: still valuable for finetuning or surpassed by everything new? (www.reddit.com)

+69 6w fine-tuning llama

Hello there people. So I have noticed that people are pretty much ignoring Llama 3 plus 3.1, 3.2, and 3.3 these days.
How to Fine-Tune LLMs on AMD Strix Halo (www.promptinjection.net via hn)

+1 6w fine-tuning

How to Fine-Tune LLMs on AMD Strix Halo (Ryzen AI MAX+ 395) and Other Exotic AMD Hardware: A Complete Windows and Linux Guide to Full SFT and LoRA Training. This guide covers full SFT and LoRA fine-tuning on AMD hardware that sits outside th…
Show HN: LLM post-training to speak like GenZ, costing less than a cup of coffee (github.com via hn)

+51 6w fine-tuning

GenZ LLM: a post-trained language model that responds in GenZ slang, built on top of Qwen2.5-0.5B-Instruct using Supervised Fine-Tuning (SFT) followed by Reinforcement Learning with GRPO. The fine-tuned model is available on Hugging Face: a…
Open-sourced our MCP server for GPU workload execution looking for feedback (www.reddit.com)

+13 6w fine-tuning mcp agentic

Hey everyone I’m Jaguar, building Jungle Grid. We just open-sourced our MCP server for agentic GPU workload execution.
I drew the entire AI stack on one page... and it's mostly not models. (www.reddit.com)

6 6w fine-tuning

Most "AI progress" talk lives on one layer: models. Bigger model, smaller model, new benchmark, repeat.
Realistically, what is the best use of consumer hardware for AI? (www.reddit.com)

9 6w fine-tuning

I want to move past the "democratization" slogans. What is the most practical contribution consumer-grade hardware can make to the ecosystem right now?
Introducing AI finetuner, Source available and free Claude skill to fine tune your vibe coded UI with live preview (www.reddit.com)

+11 6w fine-tuning

Fine-tuning UI with AI right now: "Make the shadow softer." "Stronger." "No, less." "Go back." "A bit more." 17 messages later, you've spent more tokens than the shadow is soft. I built something that breaks the loop.
MedQA: Fine-Tuning a Clinical AI on AMD ROCm — No CUDA Required (huggingface.co)

7w fine-tuning

Medical question answering is one of those tasks where the stakes are genuinely high. A model that confidently picks the wrong answer on a clinical MCQ isn't just wro…
OpenAI has announced they will be winding down fine tuning. (www.reddit.com)

+92 7w fine-tuning rag openai

Got an email today about the announcement: "OpenAI is winding down the fine-tuning API and platform."
How Unsloth and Nvidia made LLM training 25% faster on consumer GPUs (unsloth.ai via hn)

+8 7w fine-tuning

Fine-tuning is one of today's most computationally intensive workloads, and it continues to push hardware to its limits. NVIDIA GPUs are purpose-built for these workloads: they break complex problems into pieces and process them in paralle…
Model Spec Midtraining: Improving How Alignment Training Generalizes (alignment.anthropic.com via hn)

+1 7w fine-tuning

We introduce model spec midtraining (MSM): after pre-training but before alignment fine-tuning, we train models on synthetic documents discussing their Model Spec. This shapes how models generalize from subsequent alignment training.
Demo of fine-tuning Orpheus 3B on a TTS dataset in Transformer Lab (open source) (www.reddit.com)

+3 7w fine-tuning

I'm part of the team building Transformer Lab, an open source ML research platform. We put together a short demo of how to run text to speech training, which you can do on your own hardware using a Local provider.
Anthropic just published new alignment research that could fix "alignment faking" in AI agents here's what it actually means (www.reddit.com)

+13 7w fine-tuning anthropic

Anthropic's alignment team published a paper this week called Model Spec Midtraining (MSM) and I think it's one of the more practically interesting alignment results I've seen in a while. The core problem they're solving: Current alignment…
RAG vs. Fine-Tuning: Which AI Strategy Saves Your Team Time and Budget (lightrains.com via hn)

+2 7w fine-tuning rag

Two weeks before a Fortune 500 product launch, we told a client to scrap their fine-tuned model and rebuild with RAG instead. They lost eight weeks and $180K.
I have practically unlimited access to Opus and every other frontier model. I'd like to help contribute to a dataset. (www.reddit.com)

+57 7w fine-tuning opus

No, I won't tell you how. No this is not for anyone who is not already a proven contributor to the fine-tuning space.
Anthropic co-founder Jack Clark says AI is nearing the point where it can automate AI research (www.reddit.com)

+16558 7w fine-tuning anthropic

Import AI 455: AI systems are about to start building themselves. Jack Clark thinks there’s a ~30% chance by the end of 2027 and a ~60%+ chance by the end of 2028 that AI research becomes automated, with models eventually helping train the…
HELP - How to fine-tune an LLM to match academic writing style (www.reddit.com)

+11 7w fine-tuning

I've been using LLMs to help write my thesis, but the output feels dry and uses awkward phrasing (especially in translation). I'm looking to fine-tune an accessible LLM to better match natural academic writing in my language.
Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats (www.reddit.com)

+62 8w fine-tuning opus

https://huggingface.co/datasets/angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k A synthetic fine-tuning dataset created from Claude 4.6/4.7. 8,706 total examples all with reasoning.
Best open-weight model to run locally on 8x A100 80GB for generating teacher data? (www.reddit.com)

+212 8w fine-tuning llama

I have (free) access to a SLURM cluster with 8x NVIDIA A100 80GB GPUs (=640 GB VRAM) on a single task, and I want to run an open-weight model locally with llama.cpp for data generation, not coding. My use case is generating teacher data fo…
Can LLMs create lasting flashcards from readers' highlights? (memory-machines.com via hn)

+1 8w fine-tuning

We tested prompting, fine-tuning, RL, and grounded evaluation across ~1,500 labeled flashcards—and found models catch obvious misses but not plausible failures.
Learn, run and test Agentic AI on your browser for free! (Built with Claude Opus 4.7 in 2 days) (www.reddit.com)

+48 8w function-calling fine-tuning rag+4

Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…
- Run, Learn and test Agentic AI for free, on your browser! (Open AI Models are included) (www.reddit.com)
Show HN: I built a 2nd-order PyTorch optimizer for LLMs that runs on 16GB GPUs (news.ycombinator.com)

+22 8w fine-tuning

Hi HN, I'm Danilo. I've been struggling with the limitations of AdamW when fine-tuning LLMs locally.
Interactive playground to learn Agentic AI hands-on (Free) with Certification (www.reddit.com)

+12 8w function-calling fine-tuning rag+3

Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…
End-2-end tutorial on fine-tuning, the whole journey (docs.liquid.ai via reddit)

+131 8w fine-tuning

I put together a hands-on tutorial that takes you from problem framing to fine-tuning, step by step. I decided to build a wildfire prevention system that uses satellite images and a Small Vision-Language Model (LFM2.5-VL-450M) to extract r…
Research note: Fine-tuning experiments on CoT controllability (metr.org via hn)

+1 8w fine-tuning

We find that a small amount of fine-tuning on instruction following in the CoT generalizes to meaningful increases in CoT controllability on an out-of-distribution set of tasks. We fine-tune four reasoning models on small datasets of instr…
A weekend with LoRA on Gemma 4 E2B: instrumenting what fine-tuning changes (aiexplr.com via hn)

+1 8w fine-tuning gemma

Spent a week doing LoRA fine-tuning on Gemma 4 E2B (~5.1B total params, ~2B active in text decoder) for a narrow Python code-generation task. Bad outputs went from ~5% to 0% (greedy) and 1.5% (sampled) across 134 tests.
Three lessons from fine-tuning a 5B code assistant — bad outputs from 5% → 0% (www.reddit.com)

4 8w fine-tuning gemma

Spent a week doing LoRA fine-tuning on Gemma 4 E2B (gemma-4-e2b-it, ~5.1B total params, ~2B active in the text decoder) for a narrow Python code-generation task. Setup: Model: Gemma 4 E2B, bf16, language_model only (vision + audio towers f…
Show HN: ShadowPEFT – Centralized and Detachable Parameter-Efficient Fine-Tuning (github.com via hn)

+51 8w fine-tuning

Unlike LoRA and its variants, which inject trainable parameters directly into the weights of the Transformer and so require tight coupling with the backbone, ShadowPEFT enhances the frozen large base model by adding a lightweight, cent…
Hardware choice (www.reddit.com)

4 8w fine-tuning

We want to set up the following: a local LLM environment for AI development, used by multiple software developers; infrastructure for training vision AI models; and capabilities for AI model fine-tuning. I’m currently struggling to decide between…
ServiceNow-AI/SuperApriel-15B-Instruct · Hugging Face (huggingface.co via reddit)

+276 9w fine-tuning

A 15B-parameter token-mixer supernet with 8 optimized deployment presets spanning 1.0× to 10.7× decode throughput at 32K sequence length, all from a single checkpoint. Derived from Apriel-1.6 through stochastic distillation and targeted su…
Show HN: ClickMVP – Deterministic full-stack code generation (no LLMs) (app.clickmvp.com via hn)

+1 9w fine-tuning claude-code

I've built software for clients for 38 years and kept hitting the same wall: weeks spent scaffolding the data layer and the Clean Architecture around it before any real work begins. I asked Claude to estimate how long it would take to gene…
Pioneer: Vibetune Your LLMs (pioneer.ai via hn)

+1 9w fine-tuning gemma

+30% avg accuracy lift on classification & extraction tasks vs. base Gemma; ~7 days until your first auto-improvement run lands in production; 0 lines of fine-tuning code you have to write, ever; $0/retrain starting price.
Show HN: MemFactory: Unified Inference and Training Framework for Agent Memory (arxiv.org via hn)

+8 9w fine-tuning llama

Memory-augmented Large Language Models (LLMs) are essential for developing capable, long-term AI agents. Recently, applying Reinforcement Learning (RL) to optimize memory operations, such as extraction, updating, and retrieval, has emerged…
I Built a desktop app for generating LLM fine-tuning datasets — started it a week ago while learning FT (www.reddit.com)

+31 9w humaneval fine-tuning claude-code

Hey, I've been building side projects with Claude Code for a few months, but I'm completely new to fine-tuning — started experimenting maybe a week ago. From day one I wanted a GUI for the dataset side of the workflow, so this desktop app…
We open-sourced Chaperone-Thinking-LQ-1.0 — a 4-bit GPTQ + QLoRA fine-tuned DeepSeek-R1-32B that hits 84% on MedQA in ~20GB (www.reddit.com)

2 9w mmlu fine-tuning deepseek+2

Hey everyone, We just open-sourced our reasoning model, Chaperone-Thinking-LQ-1.0, on Hugging Face. It's built on DeepSeek-R1-Distill-Qwen-32B but goes well beyond a simple quantization — here's what we actually did: The pipeline: 4-bit GP…
An Alignment Experiment: Native LLM vs. Custom Engine on Classical Naming. The statistical inertia is real. (www.reddit.com)

1 9w fine-tuning
[Project] I benchmarked my custom 2nd-order optimizer against AdamW across 1M, 5M, and 10M parameters. Here are the raw test results and scaling laws. (www.reddit.com)

9w fine-tuning
LLM from scratch (32l) – Interventions: updated instruction fine-tuning results (www.gilesthomas.com via hn)

+1 9w fine-tuning
RTX PRO 5000 (48GB) vs MacBook Pro M5 MAX (128GB RAM) - The choice for fine-tuning & agentic coding (www.reddit.com)

+527 9w vllm fine-tuning llama+1
New Claude Opus 4.7 tell dropped (www.reddit.com)

2 9w fine-tuning opus
Which kind of base/fine-tunes have you done? And which data did you use? (www.reddit.com)

+1 9w fine-tuning qwen
Converting XQuery to SQL with Local LLMs: Do I Need Fine-Tuning or a Better Approach? (www.reddit.com)

+3 9w fine-tuning
[Release] Swedish Construction FAQ — 503 bilingual (SV+EN) Q&As for fine-tuning, CC BY 4.0, now on HF / PyPI / Kaggle / Zenodo (www.reddit.com)

+11 10w fine-tuning

I've been building an open Q&A dataset for the Swedish construction industry (byggbransch) over the last few weeks — something that's been a gap in Swedish-language domain-specific datasets. Finally hit a milestone worth sharing.
Fine-tuning and deploying Gemma 4 is not that easy (ghost.oxen.ai via hn)

+4 10w fine-tuning gemma

Google's Gemma 4 dropped in April 2026 with multimodal support (text, image, video, audio), a novel hybrid KV…
Findings: Gemma4 26B-A4B fine-tuning on a single RTX 4090 — 10 patches, benchmark, PCIELink path #1 (www.reddit.com)

+22 10w fine-tuning

This issue documents what we learned making Gemma4 26B-A4B-it train on consumer hardware (RTX 4090, 24GB VRAM). No A100.
Distilled my AI Agents and Skills definitions (www.reddit.com)

10w fine-tuning agentic

I have significantly distilled my AI Agents and Skills definitions. My goal is to reduce the context size and token usage without impacting the quality of my development team.
Show HN: Rollquation – A Rolling-Ball Math Puzzle Game for Android (Solo Dev) (play.google.com via hn)

+2 10w operator fine-tuning

Hey HN! I'm a solo dev and I just wanted to share my latest Android game — Rollquation.
DGX Spark users: What's the easiest way to do multi-node vLLM clustering with a browser UI and training? (www.reddit.com)

3 10w vllm fine-tuning openai

Hey r/LocalLLaMA, I've been running a small 4-node DGX Spark cluster on a 400µT fabric switch and got frustrated with the usual raw Ray/vLLM scripts and EXO basically ignoring pure NVIDIA paths. I started from the solid foundation in [eugr…
[D] Released a 100k-sample dataset on Hugging Face (www.reddit.com)

+196 10w fine-tuning

We’ve released a 100,000-sample Chain-of-Thought (CoT) dataset for fine-tuning local reasoning models. Each sample includes explicit intermediate reasoning traces, rather than answer-only supervision.
Friday, self-evolving assistant, only CC $100 plan, no agent framework (github.com via hn)

+11 10w fine-tuning claude-code

Friday — A 24/7 AI Assistant Built Entirely on Claude Code An always-on personal AI system using only Claude Code CLI ($100/month) and Telegram — no custom AI, no cloud VMs, no fine-tuning. Live page: missingus3r.github.io/friday-showcase…
A guide to model quantization in fine-tuning (and how to pick the right GGUF) (www.siquick.com via hn)

+1 10w fine-tuning

Fine-tuning with Unsloth and Axolotl is, on the whole, a well thought-out experience where a lot of the complexity is handled for you. However on…
Curiosity about Chatterbox's architecture led me to fine-tune it for 8 Indian languages by LoRA, using 1.4% params (www.reddit.com)

10w fine-tuning

TL;DR: Fine-tuned Chatterbox-Multilingual for Telugu, Kannada, Bengali, Tamil, Malayalam, Marathi, Gujarati, and Hindi using LoRA adapters + tokenizer extension. Only 7.8M / 544M parameters trained.
Gemopus: A Gemma fine-tune that prioritizes stability over long chain-of-thought (huggingface.co via hn)

+1 10w fine-tuning gemma

Gemopus-4-26B-A4B-it: Gemopus is an attempt at fine-tuning Gemma 4 with a core philosophy of "stability first". While preserving the original reasoning order of Gemma 4 as much as possible, we conducted targeted refinements for an…
I open-sourced media-tsunami — a tool that extracts your brand voice into a CLAUDE.md any LLM can load (www.reddit.com)

2 10w fine-tuning chatgpt

Your brand voice is probably a PDF nobody reads, or it's trapped in one founder's head, or it's scattered across a thousand ChatGPT histories. I wanted to treat it like code instead — a file you can version, share, diff, and plug into any…
Built a Japanese ASR benchmark because existing ones can't measure quality differences properly (www.reddit.com)

+9 10w fine-tuning

Was fine-tuning a Japanese ASR model (based on Qwen3-ASR) to handle technical terminology better. The model clearly improved — "Next.js" comes out as "Next.js" instead of "ネクストジェイズ", punctuation works, etc.
Trained a 125M LM from scratch instead of fine-tuning GPT-2 — releasing weights + SFT framework for others to build on (www.reddit.com)

+4416 10w fine-tuning

I’ve been experimenting with training small language models fully from scratch (no GPT-2 init, no…
20x Faster TRL Fine-tuning with RapidFire AI (huggingface.co)

31w fine-tuning
(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware (huggingface.co)

53w fine-tuning
Building smarter maps with GPT-4o vision fine-tuning (openai.com)

83w fine-tuning
Argilla 2.4: Easily Build Fine-Tuning and Evaluation Datasets on the Hub — No Code Required (huggingface.co)

85w fine-tuning
Introducing vision to the fine-tuning API (openai.com)

90w fine-tuning
Fine-tuning LLMs to 1.58bit: extreme quantization made easy (huggingface.co)

92w fine-tuning
Fine-tuning GPT-4o webinar (openai.com)

95w fine-tuning
LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? (huggingface.co)

100w fine-tuning
Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models (huggingface.co)

104w fine-tuning
Introducing improvements to the fine-tuning API and expanding our custom models program (openai.com)

116w fine-tuning
Fine-Tuning Gemma Models in Hugging Face (huggingface.co)

122w fine-tuning gemma
Make LLM Fine-tuning 2x faster with Unsloth and 🤗 TRL (huggingface.co)

128w fine-tuning
Fine-tuning Llama 2 70B using PyTorch FSDP (huggingface.co)

145w fine-tuning llama
OpenAI partners with Scale to provide support for enterprises fine-tuning models (openai.com)

148w fine-tuning openai
GPT-3.5 Turbo fine-tuning and API updates (openai.com)

148w fine-tuning
Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU (huggingface.co)

172w rlhf fine-tuning
Parameter-Efficient Fine-Tuning using 🤗 PEFT (huggingface.co)

176w fine-tuning
Fine-tuning GPT-3 to scale video creation (openai.com)

181w fine-tuning
Accelerating PyTorch distributed fine-tuning with Intel technologies (huggingface.co)

240w fine-tuning
