event

Fine Tuning

123 items · started 2019-09-19 · ongoing (last activity 2026-06-09)

  1. Financial transaction processing requires extracting structured merchant information from noisy, abbreviated bank transaction strings at scale. Our current production system, a LoRA-fine-tuned LLaMA 3.1-8B, achieves 96.95% F1 on this task,…

  2. Hi HN! This is Arseniy from Superlog (YC P26).

  3. Global academic literature at your fingertips. Reliable Google Scholar alternative for large-scale access to academic PDFs and metadata, with full-text search and bulk download.

  4. Open-weight LLMs are increasingly fine-tuned into customized assistants, but downstream fine-tuning can weaken safety alignment and make models more vulnerable to malicious prompts, even when the training data is not intentionally harmful.…

  5. Deploying Small Language Models (SLMs) on edge devices requires efficient fine-tuning strategies that adapt models to new tasks without degrading their general capabilities. In this study, we benchmark five sub-1B models (135M-1B) on mathe…

  6. While Large Language Models (LLMs) excel in code generation, they remain prone to replicating subtle yet critical vulnerabilities endemic to their training data. Current alignment techniques, such as Supervised Fine-Tuning (SFT) and Reinfo…

  7. The rapid evolution of Large Language Models (LLMs) has established cross-lingual versatility as a defining feature of modern systems. However, fine-tuning these models frequently induces negative interference across languages.

  8. T2I models cannot effectively capture sentiment from various types of text, including diaries, as they primarily focus on visual object-related patterns rather than contextual emotional understanding. This paper proposes an emotion-aware t…

  9. In my predictions for 2030 I wrote that tech writers would be using specialized LLMs, running locally on powerful hardware. I see hints of this move to “local first” among engineering pundits, but we’re not there yet, in part because of ho…

  10. May 29, 2026 It’s been quite some time since major LLM providers introduced the behaviour that the chatbots often end their response with a question. The motivation is clear: more engagement, more data to train on.

  11. For the last month I've been trying to fine-tune jina-v5 (which has performed best on my corpus out of the box) on Slovak law chunks; time and time again, no matter what I do, I can't get the model to learn the nuances of Slovak syntax. Here's th…

  12. Posted this to r/MachineLearning a couple weeks ago (30K views, 100+ upvotes) and have been meaning to share it here where the fine-tuning angle is more directly relevant. I spent years building and processing a complete Usenet corpus from…

  13. Hey HN -- I'm Shalin, one of the cofounders of Hyper. My cofounder Kanyes and I have been power users of a lot of second brain type software like Notion and Obsidian for years, and tried fine-tuning GPT-2 back in 2020 to solve this exact p…

  14. A skill is external state for an agent. Instead of fine-tuning a model or hand-maintaining prompts, SkillOpt runs the frozen agent on scored batches, asks a separate optimizer model to propose structured edits, and accepts a candidate only…

  15. I've fine-tuned Qwen 3.5 0.8B on the dataset provided by Pangram with their EditLens paper. It's available via a Chrome extension; you can just click selected text and it's going to give you the probability distribution of how likely it is…

  16. My use-cases will be to test open-weight LLMs and work on harnesses, inference systems and possibly other non-ML workflows (CS-related) in the future. Fine-tuning would not be something I do locally because I can rent a B200 from RunPod fo…

  17. I think I had GPT-5.5 leak its trace during a normal conversation, and it really reads like the caveman mode fad from a few months back. Maybe we can achieve better token efficiency by taking some high-quality thinking trace from an open m…

  18. I. Introduction. Transformer-based PLMs [1],[2],[3],[4],[5] have demonstrated remarkable performance across a wide range of NLP tasks. To fully harness the potential of PLMs, fine-tuning is commonly employed to adapt them to task-specific da…

  19. I've kept a personal diary since 2019. Last week I fed 200+ entries to an LLM and asked it how I've changed over 7 years.

  20. If you look at the cosine sim between the embeddings of "a 500 hp car", "a 1,200 hp car" and "a 73 hp car", you'll soon see that embedding models have no sense of number ordering at all. (I tested Qwen and ModernBERT-based embeddings) It m…

  21. Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation Motivation NVIDIA Cosmos Predict 2.5 is a large-scale world model capable of generating physically plausible videos conditioned on text, images, or video clips…

  22. RAG vs. Fine-Tuning — The Question Every AI Builder Gets Wrong AI models don't know your private data.

  23. The Ultimate LLM Fine-Tuning Guide From dataset to GGUF - every parameter explained, every step runnable Fine-tuning is a direct intervention into how a language model behaves. Not prompting, not system instructions, not RAG - actual weigh…

  24. Qwopus3.5-9B-coder is specially optimized and fine-tuned for high-performance 🤖 Agentic Coding, complex Tool Calling, and logical reasoning. 💡 Why the 9B Dense Model?

  25. I proposed two architectures for enabling LLMs to learn daily from personal interactions: Internal KV-Sphere Architecture (IKSA) and Background Micro Fine-Tuning (BMFT). Both work with zero GPU and zero catastrophic forgetting. Full paper: in c…

  26. Liquid Harness is an autonomous agent by Liquid AI that takes a plain-English spec and ships a fine-tuned Liquid Foundation Model. Spec, data, eval, training, deployment — all in one run.

  27. I was MLOps lead at an AI company managing 5000+ GPUs across GCP and CoreWeave. Left to start my own thing and now I'm back to renting GPUs like everyone else.

  28. Been fine-tuning Llama 3.1 8B with QLoRA for a classification task using about 8k samples. I was getting bad eval results for a while and kept thinking something was wrong with my data.

  29. Open source repo: https://github.com/grctest/finetuned-gemmatranslate-cy 5% of the fine-tuning took 40 minutes and cost a couple of dollars to prove the process works. Looking forward to Flash Attention v4 leaving beta, to test fine-tuning…

  30. Hello there people. So I have noticed that people are pretty much ignoring Llama 3 plus 3.1, 3.2, and 3.3 these days.

  31. GenZ LLM A post-trained language model that responds in GenZ slang, built on top of Qwen2.5-0.5B-Instruct using Supervised Fine-Tuning (SFT) followed by Reinforcement Learning with GRPO. The fine-tuned model is available on Hugging Face: a…

  32. After the first general fine-tuning tutorial I posted here (https://www.promptinjection.net/p/the-ultimate-llm-ai-fine-tuning-guide-tutorial), some people asked whether I could make the same for AMD Strix Halo because the approach here is qu…

  33. Hey everyone I’m Jaguar, building Jungle Grid. We just open-sourced our MCP server for agentic GPU workload execution.

  34. Most "AI progress" talk lives on one layer: models. Bigger model, smaller model, new benchmark, repeat.

  35. I want to move past the "democratization" slogans. What is the most practical contribution consumer-grade hardware can make to the ecosystem right now?

  36. Fine-tuning UI with AI right now: "Make the shadow softer." "Stronger." "No, less." "Go back." "A bit more." 17 messages later, you've spent more tokens than the shadow is soft. I built something that breaks the loop.

  37. MedQA: Fine-Tuning a Clinical AI on AMD ROCm — No CUDA Required The Idea Medical question answering is one of those tasks where the stakes are genuinely high. A model that confidently picks the wrong answer on a clinical MCQ isn't just wro…

  38. Got an email today about the announcement. > OpenAI is winding down the fine-tuning API and platform.

  39. Fine-tuning is one of today's most computationally intensive workloads, and it continues to push hardware to its limits. NVIDIA GPUs are purpose-built for these workloads: they break complex problems into pieces and process them in paralle…

  40. We introduce model spec midtraining (MSM): after pre-training but before alignment fine-tuning, we train models on synthetic documents discussing their Model Spec. This shapes how models generalize from subsequent alignment training.

  41. I'm part of the team building Transformer Lab, an open source ML research platform. We put together a short demo of how to run text to speech training, which you can do on your own hardware using a Local provider.

  42. Anthropic's alignment team published a paper this week called Model Spec Midtraining (MSM) and I think it's one of the more practically interesting alignment results I've seen in a while. The core problem they're solving: Current alignment…

  43. Two weeks before a Fortune 500 product launch, we told a client to scrap their fine-tuned model and rebuild with RAG instead. They lost eight weeks and $180K.

  44. No, I won't tell you how. No this is not for anyone who is not already a proven contributor to the fine-tuning space.

  45. Import AI 455: AI systems are about to start building themselves. Jack Clark thinks there’s a ~30% chance by the end of 2027 and a ~60%+ chance by the end of 2028 that AI research becomes automated, with models eventually helping train the…

  46. I've been using LLMs to help write my thesis, but the output feels dry and uses awkward phrasing (especially in translation). I'm looking to fine-tune an accessible LLM to better match natural academic writing in my language.

  47. https://huggingface.co/datasets/angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k A synthetic fine-tuning dataset created from Claude 4.6/4.7. 8,706 total examples all with reasoning.

  48. I have (free) access to a SLURM cluster with 8x NVIDIA A100 80GB GPUs (=640 GB VRAM) on a single task, and I want to run an open-weight model locally with llama.cpp for data generation, not coding. My use case is generating teacher data fo…

  49. We tested prompting, fine-tuning, RL, and grounded evaluation across ~1,500 labeled flashcards—and found models catch obvious misses but not plausible failures.

  50. Hey Everyone, Over the last few months, I noticed a massive gap in how we learn about Agentic AI. There are a million theoretical blog posts and dense whitepapers on RAG, tool calling, and swarms, but almost nowhere to just sit down, run a…

  51. Hi HN, I'm Danilo. I've been struggling with the limitations of AdamW when fine-tuning LLMs locally.

  52. I put together a hands-on tutorial that takes you from problem framing to fine-tuning, step by step. I decided to build a wildfire prevention system that uses satellite images and a Small Vision-Language Model (LFM2.5-VL-450M) to extract r…

  53. We find that a small amount of fine-tuning on instruction following in the CoT generalizes to meaningful increases in CoT controllability on an out-of-distribution set of tasks. We fine-tune four reasoning models on small datasets of instr…

  54. Spent a week doing LoRA fine-tuning on Gemma 4 E2B (~5.1B total params, ~2B active in text decoder) for a narrow Python code-generation task. Bad outputs went from ~5% to 0% (greedy) and 1.5% (sampled) across 134 tests.

  55. Spent a week doing LoRA fine-tuning on Gemma 4 E2B (gemma-4-e2b-it, ~5.1B total params, ~2B active in the text decoder) for a narrow Python code-generation task. Setup: Model: Gemma 4 E2B, bf16, language_model only (vision + audio towers f…

  56. Unlike LoRA and its variants, which inject trainable parameters directly into the weights of the Transformer and require tight coupling with the backbone, ShadowPEFT enhances the frozen large base model by adding a lightweight, cent…

  57. We want to set up the following: a local LLM environment for AI development, used by multiple software developers; infrastructure for training vision AI models; and capabilities for AI model fine-tuning. I’m currently struggling to decide between…

  58. A 15B-parameter token-mixer supernet with 8 optimized deployment presets spanning 1.0× to 10.7× decode throughput at 32K sequence length, all from a single checkpoint. Derived from Apriel-1.6 through stochastic distillation and targeted su…

  59. I've built software for clients for 38 years and kept hitting the same wall: weeks spent scaffolding the data layer and the Clean Architecture around it before any real work begins. I asked Claude to estimate how long it would take to gene…

  60. +30% avg accuracy lift on classification & extraction tasks vs. base Gemma; ~7 days until your first auto-improvement run lands in production; 0 lines of fine-tuning code you have to write, ever; $0/retrain starting price.

  61. Memory-augmented Large Language Models (LLMs) are essential for developing capable, long-term AI agents. Recently, applying Reinforcement Learning (RL) to optimize memory operations, such as extraction, updating, and retrieval, has emerged…

  62. Hey, I've been building side projects with Claude Code for a few months, but I'm completely new to fine-tuning — started experimenting maybe a week ago. From day one I wanted a GUI for the dataset side of the workflow, so this desktop app…

  63. Hey everyone, We just open-sourced our reasoning model, Chaperone-Thinking-LQ-1.0, on Hugging Face. It's built on DeepSeek-R1-Distill-Qwen-32B but goes well beyond a simple quantization — here's what we actually did: The pipeline: 4-bit GP…

  64. I've been building an open Q&A dataset for the Swedish construction industry (byggbransch) over the last few weeks — something that's been a gap in Swedish-language domain-specific datasets. Finally hit a milestone worth sharing.

  65. Writing a fine-tuning and deployment pipeline isn't as easy as it looks (Gemma 4 Version) Fine-tune and deploy Gemma 4 on Oxen.ai Google's Gemma 4 dropped in April 2026 with multimodal support (text, image, video, audio), a novel hybrid KV…

  66. Summary of Findings This issue documents what we learned making Gemma4 26B-A4B-it train on consumer hardware (RTX 4090, 24GB VRAM). No A100.

  67. I have significantly distilled my AI Agents and Skills definitions. My goal is to reduce the context size and token usage without impacting the quality of my development team.

  68. Hey HN! I'm a solo dev and I just wanted to share my latest Android game — Rollquation.

  69. Hey r/LocalLLaMA, I've been running a small 4-node DGX Spark cluster on a 400µT fabric switch and got frustrated with the usual raw Ray/vLLM scripts and EXO basically ignoring pure NVIDIA paths. I started from the solid foundation in [eugr…

  70. We’ve released a 100,000-sample Chain-of-Thought (CoT) dataset for fine-tuning local reasoning models. Each sample includes explicit intermediate reasoning traces, rather than answer-only supervision.

  71. Friday — A 24/7 AI Assistant Built Entirely on Claude Code An always-on personal AI system using only Claude Code CLI ($100/month) and Telegram — no custom AI, no cloud VMs, no fine-tuning. Live page: missingus3r.github.io/friday-showcase…

  72. A guide to model quantization in fine-tuning (and how to pick the right GGUF) About this post Fine-tuning with Unsloth and Axolotl is, on the whole, a well thought-out experience where a lot of the complexity is handled for you. However on…

  73. TL;DR: Fine-tuned Chatterbox-Multilingual for Telugu, Kannada, Bengali, Tamil, Malayalam, Marathi, Gujarati, and Hindi using LoRA adapters + tokenizer extension. Only 7.8M / 544M parameters trained.

  74. 🌟 Gemopus-4-26B-A4B-it [!NOTE] Gemopus is an attempt at fine-tuning Gemma 4 with a core philosophy of "stability first". While preserving the original reasoning order of Gemma 4 as much as possible, we conducted targeted refinements for an…

  75. Your brand voice is probably a PDF nobody reads, or it's trapped in one founder's head, or it's scattered across a thousand ChatGPT histories. I wanted to treat it like code instead — a file you can version, share, diff, and plug into any…

  76. Was fine-tuning a Japanese ASR model (based on Qwen3-ASR) to handle technical terminology better. The model clearly improved — "Next.js" comes out as "Next.js" instead of "ネクストジェイズ", punctuation works, etc.

  77. Trained a 125M LM from scratch (custom tokenizer) + released instruct checkpoint and SFT framework so others can fine-tune their own variants I’ve been experimenting with training small language models fully from scratch (no GPT-2 init, no…
