Training Qwen2.5-0.5B-Instruct on Reddit post summarization with a length constraint on my 3x Mac Minis with GRPO - evals update

reddit-localllama · www.reddit.com ·1 pts·10 replies ↗ ·1d

So, I trained two variants of this task: one using just a length penalty, and one using a quality reward plus a length penalty. I ran an LLM-as-a-judge eval to check summarization quality using DeepEval tools. The metrics are: Conciseness, Coverage, Clarity, Fa…
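
For anyone wondering what the two reward variants could look like in code, here is a rough sketch. The post does not give the actual reward functions, so everything here is an assumption for illustration: the names `length_penalty` and `combined_reward`, the word-based length count, the linear penalty shape, the `target_words` default, and a `quality_score` that is assumed to already be a float in [0, 1] from some scorer.

```python
def length_penalty(summary: str, target_words: int = 50) -> float:
    """Hypothetical length penalty in [-1, 0].

    Returns 0.0 when the summary is at or under the target length, and
    decreases linearly for longer summaries, bottoming out at -1.0 once
    the summary is twice the target length or more.
    """
    n_words = len(summary.split())
    if n_words <= target_words:
        return 0.0
    overshoot = (n_words - target_words) / target_words
    return -min(1.0, overshoot)


def combined_reward(
    summary: str,
    quality_score: float,
    target_words: int = 50,
    length_weight: float = 1.0,
) -> float:
    """Hypothetical second variant: quality reward plus weighted length penalty.

    quality_score is assumed to come from an external scorer and lie in [0, 1].
    """
    return quality_score + length_weight * length_penalty(summary, target_words)


# Variant 1 (length penalty only) is just length_penalty(summary).
# Variant 2 adds a quality term on top of the same penalty.
short_summary = "User asks for advice on a used laptop purchase."
print(length_penalty(short_summary, target_words=20))
print(combined_reward(short_summary, quality_score=0.8, target_words=20))
```

In a GRPO setup, a function like this would be called once per sampled completion to produce the scalar reward for that completion; how the real run wires that in is not stated in the post.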
