Training Qwen2.5-0.5B-Instruct on Reddit post summarization with a length constraint on my 3x Mac Minis with GRPO - evals update
So, I trained two variants of this task:

- using just a length penalty
- using a quality reward and a length penalty

I ran an LLM-as-a-Judge eval to check the summarization quality using DeepEval tools. The metrics are:

- Conciseness
- Coverage
- Clarity
- Fa…
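For anyone wondering what the two training variants could look like as reward functions, here is a rough sketch. It is not my exact code: the names `length_penalty` and `combined_reward`, the word-based length measure, the `target_words` budget, and the `penalty_weight` knob are all assumptions made for illustration, and the `quality` score is just passed in as a number because how it gets computed is a separate question.

```python
def length_penalty(summary: str, target_words: int = 64) -> float:
    """Hypothetical length penalty in [-1, 0].

    Returns 0.0 when the summary fits the word budget and moves
    linearly toward -1.0 as it overshoots (capped at -1.0).
    """
    n_words = len(summary.split())
    overshoot = max(0, n_words - target_words)
    return -min(1.0, overshoot / target_words)


def combined_reward(
    summary: str,
    quality: float,
    target_words: int = 64,
    penalty_weight: float = 1.0,
) -> float:
    """Hypothetical second variant: quality score plus weighted length penalty.

    `quality` is assumed to be a precomputed score in [0, 1] from whatever
    quality signal is in use; it is a plain input here, not computed.
    """
    return quality + penalty_weight * length_penalty(summary, target_words)


# Variant 1: the reward is the length penalty alone.
reward_v1 = length_penalty("a short summary of the post", target_words=8)

# Variant 2: the reward adds a quality score on top of the penalty.
reward_v2 = combined_reward("a short summary of the post", quality=0.8, target_words=8)
```

With a linear, capped penalty like this, the first variant can only teach the model to stay under the budget, while the second gives it a reason to keep the summary useful as well.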