v1.5.0

@qgallouedec qgallouedec released this 25 May 15:33
· 56 commits to main since this release
bd1e73f

Features

Even more training chat templates

Three more model families gain training-compatible templates with {% generation %} markers (so assistant_only_loss=True just works):

Final logits softcapping for async GRPO

The chunked LM-head path used by AsyncGRPOTrainer now supports models that use final_logit_softcapping (notably Gemma 2). _ChunkedLogProbFunction applies logit_scale, optional tanh-based softcapping, and temperature consistently in both the forward and backward passes, so softcapped models are no longer rejected.

by @mlarnouhet in #5691
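
For readers unfamiliar with softcapping, the following is a minimal scalar sketch of the transformation described above, not the code in _ChunkedLogProbFunction. The function names, the multiplicative use of logit_scale, and the order of operations (scale, then softcap, then temperature) are assumptions made for illustration; the cap of 30.0 is only an example value.

```python
import math


def softcap_logit(raw_logit, logit_scale=1.0, softcap=None, temperature=1.0):
    """Forward pass for one logit (hypothetical helper, assumed op order)."""
    x = raw_logit * logit_scale
    if softcap is not None:
        # tanh squashes the logit smoothly into (-softcap, +softcap)
        x = softcap * math.tanh(x / softcap)
    return x / temperature


def softcap_logit_grad(raw_logit, logit_scale=1.0, softcap=None, temperature=1.0):
    """Derivative of softcap_logit with respect to raw_logit."""
    x = raw_logit * logit_scale
    d_cap = 1.0
    if softcap is not None:
        t = math.tanh(x / softcap)
        # d/dx [c * tanh(x / c)] = 1 - tanh(x / c) ** 2
        d_cap = 1.0 - t * t
    return logit_scale * d_cap / temperature


capped = softcap_logit(1000.0, softcap=30.0)   # magnitude bounded by the cap
slope = softcap_logit_grad(5.0, softcap=30.0, temperature=0.7)
```

The backward pass has to use the same scale, cap, and temperature as the forward pass; otherwise the gradient no longer matches the log-probabilities that were actually computed.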

KTO ↔ DPO alignment continues

Two more cycles closer to KTO graduation:

Trainer telemetry (opt-out)

_BaseTrainer.__init__ now emits a single anonymous huggingface_hub.send_telemetry ping per trainer instantiation, so we can finally see which trainers / model families / distributed backends are actually being used in practice and prioritize accordingly.

The payload is intentionally minimal: TRL version, trainer class name, model architecture, PEFT yes/no, distributed backend (deepspeed/fsdp/ddp/none), bucketed world size, device type, and GPU model when available. It contains no user data, no dataset names, no model paths, and no hyperparameter values. Nothing is sent in CI, in offline mode, or when HF_HUB_DISABLE_TELEMETRY is set.

See usage_stats.md for what's collected and how to opt out.

by @qgallouedec in #5758
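
As a minimal sketch of one way to opt out from inside a script, the snippet below sets the HF_HUB_DISABLE_TELEMETRY environment variable mentioned above. The helper name is hypothetical, and the sketch assumes the flag is read when huggingface_hub is first imported, so it is set before any such import; usage_stats.md remains the authoritative reference.

```python
import os


def disable_hub_telemetry():
    """Hypothetical helper: set the opt-out flag for the current process."""
    os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"


# Call this before importing huggingface_hub or trl (assumed to read the
# flag at import time), then import and build trainers as usual.
disable_hub_telemetry()
```

Exporting the same variable in the shell or job configuration has the same effect and avoids depending on import order.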

Other

Fixes

  • Fix exponential backtracking in qwen3 / qwen3_5 / glm4moe response parsing. GRPOTrainer was hanging indefinitely on truncated <tool_call> blocks (a degenerate case that happens naturally when generation hits max_completion_length mid-tool-call). Rewrote the regex to be non-backtracking, so the worst case goes from O(2ⁿ) to O(n). By @xodn348 in #5798
  • CUDA memory leak: release BNB dequantization buffers & stale state in OffloadActivations — follow-up to v1.4's activation-offloading leak fix. By @butterwecksolutions in #5730
  • Invalidate ZeRO-3 param coordinator trace in add_hooks by @roycho96 in #4693
  • Fix nested vocab_size for DistillationTrainer and GOLDTrainer by @Beichen-Ma in #5592
  • Fix MPS support in experimental empty_cache() by @jamie-peterson-ml in #5799
  • Fix metric_for_best_model for trainer-specific eval metrics by @qgallouedec in #5811
  • Fix generate_batch: inference tensors blocking inplace ops in background thread by @albertvillanova in #5818
  • Replace deprecated torch_dtype with dtype across examples, docs, notebooks, tests, and experimental distillation / gold trainers by @qgallouedec in #5717
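
To illustrate the response-parsing fix in the first item above, here is a sketch of a tool-call pattern that cannot backtrack exponentially. It is a hypothetical pattern written for this example, not the regex used in TRL. The idea is that every character of the body can be consumed in exactly one way, so a missing closing tag fails after a single linear scan.

```python
import re

# Hypothetical pattern (not TRL's). The body is a run of either
#   - any character other than "<", or
#   - a "<" that does not begin the closing tag.
# The two alternatives never overlap, so the engine has only one way to
# consume each character and a failed match stays linear in input length.
TOOL_CALL_RE = re.compile(r"<tool_call>((?:[^<]|<(?!/tool_call>))*)</tool_call>")


def extract_tool_calls(text):
    """Return the stripped body of every complete <tool_call> block."""
    return [m.group(1).strip() for m in TOOL_CALL_RE.finditer(text)]


complete = 'Sure. <tool_call>{"name": "add", "arguments": {"a": 1}}</tool_call>'
# Simulates a completion cut off mid-tool-call by a length limit.
truncated = '<tool_call>{"name": "add", "arguments": {"a": ' + "1" * 20_000

calls = extract_tool_calls(complete)
no_calls = extract_tool_calls(truncated)
```

Patterns that nest quantifiers over overlapping alternatives, such as a repeated group that can itself match the same text in several ways, are the usual source of exponential behavior on inputs like the truncated one.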

Documentation and Examples

  • docs(grpo): align model to Qwen2.5 and add GRPO OOM tab in quickstart by @xodn348 in #5740

CI

New Contributors

What's Changed

Full Changelog: v1.4.0...v1.5.0