v1.5.0

Features

Even more training chat templates

Three more model families gain training-compatible templates with {% generation %} markers (so assistant_only_loss=True just works):

Phi-3.5 by @DagaBhai in #5746
Qwen3-VL by @aazizyan in #5764
Qwen3.5 Think / NoThink by @aazizyan in #5824
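As a rough illustration of what the {% generation %} markers make possible, here is a minimal pure-Python sketch of assistant-only loss masking. It assumes the tokenizer has already produced a per-token assistant mask (1 inside a generation block, 0 elsewhere). The helper name `mask_non_assistant_labels` and the toy token ids are hypothetical, and this is not TRL's actual implementation.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_non_assistant_labels(input_ids, assistant_mask):
    """Return labels where every non-assistant token is replaced by IGNORE_INDEX."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, assistant_mask)]


# Toy example: three prompt tokens followed by two assistant tokens.
input_ids = [101, 7592, 102, 2023, 2003]
assistant_mask = [0, 0, 0, 1, 1]
labels = mask_non_assistant_labels(input_ids, assistant_mask)
print(labels)
```

Only the last two positions keep their token ids as labels, so only assistant tokens would contribute to the loss.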

Final logits softcapping for async GRPO

The chunked LM-head path used by AsyncGRPOTrainer now supports models that use final_logit_softcapping (notably Gemma 2). _ChunkedLogProbFunction applies logit_scale, optional tanh-based softcapping, and temperature consistently in both the forward and backward passes, so softcapped models are no longer rejected.

by @mlarnouhet in #5691
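For readers unfamiliar with softcapping, the sketch below shows the transformation on a single row of logits in plain Python. The tanh form `cap * tanh(x / cap)` is the one used by Gemma 2. The function name `softcapped_log_probs` and the order of operations (scale, then softcap, then temperature) are assumptions made for illustration and may not match _ChunkedLogProbFunction exactly.

```python
import math


def softcapped_log_probs(logits, softcap=None, logit_scale=1.0, temperature=1.0):
    """Log-softmax of one row of logits after scale, optional tanh softcap, and temperature."""
    scaled = [x * logit_scale for x in logits]
    if softcap is not None:
        # tanh smoothly squashes each logit into the open interval (-softcap, softcap).
        scaled = [softcap * math.tanh(x / softcap) for x in scaled]
    scaled = [x / temperature for x in scaled]
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


row = [100.0, 0.0, -100.0]
capped = softcapped_log_probs(row, softcap=30.0)
uncapped = softcapped_log_probs(row)
```

With a cap of 30, the gap between the largest and smallest log-probability stays below 60, whereas the uncapped row keeps its full spread of 200.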

KTO ↔ DPO alignment continues

Two more cycles closer to KTO graduation:

Align compute_loss flow by @albertvillanova in #5810
Align _compute_loss_liger flow by @albertvillanova in #5816

Trainer telemetry (opt-out)

_BaseTrainer.__init__ now emits a single anonymous huggingface_hub.send_telemetry ping per trainer instantiation, so we can finally see which trainers / model families / distributed backends are actually being used in practice and prioritize accordingly.

The payload is intentionally minimal: TRL version, trainer class name, model architecture, PEFT yes/no, distributed backend (deepspeed/fsdp/ddp/none), bucketed world size, device type, and GPU model when available. It contains no user data, dataset names, model paths, or hyperparameter values, and nothing is sent in CI, in offline mode, or when HF_HUB_DISABLE_TELEMETRY is set.

See usage_stats.md for what's collected and how to opt out.

by @qgallouedec in #5758
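One way to opt out from inside a script is to set the HF_HUB_DISABLE_TELEMETRY environment variable mentioned above before any Hugging Face library is imported. This is a minimal sketch: it assumes the variable is read when huggingface_hub is imported, and usage_stats.md remains the authoritative reference.

```python
import os

# Set the flag before importing trl / huggingface_hub so it is picked up at import time.
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"

# Imports of trl and trainer construction would follow here.
```

Exporting the same variable in the shell or job environment has the same effect and avoids depending on import order.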

Other

OpenRewardSpec: fix omitting task-scoped tools during rollout binding (fixes #5727) by @rycerzes in #5729
Add OpenReward example to the list of examples by @sergiopaniego in #5752
Add DDP-2 members to invariant test suite by @qgallouedec in #5736
Align and simplify the stable training scripts by @qgallouedec in #5812
Replace uv installation script with setup action by @qgallouedec in #5735

Fixes

Fix exponential backtracking in qwen3 / qwen3_5 / glm4moe response parsing: GRPOTrainer was hanging indefinitely on truncated <tool_call> blocks (a degenerate case that occurs naturally when generation hits max_completion_length mid-tool-call). The regex was rewritten to be non-backtracking, which takes the worst case from O(2ⁿ) to O(n). By @xodn348 in #5798
Fix CUDA memory leak: release BNB dequantization buffers and stale state in OffloadActivations (follow-up to v1.4's activation-offloading leak fix). By @butterwecksolutions in #5730
Invalidate ZeRO-3 param coordinator trace in add_hooks by @roycho96 in #4693
Fix nested vocab_size for DistillationTrainer and GOLDTrainer by @Beichen-Ma in #5592
Fix MPS support in experimental empty_cache() by @jamie-peterson-ml in #5799
Fix metric_for_best_model for trainer-specific eval metrics by @qgallouedec in #5811
Fix generate_batch: inference tensors blocking inplace ops in background thread by @albertvillanova in #5818
Replace deprecated torch_dtype with dtype across examples, docs, notebooks, tests, and experimental distillation / gold trainers by @qgallouedec in #5717
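To make the truncated <tool_call> case from the response-parsing fix above concrete, here is a small sketch of a linear-time extractor. It deliberately uses plain `str.find` scanning rather than a regex, so it is not the pattern from #5798; the function name `extract_tool_calls` is hypothetical.

```python
OPEN_TAG, CLOSE_TAG = "<tool_call>", "</tool_call>"


def extract_tool_calls(text):
    """Return the bodies of complete <tool_call> blocks, ignoring a truncated trailing block."""
    calls, pos = [], 0
    while True:
        start = text.find(OPEN_TAG, pos)
        if start == -1:
            break
        body_start = start + len(OPEN_TAG)
        end = text.find(CLOSE_TAG, body_start)
        if end == -1:
            # Truncated block (e.g. generation hit the length limit): nothing more to extract.
            break
        calls.append(text[body_start:end].strip())
        pos = end + len(CLOSE_TAG)
    return calls


completion = 'ok <tool_call>{"name": "a"}</tool_call> then <tool_call>{"name": "b"'
print(extract_tool_calls(completion))
```

Because the scan only ever moves forward, a completion that ends mid-tool-call is handled in a single pass instead of triggering repeated retries.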

Documentation and Examples

docs(grpo): align model to Qwen2.5 and add GRPO OOM tab in quickstart by @xodn348 in #5740

CI

Migrate tests to Qwen3.5 Think/NoThink fixtures + tiny-model generation scripts by @aazizyan in #5819 and #5821
Align tiny Glm4MoeForCausalLM / Cohere / Cohere2 / Qwen2.5-VL configs with their reference models by @qgallouedec in #5638, #5706, #5707 and #5739
Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip by @qgallouedec in #5779
Fix tiny Qwen2.5-VL fullatt_block_indexes out of range for depth=2 by @albertvillanova in #5805
Remove non-existent params from tiny Qwen2-VL model by @albertvillanova in #5795
Fix vision config num_heads key in Qwen VL tiny model scripts by @matdou in #5792
Drop unjustified model.visual. skip in GRPO/RLOO Qwen2.5-VL tests by @qgallouedec in #5780
Make the LLaVA / LLaVA-Next test guard explicit by @qgallouedec in #5778
Remove obsolete Gemma 3 vision-head guard from VLM training tests by @qgallouedec in #5772
Fix OOM in CI: reduce batch size in VLM SFT / GRPO/RLOO VLM / toolcall tests by @albertvillanova in #5687, #5767, #5801
Fix OOM in CI by clearing chained exception tracebacks by @albertvillanova in #5776
Fix OOM in CI by reducing intermediate_size and image token budget for tiny Gemma 4 by @albertvillanova in #5760
Fix CI errors in response parsing for gpt-oss/llama with transformers v5 by @albertvillanova in #5755
Fix CI AttributeError: 'GptOssConfig' object has no attribute 'num_experts' by @albertvillanova in #5756
Fix CI apply_model_revisions by removing _commit_hash kwarg by @albertvillanova in #5762
Fix CI test to avoid skipping model.visual params by @albertvillanova in #5806
Fix transformers min version for tiny gemma 4 as 5.5.0 by @albertvillanova in #5763
Hotfix CI: pin torch < 2.12.0 (later reverted) by @albertvillanova in #5769
Fix catch-all empty string in Makefile pytest --only-rerun by @albertvillanova in #5784
chore: update tests_latest.yml by @hf-security-analysis[bot] in #5733
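As background for the chained-traceback OOM fix listed above: a Python exception keeps its traceback alive, the traceback keeps stack frames alive, and those frames keep their local variables (such as large tensors) alive. The sketch below shows one generic way to drop those references across an exception chain. The helper name `clear_chained_tracebacks` is hypothetical and the approach is an assumption, not necessarily what #5776 does.

```python
def clear_chained_tracebacks(exc):
    """Drop traceback references from exc and from every exception chained to it."""
    seen = set()
    stack = [exc]
    while stack:
        current = stack.pop()
        if current is None or id(current) in seen:
            continue
        seen.add(id(current))
        current.__traceback__ = None
        # Follow both explicit (raise ... from ...) and implicit chaining.
        stack.extend([current.__cause__, current.__context__])


try:
    try:
        raise ValueError("inner")
    except ValueError as inner:
        raise RuntimeError("outer") from inner
except RuntimeError as err:
    caught = err

clear_chained_tracebacks(caught)
```

After the call, neither the outer exception nor its cause holds a traceback, so the frames they referenced can be garbage-collected.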

New Contributors

@hf-security-analysis[bot] made their first contribution in #5733
@Beichen-Ma made their first contribution in #5592
@DagaBhai made their first contribution in #5746
@xodn348 made their first contribution in #5740
@mlarnouhet made their first contribution in #5691
@matdou made their first contribution in #5792
@jamie-peterson-ml made their first contribution in #5799
@rycerzes made their first contribution in #5729

What's Changed

⬆️ Bump dev version by @qgallouedec in #5734
chore: update tests_latest.yml by @hf-security-analysis[bot] in #5733
fix: CUDA memory leak / release BNB dequantization buffers & stale state in OffloadActivations by @butterwecksolutions in #5730
fix: invalidate ZeRO-3 param coordinator trace in add_hooks by @roycho96 in #4693
Fix nested vocab_size for DistillationTrainer and GOLDTrainer by @Beichen-Ma in #5592
feat: add Phi-3.5 training chat templates with generation markers by @DagaBhai in #5746
docs(grpo): align model to Qwen2.5 and add GRPO OOM tab in quickstart by @xodn348 in #5740
torch_dtype -> dtype by @qgallouedec in #5717
Add OpenReward example to the list of examples by @sergiopaniego in #5752
Fix CI errors in response parsing for gptoss/llama with transformers v5 by @albertvillanova in #5755
Add DDP-2 members to invariant test suite by @qgallouedec in #5736
Hotfix CI param not updated AssertionError: Pin torch < 2.12.0 by @albertvillanova in #5769
Align tiny-Glm4MoeForCausalLM with GLM-4.5 reference config by @qgallouedec in #5638
Align tiny Cohere config with aya-expanse-8b by @qgallouedec in #5706
Align tiny Cohere2 config with tiny-aya-earth by @qgallouedec in #5707
Fix OOM in CI by reducing intermediate_size and image token budget for tiny Gemma4 by @albertvillanova in #5760
Fix CI AttributeError: 'GptOssConfig' object has no attribute 'num_experts' by @albertvillanova in #5756
Fix CI apply_model_revisions by removing _commit_hash kwarg by @albertvillanova in #5762
Remove obsolete Gemma3 vision-head guard from VLM training tests by @qgallouedec in #5772
Replace uv installation script with setup action by @qgallouedec in #5735
Fix OOM in CI by clearing chained exception tracebacks by @albertvillanova in #5776
Fix transformers min version for tiny gemma4 as 5.5.0 by @albertvillanova in #5763
Final logits softcapping support for async GRPO Trainer by @mlarnouhet in #5691
Fix vision config num_heads key in Qwen VL tiny model scripts and revert torch pin by @matdou in #5792
Drop unjustified model.visual. skip in GRPO / RLOO Qwen2.5-VL tests by @qgallouedec in #5780
Fix OOM in CI by reducing batch size and sequence length for toolcall tests by @albertvillanova in #5801
Fix exponential backtracking in qwen3 / qwen3_5 / glm4moe response parsing by @xodn348 in #5798
Add telemetry to trainers by @qgallouedec in #5758
Add Qwen3-VL training chat template with generation markers by @aazizyan in #5764
Align tiny Qwen2.5-VL with Qwen/Qwen2.5-VL-3B-Instruct by @qgallouedec in #5739
Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip by @qgallouedec in #5779
Fix OOM in CI by reducing batch size in GRPO/RLOO VLM tests by @albertvillanova in #5767
Fix catch-all empty string in Makefile pytest --only-rerun by @albertvillanova in #5784
Remove non-existent params from tiny Qwen2-VL model by @albertvillanova in #5795
Fix tiny Qwen2.5-VL fullatt_block_indexes out of range for depth=2 by @albertvillanova in #5805
Make the LLaVA / LLaVA-Next test guard explicit by @qgallouedec in #5778
Fix MPS support in experimental empty_cache() by @jamie-peterson-ml in #5799
Fix CI test to avoid skipping model.visual params by @albertvillanova in #5806
Align KTO with DPO: Align compute_loss flow by @albertvillanova in #5810
Fix generate_batch: inference tensors block inplace ops in background thread by @albertvillanova in #5818
Fix metric_for_best_model for trainer-specific eval metrics by @qgallouedec in #5811
Align and simplify the stable training scripts by @qgallouedec in #5812
Align KTO with DPO: Align _compute_loss_liger flow by @albertvillanova in #5816
Add tiny Qwen3.5 Think/NoThink fixture generation scripts by @aazizyan in #5819
Migrate tests to Qwen3.5 Think/NoThink fixtures by @aazizyan in #5821
Fix OpenRewardSpec omitting task-scoped tools during rollout binding (fixes #5727) by @rycerzes in #5729
Add Qwen3.5 Think/NoThink training chat templates with generation markers by @aazizyan in #5824
Release: v1.5 by @qgallouedec in #5835

Full Changelog: v1.4.0...v1.5.0
