Set _tokenizer as trainer attribute#5489
Conversation
| and isinstance(tokenizer, PreTrainedTokenizerBase) | ||
| and hasattr(tokenizer, "response_schema") # attribute not set by default for now | ||
| and tokenizer.response_schema is not None # only works if the tokenizer has a schema | ||
| and isinstance(self._tokenizer, PreTrainedTokenizerBase) |
There was a problem hiding this comment.
I think this condition is redundant now.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
| and hasattr(self._tokenizer, "response_schema") # attribute not set by default for now | ||
| and self._tokenizer.response_schema is not None # only works if the tokenizer has a schema |
There was a problem hiding this comment.
I think these 2 conditions can be combined into 1:
| and hasattr(self._tokenizer, "response_schema") # attribute not set by default for now | |
| and self._tokenizer.response_schema is not None # only works if the tokenizer has a schema | |
| and getattr(self._tokenizer, "response_schema", None) # only works if the tokenizer has a schema: attribute not set by default for now |
There was a problem hiding this comment.
agree, although I feel it's less readable like this
There was a problem hiding this comment.
the current can be read smoothly: "the tokenizer has an attribute "response_schema" and this attribute response_schema is not None"
qgallouedec
left a comment
There was a problem hiding this comment.
I think it looks good! there are a few comments to address, mostly due to recent changes in grpo
| post_tool_completions = [ | ||
| parse_response(self.processing_class, ids) if ids else {} for ids in post_tool_ids | ||
| ] | ||
| post_tool_completions = [parse_response(self._tokenizer, ids) if ids else {} for ids in post_tool_ids] |
There was a problem hiding this comment.
parse_response now handles both processor and tokenizer, this should be reverted
There was a problem hiding this comment.
parse_response only needs a tokenizer instance but it had to handle both because we did not have a simple way to pass only tokenizer. Once we implement self._tokenizer in all trainers, parse_response could be simplified to accept only tokenizer instances.
There was a problem hiding this comment.
Note there are already other functions that only accept tokenizer instances: add_response_schema, is_chat_template_prefix_preserving, get_training_chat_template.
More broadly, the underlying goal of this PR is to centralize the processor/tokenizer disambiguation within processing_class in a single place, so that the rest of the code can rely on a well-defined and consistent interface, with a clear expected class instance.
In that sense, the current change in calling parse_response is an intermediate step toward that simplification, rather than a deviation from it.
There was a problem hiding this comment.
Sounds good, let's revert the VLM support in parse_response then.
For the record, a few things to keep in mind going forward. Accessing the inner tokenizer isn't always safe: for apply_chat_template, the processor does extra work (image token expansion), so processor.apply_chat_template is not just an alias for processor.tokenizer.apply_chat_template.
Applying that to each helper:
-
add_response_schema: tokenizer-only is fine. The processor doesn't expose a schema, and I'd expect schema handling to stay tokenizer-level even if upstream adds it — no chat-template-style side effects.
EDIT: because we check the chat template inadd_response_schema, we should actually pass the processor, even if we only setprocessor.tokenizer.response_schema -
parse_response: same story, parse_response only exists on the tokenizer today, so we have no choice. And even if upstream eventually moves it to the processor, I'm not too worried: parsing is not the same kind of operation as applying a chat template, so the image-token caveat doesn't really apply here. Revert VLM support inparse_response#5561 -
is_chat_template_prefix_preserving: this one is actually wrong as-is. It should support VLMs (for the reason above), and the call site change in this PR should and will be reverted:- if self.tools and not is_chat_template_prefix_preserving(self._tokenizer): + if self.tools and not is_chat_template_prefix_preserving(processing_class):
We'll extend the function in a follow-up. Support VLM processors in
is_chat_template_prefix_preserving#5558 -
get_training_chat_template: contrary to what the type hint suggests (needs updating), it already supports both tokenizer and processor, and should be called on the processor. My rule of thumb: with a processor, never manipulate the inner chat template directly. Accept processor inget_training_chat_template#5560
| and isinstance(tokenizer, PreTrainedTokenizerBase) | ||
| and hasattr(tokenizer, "response_schema") # attribute not set by default for now | ||
| and tokenizer.response_schema is not None # only works if the tokenizer has a schema | ||
| and isinstance(self._tokenizer, PreTrainedTokenizerBase) |
| and self._tokenizer.response_schema is not None # only works if the tokenizer has a schema | ||
| ): | ||
| completions = [[parse_response(parsing_class, ids)] for ids in completion_ids] | ||
| completions = [[parse_response(self._tokenizer, ids)] for ids in completion_ids] |
There was a problem hiding this comment.
| has_response_schema = getattr(self._tokenizer, "response_schema", None) | ||
| if self.tools and not has_response_schema: | ||
| processing_class = add_response_schema(processing_class) | ||
| self._tokenizer = add_response_schema(self._tokenizer) |
There was a problem hiding this comment.
we add the response schema by checking the chat template in add_response_schema. But processor.chat_template and processor.tokenizer.chat_template are independent attributes, and processor.apply_chat_template reads only the former, so they can silently diverge if either is mutated. In practice they're populated from the same source files on load and match today for all tested VLMs, but technically, patching one doesn't patch the other, which is why I think we should keep
| self._tokenizer = add_response_schema(self._tokenizer) | |
| processing_class = add_response_schema(processing_class) |
and extend add_response_schema to processor: #5520
There was a problem hiding this comment.
OK, I understood you said it was OK to pass just tokenizer: #5489 (comment)
add_response_schema: tokenizer-only is fine. The processor doesn't expose a schema, and I'd expect schema handling to stay tokenizer-level even if upstream adds it — no chat-template-style side effects.
There was a problem hiding this comment.
I read your PR and I'm not sure of fully understanding it: isn't it overlapping with this PR?
There was a problem hiding this comment.
Ah well actually I was not correct: add_response_schema should actually take the processor, not because of the schema, but because of the template: there is no guaranty that processor.chat_template == processor.tokenizer.chat_template. In other words
# this would be correct in the vast majority of cases
def add_response_schema(tokenizer):
tokenizer.response_schema = SCHEMAS[tokenizer.chat_template]
# but this is better
def add_response_schema(processor):
processor.tokenizer.response_schema = SCHEMAS[processor.chat_template]There was a problem hiding this comment.
After having thought longer about this, I'm not sure about the logic in transformers' implementation of this...
- Shouldn't they enforce that both (processor and tokenizer) chat_template's are the same?
- Does it make sense that a tokenizer.response_schema is aligned with its processor.chat_template, but not with its own tokenizer.chat_template?
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 5cb4c39. Configure here.

Set
_tokenizerattribute.This is a draft PR. If validated, I will implement the same pattern in other trainers in this same PR.
CC: @qgallouedec
This PR refactors the
GRPOTrainerto consistently use theself._tokenizerattribute throughout the class, replacing previous direct references totokenizer,processing_class, or their attributes. This change improves code clarity, reduces ambiguity, and ensures that all tokenizer-related operations are performed on the correct instance, especially in complex scenarios involving processors and vision-language models (VLMs).Changes
Tokenizer Attribute Refactoring:
tokenizer,processing_class.tokenizer, orprocessing_classfor tokenizer operations are now replaced withself._tokenizer. This includes setting pad/eos tokens, accessing token IDs, and calling tokenizer methods.Chat Template and Response Schema Handling:
self._tokenizerinstead ofprocessing_classor its variants. This ensures correct behavior when tools are enabled and for multi-turn training.Vision-Language Model (VLM) Support:
self._tokenizer, ensuring compatibility and correctness across different model families.These changes make the codebase more robust and maintainable by centralizing tokenizer logic and reducing potential bugs from inconsistent attribute usage.
Note
Medium Risk
Mostly a mechanical refactor, but it touches token ID/padding/EOS detection and tool-call parsing paths used during generation and masking, which could subtly change training behavior across several trainers.
Overview
Refactors several trainers (
GRPOTrainer,RLOOTrainer,DPOTrainer, and experimentalDPPO/GFPO/GRPOWithReplayBuffer) to store the tokenizer asself._tokenizerand consistently use it forpad_token_id/eos_token_id, padding, truncation detection, andparse_response(tool-call/structured output decoding).Updates initialization logic to ensure
pad_tokenis set on the underlying tokenizer and adjusts tests to mock_tokenizerinstead of per-trainerpad_token_id/eos_token_idattributes.Reviewed by Cursor Bugbot for commit 59b217e. Bugbot is set up for automated code reviews on this repo. Configure here.