Set _tokenizer as trainer attribute by albertvillanova · Pull Request #5489 · huggingface/trl

albertvillanova · 2026-04-09T17:52:10Z

Set _tokenizer attribute.

This is a draft PR. If validated, I will implement the same pattern in other trainers in this same PR.

This PR refactors the GRPOTrainer to consistently use the self._tokenizer attribute throughout the class, replacing previous direct references to tokenizer, processing_class, or their attributes. This change improves code clarity, reduces ambiguity, and ensures that all tokenizer-related operations are performed on the correct instance, especially in complex scenarios involving processors and vision-language models (VLMs).

Changes

Tokenizer Attribute Refactoring:

All references to tokenizer, processing_class.tokenizer, or processing_class for tokenizer operations are now replaced with self._tokenizer. This includes setting pad/eos tokens, accessing token IDs, and calling tokenizer methods.

Chat Template and Response Schema Handling:

Logic for checking and adding response schemas, as well as selecting chat templates, now consistently uses self._tokenizer instead of processing_class or its variants. This ensures correct behavior when tools are enabled and for multi-turn training.

Vision-Language Model (VLM) Support:

Token ID lookups for vision and video tokens in VLMs now use self._tokenizer, ensuring compatibility and correctness across different model families.

These changes make the codebase more robust and maintainable by centralizing tokenizer logic and reducing potential bugs from inconsistent attribute usage.

Note

Medium Risk
Mostly a mechanical refactor, but it touches token ID/padding/EOS detection and tool-call parsing paths used during generation and masking, which could subtly change training behavior across several trainers.

Overview
Refactors several trainers (GRPOTrainer, RLOOTrainer, DPOTrainer, and experimental DPPO/GFPO/GRPOWithReplayBuffer) to store the tokenizer as self._tokenizer and consistently use it for pad_token_id/eos_token_id, padding, truncation detection, and parse_response (tool-call/structured output decoding).

Updates initialization logic to ensure pad_token is set on the underlying tokenizer and adjusts tests to mock _tokenizer instead of per-trainer pad_token_id/eos_token_id attributes.

^{Reviewed by Cursor Bugbot for commit 59b217e. Bugbot is set up for automated code reviews on this repo. Configure here.}

albertvillanova · 2026-04-09T17:54:23Z

-                and isinstance(tokenizer, PreTrainedTokenizerBase)
-                and hasattr(tokenizer, "response_schema")  # attribute not set by default for now
-                and tokenizer.response_schema is not None  # only works if the tokenizer has a schema
+                and isinstance(self._tokenizer, PreTrainedTokenizerBase)


I think this condition is redundant now.

HuggingFaceDocBuilderDev · 2026-04-09T17:54:49Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

albertvillanova · 2026-04-09T17:58:38Z

+                and hasattr(self._tokenizer, "response_schema")  # attribute not set by default for now
+                and self._tokenizer.response_schema is not None  # only works if the tokenizer has a schema


I think these 2 conditions can be combined into 1:

Suggested change

and hasattr(self._tokenizer, "response_schema") # attribute not set by default for now

and self._tokenizer.response_schema is not None # only works if the tokenizer has a schema

and getattr(self._tokenizer, "response_schema", None) # only works if the tokenizer has a schema: attribute not set by default for now

agree, although I feel it's less readable like this

the current can be read smoothly: "the tokenizer has an attribute "response_schema" and this attribute response_schema is not None"

qgallouedec

I think it looks good! there are a few comments to address, mostly due to recent changes in grpo

qgallouedec · 2026-04-14T16:03:22Z

-            post_tool_completions = [
-                parse_response(self.processing_class, ids) if ids else {} for ids in post_tool_ids
-            ]
+            post_tool_completions = [parse_response(self._tokenizer, ids) if ids else {} for ids in post_tool_ids]


parse_response now handles both processor and tokenizer, this should be reverted

parse_response only needs a tokenizer instance but it had to handle both because we did not have a simple way to pass only tokenizer. Once we implement self._tokenizer in all trainers, parse_response could be simplified to accept only tokenizer instances.

Note there are already other functions that only accept tokenizer instances: add_response_schema, is_chat_template_prefix_preserving, get_training_chat_template.

More broadly, the underlying goal of this PR is to centralize the processor/tokenizer disambiguation within processing_class in a single place, so that the rest of the code can rely on a well-defined and consistent interface, with a clear expected class instance.

In that sense, the current change in calling parse_response is an intermediate step toward that simplification, rather than a deviation from it.

Sounds good, let's revert the VLM support in parse_response then.

For the record, a few things to keep in mind going forward. Accessing the inner tokenizer isn't always safe: for apply_chat_template, the processor does extra work (image token expansion), so processor.apply_chat_template is not just an alias for processor.tokenizer.apply_chat_template.
Applying that to each helper:

add_response_schema: tokenizer-only is fine. The processor doesn't expose a schema, and I'd expect schema handling to stay tokenizer-level even if upstream adds it — no chat-template-style side effects.
EDIT: because we check the chat template in add_response_schema, we should actually pass the processor, even if we only set processor.tokenizer.response_schema

parse_response: same story, parse_response only exists on the tokenizer today, so we have no choice. And even if upstream eventually moves it to the processor, I'm not too worried: parsing is not the same kind of operation as applying a chat template, so the image-token caveat doesn't really apply here. Revert VLM support in parse_response #5561

is_chat_template_prefix_preserving: this one is actually wrong as-is. It should support VLMs (for the reason above), and the call site change in this PR should and will be reverted:

- if self.tools and not is_chat_template_prefix_preserving(self._tokenizer): + if self.tools and not is_chat_template_prefix_preserving(processing_class):

We'll extend the function in a follow-up. Support VLM processors in is_chat_template_prefix_preserving #5558

get_training_chat_template: contrary to what the type hint suggests (needs updating), it already supports both tokenizer and processor, and should be called on the processor. My rule of thumb: with a processor, never manipulate the inner chat template directly. Accept processor in get_training_chat_template #5560

qgallouedec · 2026-04-14T16:04:45Z

-                and isinstance(tokenizer, PreTrainedTokenizerBase)
-                and hasattr(tokenizer, "response_schema")  # attribute not set by default for now
-                and tokenizer.response_schema is not None  # only works if the tokenizer has a schema
+                and isinstance(self._tokenizer, PreTrainedTokenizerBase)


qgallouedec · 2026-04-14T16:05:09Z

+                and self._tokenizer.response_schema is not None  # only works if the tokenizer has a schema
            ):
-                completions = [[parse_response(parsing_class, ids)] for ids in completion_ids]
+                completions = [[parse_response(self._tokenizer, ids)] for ids in completion_ids]


same as https://github.com/huggingface/trl/pull/5489/changes#r3080828208

…bute

qgallouedec · 2026-04-15T18:36:41Z

+        has_response_schema = getattr(self._tokenizer, "response_schema", None)
        if self.tools and not has_response_schema:
-            processing_class = add_response_schema(processing_class)
+            self._tokenizer = add_response_schema(self._tokenizer)


we add the response schema by checking the chat template in add_response_schema. But processor.chat_template and processor.tokenizer.chat_template are independent attributes, and processor.apply_chat_template reads only the former, so they can silently diverge if either is mutated. In practice they're populated from the same source files on load and match today for all tested VLMs, but technically, patching one doesn't patch the other, which is why I think we should keep

Suggested change

self._tokenizer = add_response_schema(self._tokenizer)

processing_class = add_response_schema(processing_class)

and extend add_response_schema to processor: #5520

OK, I understood you said it was OK to pass just tokenizer: #5489 (comment)

add_response_schema: tokenizer-only is fine. The processor doesn't expose a schema, and I'd expect schema handling to stay tokenizer-level even if upstream adds it — no chat-template-style side effects.

I read your PR and I'm not sure of fully understanding it: isn't it overlapping with this PR?

Use self._tokenizer in DPPO

Ah well actually I was not correct: add_response_schema should actually take the processor, not because of the schema, but because of the template: there is no guaranty that processor.chat_template == processor.tokenizer.chat_template. In other words

# this would be correct in the vast majority of cases def add_response_schema(tokenizer): tokenizer.response_schema = SCHEMAS[tokenizer.chat_template] # but this is better def add_response_schema(processor): processor.tokenizer.response_schema = SCHEMAS[processor.chat_template]

After having thought longer about this, I'm not sure about the logic in transformers' implementation of this...

Shouldn't they enforce that both (processor and tokenizer) chat_template's are the same?

Does it make sense that a tokenizer.response_schema is aligned with its processor.chat_template, but not with its own tokenizer.chat_template?

…bute

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 5cb4c39. Configure here.}

qgallouedec

lgtm!

albertvillanova added 4 commits April 9, 2026 19:10

Set tokenizer attribute in GRPO

db61d3e

Replace self.pad_token_id and self.eos_token_id

fea92ab

Use self._tokenizer

a6ad78d

Simplify self.processing_class with self._tokenizer

d479816

albertvillanova changed the title ~~Set _tokenizer attribute~~ Set _tokenizer as trainer attribute Apr 9, 2026

albertvillanova commented Apr 9, 2026

View reviewed changes

cursor Bot reviewed Apr 9, 2026

View reviewed changes

Comment thread trl/trainer/grpo_trainer.py Outdated

Comment thread trl/trainer/grpo_trainer.py

albertvillanova commented Apr 9, 2026

View reviewed changes

Fix assignment

e5b5031

cursor Bot reviewed Apr 9, 2026

View reviewed changes

Comment thread trl/trainer/grpo_trainer.py Outdated

Update tests

319571f

cursor Bot reviewed Apr 9, 2026

View reviewed changes

Comment thread tests/test_grpo_trainer.py Outdated

qgallouedec reviewed Apr 14, 2026

View reviewed changes

albertvillanova added 10 commits April 15, 2026 09:41

Merge remote-tracking branch 'upstream/main' into set-tokenizer-attri…

a898089

…bute

Remove redundant condition

45f50d8

Remove dead code from tests

854ef8d

Set tokenizer attribute in DPO

cd10af5

Use self._tokenizer

f5ed9f2

Set tokenizer attribute in RLOO

6a072e5

Replace self.pad_token_id and self.eos_token_id

b4db781

Set tokenizer attribute in SFT

8a0ac8b

Use self._tokenizer

62508a2

Merge remote-tracking branch 'upstream/main' into set-tokenizer-attri…

8f665e4

…bute

qgallouedec reviewed Apr 15, 2026

View reviewed changes

Comment thread trl/trainer/grpo_trainer.py Outdated

albertvillanova added 4 commits April 15, 2026 19:36

Use self._tokenizer in DPPO

d01f99d

Make parse_response accept only tokenizer

2242686

Revert

55ab670

Merge remote-tracking branch 'upstream/main' into set-tokenizer-attri…

67c4ace

…bute

cursor Bot reviewed Apr 15, 2026

View reviewed changes

Comment thread trl/trainer/grpo_trainer.py

qgallouedec reviewed Apr 15, 2026

View reviewed changes

albertvillanova mentioned this pull request Apr 16, 2026

Revert VLM support in parse_response #5561

Merged

albertvillanova added 4 commits April 16, 2026 09:34

Replace self.pad_token_id and self.eos_token_id in DPPO

db1b067

Replace self.pad_token_id and self.eos_token_id in GFPO

aaed316

Replace self.pad_token_id and self.eos_token_id in GRPOWithReplayBuffer

9f9347a

Merge remote-tracking branch 'upstream/main' into set-tokenizer-attri…

e6c73d4

…bute

This was referenced Apr 16, 2026

Set _tokenizer attribute in experimental trainers #5566

Merged

Support processor in maybe_apply_chat_template #5567

Merged

Merge remote-tracking branch 'upstream/main' into set-tokenizer-attri…

5cb4c39

…bute

cursor Bot reviewed Apr 17, 2026

View reviewed changes

Comment thread trl/trainer/grpo_trainer.py Outdated

Pass processing_class to add_response_schema

59b217e

albertvillanova mentioned this pull request Apr 17, 2026

[docs] Add chat templates page to web docs #5581

Merged

8 tasks

qgallouedec approved these changes Apr 20, 2026

View reviewed changes

albertvillanova merged commit 6e1705a into huggingface:main Apr 20, 2026
18 of 24 checks passed

		and hasattr(self._tokenizer, "response_schema") # attribute not set by default for now
		and self._tokenizer.response_schema is not None # only works if the tokenizer has a schema

	and hasattr(self._tokenizer, "response_schema") # attribute not set by default for now
	and self._tokenizer.response_schema is not None # only works if the tokenizer has a schema
	and getattr(self._tokenizer, "response_schema", None) # only works if the tokenizer has a schema: attribute not set by default for now

	self._tokenizer = add_response_schema(self._tokenizer)
	processing_class = add_response_schema(processing_class)

Conversation

albertvillanova commented Apr 9, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

HuggingFaceDocBuilderDev commented Apr 9, 2026

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

qgallouedec left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

qgallouedec Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

albertvillanova Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

albertvillanova Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

qgallouedec left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

albertvillanova commented Apr 9, 2026 •

edited by cursor Bot

Loading

qgallouedec Apr 15, 2026 •

edited

Loading

albertvillanova Apr 16, 2026 •

edited

Loading

albertvillanova Apr 17, 2026 •

edited

Loading