Skip to content

Fix nested vocab_size for DistillationTrainer and GOLDTrainer#5592

Merged
cmpatino merged 5 commits into
huggingface:mainfrom
Beichen-Ma:fix-nested-vocab-size
May 11, 2026
Merged

Fix nested vocab_size for DistillationTrainer and GOLDTrainer#5592
cmpatino merged 5 commits into
huggingface:mainfrom
Beichen-Ma:fix-nested-vocab-size

Conversation

@Beichen-Ma
Copy link
Copy Markdown
Contributor

@Beichen-Ma Beichen-Ma commented Apr 19, 2026

Fixes #5585. The root cause was in DistillationTrainer.__init__ calls teacher_model.resize_token_embeddings(self.model.config.vocab_size), which raises AttributeError on configs where vocab_size is nested (e.g. Qwen3_5Config exposes it under config.text_config).

Fix

  • Replaced with self.model.config.get_text_config().vocab_size, which transformers defines on PretrainedConfig to return the text sub-config on nested configs and self on flat ones.
  • Applied the same fix to GOLDTrainer (identical duplicated pattern). Updated an existing GOLD unit test whose SimpleNamespace mock didn't implement get_text_config.

Tests

  • Reproduced the original error on Qwen3.5-0.8B student / Qwen3.5-2B teacher before the fix.
  • After the fix, trainer init and one training step complete cleanly.
  • pytest tests/experimental/test_gold_trainer.py passes.
  • DistillationTrainer has no existing unit test file. Given it's experimental and the fix is one line mirroring an existing pattern, I scoped this PR to the minimal fix rather than introducing a new test file.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.


Note

Medium Risk
Touches trainer initialization for DistillationTrainer and GOLDTrainer; a wrong vocab_size could mis-size teacher embeddings or break startup for some model configs, but the change is small and localized.

Overview
Fixes teacher embedding resizing for models whose vocab_size lives in a nested text config by switching from self.model.config.vocab_size to self.model.config.get_text_config().vocab_size in both DistillationTrainer and GOLDTrainer.

Updates the GOLD unit test mock config to implement get_text_config() so the init path continues to validate the resize call.

Reviewed by Cursor Bugbot for commit 99bf555. Bugbot is set up for automated code reviews on this repo. Configure here.

@cmpatino
Copy link
Copy Markdown
Collaborator

The changes look good to me! Let's wait for the review from another maintainer before merging

@Beichen-Ma
Copy link
Copy Markdown
Contributor Author

Hey @cmpatino, just checking in on this — looks like it has the approvals and wanted to make sure it didn't slip through the cracks. Thanks!

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@cmpatino
Copy link
Copy Markdown
Collaborator

Thank you for the ping @Beichen-Ma! Looks good to me, so I'm merging it.

@cmpatino cmpatino merged commit 9ff8c78 into huggingface:main May 11, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] DistillationTrainer fails with Qwen3.5 due to nested config.vocab_size attribute

4 participants