Align tiny Cohere2 config with tiny-aya-earth #5707

Merged: qgallouedec merged 6 commits into main from align-cohere2 on May 15, 2026

@qgallouedec qgallouedec commented May 5, 2026

Reduce the config diff between tiny-Cohere2ForCausalLM and the reference CohereLabs/tiny-aya-earth by mirroring non-size config fields:

  • vocab_size=262144 (was len(tokenizer.vocab)=261008)
  • logit_scale=1.0, rope_theta=50000, bos_token_id=2, eos_token_id=3
  • 10 legacy kwargs stored in the ref config (cache_implementation, layer_switch, order_of_interleaved_layers, position_embedding_type, rotary_pct, use_embedding_sharing, use_gated_activation, use_parallel_block, use_parallel_embedding, use_qk_norm)

Remaining diffs are intentional size reductions (head_dim, hidden_size, intermediate_size, num_attention_heads, num_hidden_layers, num_key_value_heads) plus layer_types (length tied to num_hidden_layers).
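The split between mirrored fields and intentionally reduced size fields can be sketched as two plain dictionaries. This is a minimal illustration, not the PR's actual script: the names `ALIGNED_FIELDS`, `SIZE_FIELDS`, and `build_tiny_kwargs` are hypothetical, and the values are copied from the reference and tiny columns of the diff output in this description. Passing the merged dict on as `Cohere2Config(**kwargs)` is an assumption about how the generation script uses it.

```python
# Hypothetical sketch: names are illustrative, values are taken from the
# config_diff output quoted in this PR description.

# Non-size fields mirrored from CohereLabs/tiny-aya-earth.
ALIGNED_FIELDS = {
    "vocab_size": 262144,
    "logit_scale": 1.0,
    "rope_theta": 50000,
    "bos_token_id": 2,
    "eos_token_id": 3,
    # The 10 legacy kwargs stored in the reference config.
    "cache_implementation": "hybrid",
    "layer_switch": 4,
    "order_of_interleaved_layers": "local_attn_first",
    "position_embedding_type": "rope_gptj",
    "rotary_pct": 1.0,
    "use_embedding_sharing": True,
    "use_gated_activation": True,
    "use_parallel_block": True,
    "use_parallel_embedding": False,
    "use_qk_norm": False,
}

# Size fields that stay small on purpose in the tiny model.
SIZE_FIELDS = {
    "head_dim": 2,
    "hidden_size": 8,
    "intermediate_size": 32,
    "num_attention_heads": 4,
    "num_hidden_layers": 2,
    "num_key_value_heads": 2,
}


def build_tiny_kwargs() -> dict:
    """Merge both groups, refusing to let a field belong to both."""
    overlap = ALIGNED_FIELDS.keys() & SIZE_FIELDS.keys()
    if overlap:
        raise ValueError(f"fields listed in both groups: {sorted(overlap)}")
    # Assumption: the result would be passed on as Cohere2Config(**kwargs).
    return {**ALIGNED_FIELDS, **SIZE_FIELDS}
```

Keeping the two groups separate makes it easy to see which differences from the reference are deliberate: only the keys in `SIZE_FIELDS`, plus `layer_types`, whose length follows `num_hidden_layers`.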

Before

[config_diff] CohereLabs/tiny-aya-earth vs tiny (22 differences)
  bos_token_id                                     2                                  → 5
  cache_implementation                             hybrid                             → <missing>
  eos_token_id                                     3                                  → 255001
  head_dim                                         128                                → 2
  hidden_size                                      2048                               → 8
  intermediate_size                                11008                              → 32
  layer_switch                                     4                                  → <missing>
  layer_types                                      ['sliding_attention', 'sliding_att → ['sliding_attention', 'sliding_att
  logit_scale                                      1.0                                → 0.0625
  num_attention_heads                              16                                 → 4
  num_hidden_layers                                36                                 → 2
  num_key_value_heads                              4                                  → 2
  order_of_interleaved_layers                      local_attn_first                   → <missing>
  position_embedding_type                          rope_gptj                          → <missing>
  rope_theta                                       50000                              → 10000.0
  rotary_pct                                       1.0                                → <missing>
  use_embedding_sharing                            True                               → <missing>
  use_gated_activation                             True                               → <missing>
  use_parallel_block                               True                               → <missing>
  use_parallel_embedding                           False                              → <missing>
  use_qk_norm                                      False                              → <missing>
  vocab_size                                       262144                             → 261008

After

[config_diff] CohereLabs/tiny-aya-earth vs tiny (7 differences)
  head_dim                                         128                                → 2
  hidden_size                                      2048                               → 8
  intermediate_size                                11008                              → 32
  layer_types                                      ['sliding_attention', 'sliding_att → ['sliding_attention', 'sliding_att
  num_attention_heads                              16                                 → 4
  num_hidden_layers                                36                                 → 2
  num_key_value_heads                              4                                  → 2
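
A report like the ones above can be approximated with a small dictionary comparison. The sketch below is an assumption about how such a helper might work, not the repository's implementation: `config_diff` and `format_diff` are hypothetical names, the column widths are guesses, and configs are treated as plain dicts.

```python
# Hypothetical helper: approximates the [config_diff] report shown above.

_ABSENT = object()  # sentinel so a stored None is not confused with a missing key
MISSING = "<missing>"


def config_diff(ref: dict, tiny: dict) -> list[tuple[str, str, str]]:
    """Return (key, ref_value, tiny_value) rows for every differing key."""
    rows = []
    for key in sorted(ref.keys() | tiny.keys()):
        ref_val = ref.get(key, _ABSENT)
        tiny_val = tiny.get(key, _ABSENT)
        if ref_val is not _ABSENT and tiny_val is not _ABSENT and ref_val == tiny_val:
            continue  # identical on both sides, nothing to report
        rows.append((
            key,
            MISSING if ref_val is _ABSENT else str(ref_val),
            MISSING if tiny_val is _ABSENT else str(tiny_val),
        ))
    return rows


def format_diff(rows, ref_name: str, tiny_name: str, width: int = 34) -> str:
    """Render rows as text, truncating long values to `width` characters."""
    lines = [f"[config_diff] {ref_name} vs {tiny_name} ({len(rows)} differences)"]
    for key, ref_val, tiny_val in rows:
        lines.append(f"  {key:<48} {ref_val[:width]:<{width}} → {tiny_val[:width]}")
    return "\n".join(lines)
```

Truncating values is why two different `layer_types` lists can look identical in the report: the lists differ in length, but only the first characters are printed.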

Note

Low Risk
Low risk since this only adjusts the tiny model generation script’s Cohere2Config fields; the main risk is producing a tiny checkpoint with different token IDs/vocab size than before, which could affect downstream test expectations.
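
The risk described here can be made concrete with a quick sanity check. This is a hedged sketch, assuming only the numbers quoted in the PR description; `check_vocab_alignment` is a hypothetical name, and the check does not confirm that the tokenizer's own special token IDs agree with the config, which would require loading the real tokenizer.

```python
# Hypothetical sanity check: numbers come from the PR description.

def check_vocab_alignment(vocab_size: int, tokenizer_len: int, special_ids: dict) -> list[str]:
    """Return a list of problems; an empty list means the config is consistent."""
    problems = []
    if tokenizer_len > vocab_size:
        problems.append(
            f"tokenizer has {tokenizer_len} entries but vocab_size is {vocab_size}"
        )
    for name, token_id in special_ids.items():
        if not 0 <= token_id < vocab_size:
            problems.append(f"{name}={token_id} is outside [0, {vocab_size})")
    return problems


# New values: the embedding table is larger than the tokenizer vocabulary,
# so some rows are never indexed by a real token.
unused_rows = 262144 - 261008  # 1136

problems = check_vocab_alignment(
    vocab_size=262144,
    tokenizer_len=261008,
    special_ids={"bos_token_id": 2, "eos_token_id": 3},
)
```

With the aligned values, `problems` is empty. Downstream tests that hardcode the old values (vocabulary size 261008, BOS 5, EOS 255001) would still need updating.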

Overview
Aligns the generated tiny Cohere2ForCausalLM config with CohereLabs/tiny-aya-earth by hardcoding vocab_size=262144 and copying over non-size defaults like logit_scale, RoPE settings, BOS/EOS token IDs, and several legacy/behavior flags (e.g., cache implementation and parallel/gated options).

This reduces config diffs between the reference model and the generated tiny model while keeping the small dimension/layer counts unchanged.

Reviewed by Cursor Bugbot for commit 9ba79b3.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7f9f91c0a

Comment thread tests/conftest.py Outdated
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d7f9f91.

Comment thread tests/conftest.py Outdated
@qgallouedec qgallouedec merged commit fc904f6 into main May 15, 2026
13 checks passed
@qgallouedec qgallouedec deleted the align-cohere2 branch May 15, 2026 00:09