Align tiny Cohere2 config with tiny-aya-earth #5707
Co-authored-by: Quentin Gallouédec <[email protected]>

Reduce the config diff between `tiny-Cohere2ForCausalLM` and the reference `CohereLabs/tiny-aya-earth` by mirroring non-size config fields:

- `vocab_size=262144` (was `len(tokenizer.vocab)=261008`)
- `logit_scale=1.0`, `rope_theta=50000`, `bos_token_id=2`, `eos_token_id=3`
- other fields mirrored from the reference config (`cache_implementation`, `layer_switch`, `order_of_interleaved_layers`, `position_embedding_type`, `rotary_pct`, `use_embedding_sharing`, `use_gated_activation`, `use_parallel_block`, `use_parallel_embedding`, `use_qk_norm`)

Remaining diffs are intentional size reductions (`head_dim`, `hidden_size`, `intermediate_size`, `num_attention_heads`, `num_hidden_layers`, `num_key_value_heads`) plus `layer_types` (length tied to `num_hidden_layers`).
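One way to check that only the intentional differences remain is to diff the two configs as plain dicts and ignore the size-related keys. The sketch below is illustrative only: `unexpected_config_diffs` is a hypothetical helper, not part of this PR, and the `hidden_size` / `num_hidden_layers` values are made-up placeholders. Only the `vocab_size`, `logit_scale`, `rope_theta`, `bos_token_id` and `eos_token_id` values come from the description above.

```python
# Keys the PR description lists as intentional differences.
INTENTIONAL_DIFF_KEYS = frozenset({
    "head_dim", "hidden_size", "intermediate_size",
    "num_attention_heads", "num_hidden_layers", "num_key_value_heads",
    "layer_types",
})

MISSING = "<missing>"  # marker for a key present on only one side


def unexpected_config_diffs(tiny, reference, allowed=INTENTIONAL_DIFF_KEYS):
    """Hypothetical helper: map each differing key outside `allowed`
    to a (tiny_value, reference_value) pair."""
    diffs = {}
    for key in sorted(set(tiny) | set(reference)):
        if key in allowed:
            continue  # size reductions are expected, skip them
        tiny_value = tiny.get(key, MISSING)
        reference_value = reference.get(key, MISSING)
        if tiny_value != reference_value:
            diffs[key] = (tiny_value, reference_value)
    return diffs


# Non-size values are taken from the PR description; sizes are placeholders.
reference = {
    "vocab_size": 262144, "logit_scale": 1.0, "rope_theta": 50000,
    "bos_token_id": 2, "eos_token_id": 3,
    "hidden_size": 2048, "num_hidden_layers": 36,  # assumed, not from the PR
}
tiny_before = {**reference, "vocab_size": 261008,
               "hidden_size": 8, "num_hidden_layers": 2}  # sizes assumed
tiny_after = {**tiny_before, "vocab_size": 262144}

before_diffs = unexpected_config_diffs(tiny_before, reference)
after_diffs = unexpected_config_diffs(tiny_after, reference)
print(before_diffs)
print(after_diffs)
```

In practice the two dicts would presumably come from calling `to_dict()` on the loaded configs rather than being written by hand.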
Note

Low Risk

Low risk since this only adjusts the tiny model generation script's `Cohere2Config` fields; the main risk is producing a tiny checkpoint with different token IDs/vocab size than before, which could affect downstream test expectations.

Overview

Aligns the generated tiny `Cohere2ForCausalLM` config with `CohereLabs/tiny-aya-earth` by hardcoding `vocab_size=262144` and copying over non-size defaults like `logit_scale`, RoPE settings, BOS/EOS token IDs, and several legacy/behavior flags (e.g., cache implementation and parallel/gated options).

This reduces config diffs between the reference model and the generated tiny model while keeping the small dimension/layer counts unchanged.