Hugging Face

Team

company

Verified

https://huggingface.co

huggingface

Activity Feed

AI & ML interests

The AI community building the future.

Recent Activity

jeffboudier updated a Space about 2 hours ago

huggingface/how-to-upgrade-to-enterprise

abidlabs updated a Space about 3 hours ago

huggingface/jobs-actions-dispatcher

hubnemo new activity about 4 hours ago

huggingface/documentation-images:Images for docs PR 3300

View all activity

Papers

Seeing the Needle in the Haystack: Towards Weakly-Supervised Log Instance Anomaly Localization via Counterfactual Perturbation

Qualixar OS: A Universal Operating System for AI Agent Orchestration

View all Papers

Articles

Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek

Jan 27

• 45

One Year Since the “DeepSeek Moment”

Jan 20

• 62

On the Shifting Global Compute Landscape

Oct 29, 2025

• 61

Announcing Hugging Face Fundamentals: A New Learning Track on DataCamp

Oct 16, 2025

• 24

Yay! Organizations can now publish blog Articles

Jan 20, 2025

• 53

View all articles

jeffboudier

updated a Space about 2 hours ago

Enterprise Hub

🔒

Why you need it, how to get it

abidlabs

updated a Space about 3 hours ago

jobs-actions Dispatcher

🏃

Run GitHub Actions on Hugging Face Jobs

hubnemo

in huggingface/documentation-images about 4 hours ago

Images for docs PR 3300

#625 opened about 4 hours ago by

hubnemo

sergiopaniego

posted an update about 7 hours ago

Post

Frontier agents are this good partly because the model was trained inside the very harness it ships with.

NVIDIA's new paper "Polar: Agentic RL on Any Harness at Scale" brings that recipe to the open: it turns coding harnesses like Codex, Claude Code, Qwen Code or Pi into RL training environments without touching their internals.

The core idea: every agent, however complex or closed, talks to a model through an API, so they put a proxy there. The harness runs exactly like in production while the proxy records prompts, sampled token ids and logprobs. Trajectories get rebuilt outside, token faithful, so gradients hit the exact tokens the policy sampled.

The gains are consistent across all four harnesses. Same Qwen3.5-4B, plain GRPO, evaluated on SWE-Bench Verified:

Codex 3.8 → 26.4 (+22.6)
Claude Code 29.8 → 34.6 (+4.8)
Qwen Code 34.6 → 35.2 (+0.6)
Pi 34.2 → 40.4 (+6.2)

The biggest gains appear on unfamiliar execution paths, Codex being the clearest case. The takeaway: you are not just training a model, you are training the model + harness system.

Two engineering pieces make it work at scale. Async worker pools isolate container boots (CPU), agent execution (GPU) and long tail test runs, so slow runtimes never block the GPUs. And prefix merging stitches hundreds of captured API calls back into contiguous traces: 5.4x faster trainer updates and rollout GPUs at 88% utilization.

It also doubles as an SFT data factory: 504 test verified agent traces from a 122B teacher, multi-turn conversations averaging 104 messages each, coming to the Hub under Apache 2.0 (release pending review).

Paper authors: Binfeng Xu, Hao Zhang, Shaokun Zhang, Songyang Han, Mingjie Liu, Jian Hu, Shizhe Diao, Zhenghui Jin, Yunheng Zou, Michael Demoret, Jan Kautz and Yi Dong.

> Paper: Polar: Agentic RL on Any Harness at Scale (2605.24220)
> Code: https://github.com/NVIDIA-NeMo/ProRL-Agent-Server
> Training data: NovaSky-AI/SkyRL-v0-293-data

lysandre

updated a dataset about 12 hours ago

huggingface/transformers-metadata

Viewer • Updated about 7 hours ago • 2.28k • 1.72k • 38

evalstate

updated a bucket about 13 hours ago

huggingface/skills

4.9 MB

sayakpaul

updated a dataset about 21 hours ago

huggingface/diffusers-metadata

Viewer • Updated about 16 hours ago • 97 • 2.24k • 29

alvarobartt

updated a dataset about 22 hours ago

huggingface/DEH-image-scan-data

Viewer • Updated about 20 hours ago • 4 • 9.73k • 14

ariG23498

updated a dataset 1 day ago

huggingface/documentation-images

Viewer • Updated 1 day ago • 59 • 2.12M • 153

sergiopaniego

posted an update 2 days ago

Post

124

The recording from our talk: "From Responses To Trajectories: Multi-Turn and Multi-Environment RL" from PyTorch Conf Europe is live!

@kashif and I covered the latest advances in multi-turn GRPO in TRL: trajectories, tool use, envs, and agentic post-training at scale

https://www.youtube.com/watch?v=rPBeXFntJSU

sergiopaniego

posted an update 3 days ago

Post

how do you sync a trillion parameter model every RL step without a shared cluster? we just wrote a blog about it, led by @aminediroHF

what I like the most is the way it proves you can use the Hub for basically everything 🧐 → trainer on one machine, vLLM in a HF Space, the wordle env in another HF Space and weights going through a Hub Bucket. no shared cluster, just HTTPS

it works because ~99% of bf16 weights don't change between RL steps so you only sync the diff. 1.2 GB to 25 MB of payload per step

https://huggingface.co/blog/delta-weight-sync

evalstate

posted an update 3 days ago

Post

3167

Hugging Face MCP Server v0.3.17
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SEP-2640 "Skills Over MCP" support added (early access)

2 replies

sergiopaniego

posted an update 4 days ago

Post

2262

most multi-turn RL loops have a silent bug: you decode the model's output to detect tool calls, then re-tokenize the conversation for the next turn. BPE isn't invertible, so decode then re-encode can land on different ids. gradient ends up on tokens the model never sampled. no crash, just quietly wrong math and broken training

@qgallouedec wrote a super educational blog on MITO (message-in, token-out) vs TITO (token-in, token-out) and how you might fix the problem above

go read it 🤓

https://qgallouedec-tito.hf.space/

sergiopaniego

posted an update 4 days ago

Post

6204

new banger blog alert 🚨

@ariG23498 is starting a blog series about profiling in pytorch and part 1 just dropped

takes you from the simplest scenario to actually knowing what your gpu is doing. if you have never opened a profiler trace this is where you start

covers torch.profiler from scratch. reading tables and traces, overhead bound vs compute bound, the full dispatch chain from python to gpu kernels, and what torch.compile is actually fusing under the hood

find it here: https://huggingface.co/blog/torch-profiler

1 reply

sergiopaniego

posted an update 7 days ago

Post

156

If you have a github repo, you basically have an RL training environment

We're introducing Repo2RLEnv (built by @AdithyaSK ), a tool that mines PRs, commits, CVEs and turns them into verifiable sandboxed tasks with real reward signals, automatically

Outputs to Harbor spec so you can plug it straight into RL training or coding-agent eval

> repo: https://github.com/huggingface/Repo2RLEnv
> collection with envs: https://huggingface.co/collections/AdithyaSK/repo2rlenv-verifiable-rl-environments

sergiopaniego

posted an update 8 days ago

Post

223

periodic reminder 🧐

some HF blog posts use a special template, long-form, animated, super deep

I keep them all in one collection that gets updated every time a new one drops so you don't lose track

https://hf.co/collections/sergiopaniego/research-and-long-form-blog-posts

spot one missing? let me know

victor

posted an update 10 days ago

Post

813

Sharing how I built the LongCat-Video-Avatar 1.5 Space (+500k views on X) in one agent session. Gave a coding agent its own AI lab on ZeroGPU, framed the goal, walked away. It designed, deployed, tested against the live API, fixed, shipped.

Full recipe with the copy-paste prompt: https://huggingface.co/blog/victor/building-zerogpu-spaces-autonomously

1 reply

sergiopaniego

posted an update 11 days ago

Post

9946

Harness, Scaffold, Context Engineering, Agent... do you actually know what they mean?

We wrote an AI agent glossary and tried to make sense of it all with simple definitions and real examples

↓ go read it ↓

https://huggingface.co/blog/agent-glossary

2 replies

alvarobartt

posted an update 15 days ago

Post

316

Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker

tomaarsen

posted an update 17 days ago

Post

706

🤗 Announcing the Ettin Reranker family: six new state-of-the-art CrossEncoder rerankers for search from 17M to 1B parameters, plus the full training data and the ~150-line recipe. Built on the Ettin ModernBERT encoders, Apache 2.0. Details:

All six were trained with the same single-stage pointwise MSE distillation recipe, with mixedbread-ai/mxbai-rerank-large-v2 (1.54B) as the teacher. Only the learning rate and per-device batch size change between sizes. The 1B student matches the teacher within 0.0001 NDCG@10 on MTEB(eng, v2) Retrieval, the 150M is the strongest reranker I tested in the under-600M range, and the 17M beats the 33M ms-marco-MiniLM-L12-v2 by +0.051 NDCG@10 at roughly half the parameter count.

Speed matters as much as quality for a reranker, since it determines whether the model fits the latency budget between retrieval and showing results. Our 17M is the fastest reranker in the whole comparison at 7517 pairs/sec on an H100. Our 150M runs 2.3x faster than the two other 150M ModernBERT-base rerankers (gte-reranker-modernbert-base and granite-embedding-reranker-english-r2) because the modular Transformer module propagates unpadded inputs through every layer rather than just the FA2 attention kernel. And our 1B is 2.4x faster than its 1.5B teacher while matching it on quality.

I bootstrapped the training recipe with the new train-sentence-transformers Agent Skill shipped in Sentence Transformers v5.5.0. Install it with hf skills add train-sentence-transformers --claude and ask Claude Code (or Codex / Cursor / Gemini CLI) to fine-tune a SentenceTransformer, CrossEncoder, or SparseEncoder model on your data.

I wrote a blog post walking through usage, results across six embedder pairings, the speed story, and the complete training script. Check it out, or just point your Agent to the URL:

https://huggingface.co/blog/ettin-reranker

Collection: https://huggingface.co/collections/cross-encoder/ettin-rerankers

AI & ML interests

Recent Activity

Papers

Articles

Agentic RL: Token-In, Token-Out Done Right

Software Forgets: Agent Traces Are the Memory

mlinter: a linter for Transformers modeling files

From doctest to runnable Markdown

State of Open Source on Hugging Face: Spring 2026

The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+

Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek

One Year Since the “DeepSeek Moment”

On the Shifting Global Compute Landscape

Announcing Hugging Face Fundamentals: A New Learning Track on DataCamp

Yay! Organizations can now publish blog Articles

Team members 185

huggingface's activity

Enterprise Hub

jobs-actions Dispatcher

Images for docs PR 3300

huggingface/skills