Hugging Face

Team

company

Verified

https://huggingface.co

huggingface

AI & ML interests

The AI community building the future.

Recent Activity

jeffboudier updated a Space about 2 hours ago

huggingface/how-to-upgrade-to-enterprise

abidlabs updated a Space about 3 hours ago

huggingface/jobs-actions-dispatcher

hubnemo new activity about 4 hours ago

huggingface/documentation-images:Images for docs PR 3300

Papers

Seeing the Needle in the Haystack: Towards Weakly-Supervised Log Instance Anomaly Localization via Counterfactual Perturbation

Qualixar OS: A Universal Operating System for AI Agent Orchestration

Articles

Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek

Jan 27

• 45

One Year Since the “DeepSeek Moment”

Jan 20

• 62

On the Shifting Global Compute Landscape

Oct 29, 2025

• 61

Announcing Hugging Face Fundamentals: A New Learning Track on DataCamp

Oct 16, 2025

• 24

Yay! Organizations can now publish blog Articles

Jan 20, 2025

• 53

jeffboudier

updated a Space about 2 hours ago

Enterprise Hub

🔒

Why you need it, how to get it

abidlabs

updated a Space about 3 hours ago

jobs-actions Dispatcher

🏃

Run GitHub Actions on Hugging Face Jobs

hubnemo

in huggingface/documentation-images about 4 hours ago

Images for docs PR 3300

#625 opened about 4 hours ago by hubnemo

sergiopaniego

posted an update about 7 hours ago

Post

Frontier agents are this good partly because the model was trained inside the very harness it ships with.

NVIDIA's new paper "Polar: Agentic RL on Any Harness at Scale" brings that recipe to the open: it turns coding harnesses like Codex, Claude Code, Qwen Code or Pi into RL training environments without touching their internals.

The core idea: every agent, however complex or closed, talks to a model through an API, so they put a proxy there. The harness runs exactly as it does in production while the proxy records prompts, sampled token ids and logprobs. Trajectories get rebuilt outside the harness, token-faithful, so gradients hit the exact tokens the policy sampled.
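
As a rough illustration of the proxy idea, here is a minimal in-process sketch. It is not the paper's implementation: the real system sits on an HTTP API, while this version only wraps a Python callable. The names `RecordingProxy`, `RecordedCall` and `toy_backend` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumed backend shape: prompt token ids in, (sampled ids, logprobs) out.
# This signature is an assumption for illustration, not a real API.
Backend = Callable[[List[int]], Tuple[List[int], List[float]]]


@dataclass
class RecordedCall:
    prompt_ids: List[int]
    sampled_ids: List[int]
    logprobs: List[float]


@dataclass
class RecordingProxy:
    backend: Backend
    calls: List[RecordedCall] = field(default_factory=list)

    def __call__(self, prompt_ids: List[int]) -> Tuple[List[int], List[float]]:
        # Forward the request unchanged, so the caller behaves as usual.
        sampled_ids, logprobs = self.backend(prompt_ids)
        # Store copies so later mutation by the caller cannot alter the record.
        self.calls.append(
            RecordedCall(list(prompt_ids), list(sampled_ids), list(logprobs))
        )
        return sampled_ids, logprobs


def toy_backend(prompt_ids: List[int]) -> Tuple[List[int], List[float]]:
    # Stand-in for a model: "samples" the last prompt id plus one.
    return [prompt_ids[-1] + 1], [-0.5]


proxy = RecordingProxy(toy_backend)
first_ids, _ = proxy([1, 2, 3])
# The second turn's prompt contains the first turn's sampled ids verbatim.
second_ids, _ = proxy([1, 2, 3] + first_ids + [9])
```

Because the record holds the sampled ids themselves rather than decoded text, a trainer reading `proxy.calls` never has to re-tokenize anything.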

The gains are consistent across all four harnesses. Same Qwen3.5-4B, plain GRPO, evaluated on SWE-Bench Verified:

Codex 3.8 → 26.4 (+22.6)
Claude Code 29.8 → 34.6 (+4.8)
Qwen Code 34.6 → 35.2 (+0.6)
Pi 34.2 → 40.4 (+6.2)

The biggest gains appear on unfamiliar execution paths, Codex being the clearest case. The takeaway: you are not just training a model, you are training the model + harness system.

Two engineering pieces make it work at scale. Async worker pools isolate container boots (CPU), agent execution (GPU) and long-tail test runs, so slow runtimes never block the GPUs. And prefix merging stitches hundreds of captured API calls back into contiguous traces: 5.4x faster trainer updates and rollout GPUs at 88% utilization.
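
The post does not spell out how prefix merging works, so the following is only a guess at its general shape. It assumes each captured call is a `(prompt_ids, sampled_ids)` pair and that a call can be appended to the current trace whenever its prompt starts with everything in that trace. The function name `merge_prefix_calls` and the 0/1 loss mask are hypothetical.

```python
from typing import List, Tuple

Call = Tuple[List[int], List[int]]    # (prompt_ids, sampled_ids)
Trace = Tuple[List[int], List[int]]   # (token_ids, loss_mask)


def merge_prefix_calls(calls: List[Call]) -> List[Trace]:
    """Stitch calls into contiguous traces; mask is 1 on sampled tokens."""
    traces: List[Trace] = []
    for prompt_ids, sampled_ids in calls:
        if traces:
            tokens, mask = traces[-1]
            # Extend the current trace if this prompt continues it exactly.
            if prompt_ids[: len(tokens)] == tokens:
                extra = prompt_ids[len(tokens):]   # e.g. tool output tokens
                tokens.extend(extra)
                mask.extend([0] * len(extra))
                tokens.extend(sampled_ids)
                mask.extend([1] * len(sampled_ids))
                continue
        # Otherwise start a new trace from this call.
        traces.append(
            (
                list(prompt_ids) + list(sampled_ids),
                [0] * len(prompt_ids) + [1] * len(sampled_ids),
            )
        )
    return traces


calls = [
    ([1, 2], [3]),           # first turn
    ([1, 2, 3, 7], [4, 5]),  # continues the first turn, 7 is new context
    ([8], [9]),              # unrelated prompt, starts a new trace
]
traces = merge_prefix_calls(calls)
```

Here three calls collapse into two traces, so the trainer runs two forward passes instead of three, and the mask keeps the loss on sampled tokens only.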

It also doubles as an SFT data factory: 504 test-verified agent traces from a 122B teacher, multi-turn conversations averaging 104 messages each, coming to the Hub under Apache 2.0 (release pending review).

Paper authors: Binfeng Xu, Hao Zhang, Shaokun Zhang, Songyang Han, Mingjie Liu, Jian Hu, Shizhe Diao, Zhenghui Jin, Yunheng Zou, Michael Demoret, Jan Kautz and Yi Dong.

> Paper: Polar: Agentic RL on Any Harness at Scale (2605.24220)
> Code: https://github.com/NVIDIA-NeMo/ProRL-Agent-Server
> Training data: NovaSky-AI/SkyRL-v0-293-data

lysandre

updated a dataset about 12 hours ago

huggingface/transformers-metadata

Viewer • Updated about 7 hours ago • 2.28k • 1.72k • 38

evalstate

updated a bucket about 13 hours ago

huggingface/skills

4.9 MB

sayakpaul

updated a dataset about 21 hours ago

huggingface/diffusers-metadata

Viewer • Updated about 16 hours ago • 97 • 2.24k • 29

alvarobartt

updated a dataset about 22 hours ago

huggingface/DEH-image-scan-data

Viewer • Updated about 20 hours ago • 4 • 9.73k • 14

ariG23498

updated a dataset 1 day ago

huggingface/documentation-images

Viewer • Updated 1 day ago • 59 • 2.12M • 153

nielsr

submitted a paper to Daily Papers 2 days ago

Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

Paper • 2606.03748 • Published 4 days ago • 5

sergiopaniego

posted an update 2 days ago

Post

124

The recording from our talk: "From Responses To Trajectories: Multi-Turn and Multi-Environment RL" from PyTorch Conf Europe is live!

@kashif and I covered the latest advances in multi-turn GRPO in TRL: trajectories, tool use, envs, and agentic post-training at scale

https://www.youtube.com/watch?v=rPBeXFntJSU

sergiopaniego

posted an update 3 days ago

Post

how do you sync a trillion parameter model every RL step without a shared cluster? we just wrote a blog about it, led by @aminediroHF

what I like the most is the way it proves you can use the Hub for basically everything 🧐 → trainer on one machine, vLLM in a HF Space, the wordle env in another HF Space and weights going through a Hub Bucket. no shared cluster, just HTTPS

it works because ~99% of bf16 weights don't change between RL steps, so you only sync the diff. the payload drops from 1.2 GB to 25 MB per step
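
To make the "only sync the diff" idea concrete, here is a minimal sketch in plain Python. It assumes weights are held as flat lists of 16-bit integer bit patterns; the helpers `weight_delta` and `apply_delta` are hypothetical, and the blog's actual encoding and transport are not reproduced here.

```python
from typing import Dict, List


def weight_delta(old: List[int], new: List[int]) -> Dict[int, int]:
    """Return {index: new_value} for every position that changed."""
    if len(old) != len(new):
        raise ValueError("weight vectors must have the same length")
    return {i: n for i, (o, n) in enumerate(zip(old, new)) if o != n}


def apply_delta(weights: List[int], delta: Dict[int, int]) -> List[int]:
    """Patch a copy of the receiver's weights with the changed positions."""
    patched = list(weights)
    for index, value in delta.items():
        patched[index] = value
    return patched


# Toy example: 100 identical bf16 bit patterns, one of which changes.
old_weights = [0x3F80] * 100
new_weights = list(old_weights)
new_weights[7] = 0x3F81

delta = weight_delta(old_weights, new_weights)
synced = apply_delta(old_weights, delta)
```

Only the one changed entry travels over the wire, and the receiver reconstructs the full new weights bit for bit.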

https://huggingface.co/blog/delta-weight-sync

sergiopaniego

posted an update 4 days ago

Post

2262

most multi-turn RL loops have a silent bug: you decode the model's output to detect tool calls, then re-tokenize the conversation for the next turn. BPE tokenization isn't a one-to-one mapping (several id sequences can decode to the same text), so decode then re-encode can land on different ids. the gradient ends up on tokens the model never sampled. no crash, just quietly wrong math and broken training
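
The mismatch is easy to reproduce with a toy tokenizer. The sketch below uses a three-token vocabulary and greedy longest-match encoding, a simplification of real BPE merge rules, so it illustrates the failure mode rather than any specific tokenizer.

```python
from typing import List

# Toy vocabulary: "ab" can be written as one token or as "a" + "b".
VOCAB = {0: "a", 1: "b", 2: "ab"}
TOKEN_TO_ID = {token: idx for idx, token in VOCAB.items()}


def decode(ids: List[int]) -> str:
    return "".join(VOCAB[i] for i in ids)


def encode(text: str) -> List[int]:
    """Greedy longest-match encoding (stand-in for BPE merges)."""
    ids: List[int] = []
    pos = 0
    while pos < len(text):
        for length in (2, 1):
            piece = text[pos : pos + length]
            if piece in TOKEN_TO_ID:
                ids.append(TOKEN_TO_ID[piece])
                pos += len(piece)
                break
        else:
            raise ValueError(f"cannot encode {text[pos]!r}")
    return ids


# The policy sampled "a" then "b" as two separate tokens.
sampled_ids = [0, 1]
# MITO-style round trip: decode to text, then re-tokenize.
roundtrip_ids = encode(decode(sampled_ids))
```

The text is identical, but `roundtrip_ids` is `[2]` while the policy sampled `[0, 1]`. A loss computed on the re-encoded ids would score a token the model never produced.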

@qgallouedec wrote a super educational blog on MITO (message-in, token-out) vs TITO (token-in, token-out) and how you might fix the problem above

go read it 🤓

https://qgallouedec-tito.hf.space/

sergiopaniego

posted an update 4 days ago

Post

6204

new banger blog alert 🚨

@ariG23498 is starting a blog series about profiling in PyTorch and part 1 just dropped

takes you from the simplest scenario to actually knowing what your GPU is doing. if you have never opened a profiler trace this is where you start

covers torch.profiler from scratch. reading tables and traces, overhead bound vs compute bound, the full dispatch chain from python to gpu kernels, and what torch.compile is actually fusing under the hood

find it here: https://huggingface.co/blog/torch-profiler

1 reply

sergiopaniego

posted an update 7 days ago

Post

156

If you have a GitHub repo, you basically have an RL training environment

We're introducing Repo2RLEnv (built by @AdithyaSK), a tool that automatically mines PRs, commits and CVEs and turns them into verifiable sandboxed tasks with real reward signals

Outputs to Harbor spec so you can plug it straight into RL training or coding-agent eval

> repo: https://github.com/huggingface/Repo2RLEnv
> collection with envs: https://huggingface.co/collections/AdithyaSK/repo2rlenv-verifiable-rl-environments

sergiopaniego

posted an update 8 days ago

Post

223

periodic reminder 🧐

some HF blog posts use a special template, long-form, animated, super deep

I keep them all in one collection that gets updated every time a new one drops so you don't lose track

https://hf.co/collections/sergiopaniego/research-and-long-form-blog-posts

spot one missing? let me know

nielsr

submitted a paper to Daily Papers 10 days ago

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini

Paper • 2605.27295 • Published 11 days ago • 23

victor

posted an update 10 days ago

Post

813

Sharing how I built the LongCat-Video-Avatar 1.5 Space (+500k views on X) in one agent session. Gave a coding agent its own AI lab on ZeroGPU, framed the goal, walked away. It designed, deployed, tested against the live API, fixed, shipped.

Full recipe with the copy-paste prompt: https://huggingface.co/blog/victor/building-zerogpu-spaces-autonomously

1 reply

sergiopaniego

posted an update 11 days ago

Post

9946

Harness, Scaffold, Context Engineering, Agent... do you actually know what they mean?

We wrote an AI agent glossary and tried to make sense of it all with simple definitions and real examples

↓ go read it ↓

https://huggingface.co/blog/agent-glossary