VOOZH about

URL: https://thenewstack.io/a-hugging-face-project-is-uncovering-deepseek-r1s-secrets/

⇱ A Hugging Face Project Is Uncovering DeepSeek-R1’s Secrets - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2025-03-20 15:00:36
A Hugging Face Project Is Uncovering DeepSeek-R1’s Secrets
AI / AI Engineering / Emerging technologies

A Hugging Face Project Is Uncovering DeepSeek-R1’s Secrets

Open-R1 is a project to answer questions about DeepSeek-R1 and R1-Zero. Here's what the research team has learned and created to date.
Mar 20th, 2025 3:00pm by Loraine Lawson
👁 Featued image for: A Hugging Face Project Is Uncovering DeepSeek-R1’s Secrets
Photo by Solen Feyissa on Unsplash.

DeepSeek-R1’s release was a huge wake-up call for the AI world, according to Jeff Boudier, who leads product and growth at Hugging Face.

“The wake-up call was that, in order to get the best possible AI, you don’t need to rely on closed models from OpenAI, Anthropic, Google, etc.,” Boudier said. “You can access an open model here from DeepSeek with similar capabilities, coming from a research lab that was previously not very much known.”

Hugging Face is a company that serves as a repository hub and community for open source large language models (LLMs). It very quickly saw the impact of DeepSeek-R1, which is hosted on the platform.

“What was interesting is that it was not just a big announcement for sort of the general public, it also created a flurry of activity within the AI community, and we saw that directly on Hugging Face,” Boudier told The New Stack. “The R1 release today — that’s over 10 million downloads on Hugging Face and that’s just the last 30 days.”

How DeepSeek Changed AI

DeepSeek creates very efficient models that run on less powerful hardware. That’s unusual in AI, so much so that when its R1 model was released in January, it triggered a stock dive for NVIDIA, which manufactures the graphics processing units (GPUs) upon which other AI systems rely.

DeepSeek also used multiple neural networks instead of relying on a single “generalist” model. Plus, it was inexpensive to train at just $5.5 million compared to other generation AI models, “thanks to architectural changes like Multi-Token Prediction (MTP), Multi-Head Latent Attention (MLA) and a LOT (seriously, a lot) of hardware optimization,” Hugging Face researchers wrote in a blog post.

The DeepSeek organization on Hugging Face is also the most followed organization on the site, with more than 45,000 followers. That’s more than Google, Microsoft or other large AI players. There are now thousands of DeepSeek model derivatives available on the hub, he added.

It also changed the game for those organizations that want to use AI. Now, organizations can download the open source DeepSeek, released under the MIT license, and host it on premises.

“If you’re an enterprise, you don’t need to send your customer data to an API anymore, like that of OpenAI or others,” Boudier said. “You can actually host everything in-house. And it’s also MIT-licensed, so you can use it for whatever commercial purpose. That’s really, really powerful.”

The Open-R1 Project

DeepSeek didn’t just release its open source R1 and R1-Zero models — the Chinese company released a technical report that was “very generous in terms of the knowledge they shared and how they were able to create R1 and R1-Zero models using reinforcement learning techniques and some of these tricks,” Boudier explained.

The techniques described in the technical report were implemented within Hugging Face libraries, so they can be used by research labs around the world, he added. That included techniques such as Generative Reasoning and Planning Optimization (GRPO), which enables the AI to think through completing more complex tasks and then improve over time.

But there were some missing pieces in DeepSeek’s research, Boudier said.

“The technical report did not explain or describe the training data that was used to train and align the R1 model,” he said. “It did not describe the distillation process.”

Specifically, a Hugging Face research team noted, the report left questions about:

  • Data collection, such as how the reasoning-specific datasets were curated.
  • Model training. “No training code was released by DeepSeek, so it is unknown which hyperparameters work best and how they differ across different model families and scales,” the researchers said.
  • Scaling laws. “What are the compute and data trade-offs in training reasoning models?” Hugging Face researchers asked.

These questions lead to the creation of the Open-R1 project, an initiative that is systematically reconstructing DeepSeek-R1’s data and training pipeline, validating its claims, and “pushing the boundaries of open reasoning models,” the researchers wrote.

“By building Open-R1, we aim to provide transparency on how reinforcement learning can enhance reasoning, share reproducible insights with the open source community, and create a foundation for future models to leverage these techniques,” they stated.

The Hugging Face researchers outlined their “plan of attack” for Open-R1:

  1. Replicate the R1-Distill models by distilling a high-quality reasoning dataset from DeepSeek-R1.
  2. Replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale data sets for math, reasoning, and code.
  3. Show they can go from base model → SFT → RL via multi-stage training.

Reproducing the DeepSeek-R1 pipeline allows the research labs to go through the exact same process that DeepSeek went through when they created DeepSeek-R1 and DeepSeek-R1-Zero, which were reasoning models distilled from the foundation model, DeepSeek-V3.

Open-R1’s Purpose

Open-R1 isn’t designed to create new models per se — it’s more about creating and freely publishing artifacts.

One of the missing pieces in DeepSeek’s published research was how to go from a large, pre-trained model that has general knowledge and has been trained on trillions and trillions of tokens to a model that’s very good at a particular domain.

The key was creating reasoning traces that are produced by inferencing this “very capable model” on a specific domain and questions, Boudier said. Reasoning traces refer to a record or log of the steps an AI system takes to arrive at a conclusion or decision. Think of it as recording the AI’s “thought process.”

“You can actually host everything in-house. And it’s also MIT-licensed, so you can use it for whatever commercial purpose. That’s really, really powerful.”
— Jeff Boudier, head of product and growth at Hugging Face

In the case of DeepSeek-R1 and R1-Zero, the reasoning is on a specific domain, rather than, say, the whole internet.

“You can take a model and then teach it through distillation to be really, really good at this particular type of tasks” through reasoning traces, Boudier explained.

That’s what the Hugging Face team released in its second update — a mathematical reasoning traces dataset called Open-R1-Math-220k that has more than 200,000 reasoning traces for complex mathematical questions.

“The synthetic datasets will allow everybody to fine-tune existing or new LLMs into reasoning models by simply fine-tuning on them,” the team said of the math datasets. “The training recipes involving RL [reinforcement learning] will serve as a starting point for anybody to build similar models from scratch and will allow researchers to build even more advanced methods on top.”

There’s a lot of potential in exploring other areas, including code but also scientific fields such as medicine, “where reasoning models could have a significant impact,” they stated.

The Latest Release

The Open-R1 project just released its third update, which Boudier called the “most exciting update to date.”

It includes a code programming data set with more than 100,000 events programming reasoning traces obtained from DeepSeek R1. This dataset can be used to train new models to better understand the nuances of code, enabling the AI model to explain the reasoning behind the code. From it, the team built the OlympicCoder 7-billion and 32-billion parameter models.

“What’s really exciting is that by applying the distillation pipeline that they recreated from the R1 paper and from the R1 release, they were able to create these really, really powerful models,” Boudier said. “To give you a sense, the 32-billion model outperforms Claude Sonnet, which is the Anthropic state-of-the-art model for advanced programming challenges.”

The team also released a new IOI benchmark — based on the annual competitive programming competition, the International Olympiads of Informatics — to have a new way to measure a model’s ability to tackle more challenging programming problems.

TRENDING STORIES
Loraine Lawson is a veteran technology reporter who has covered technology issues from data integration to security for 25 years. Before joining The New Stack, she served as the editor of the banking technology site Bank Automation News. She has...
Read more from Loraine Lawson
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: Anthropic, OpenAI.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.