URL: https://thenewstack.io/deep-dive-into-deepseek-r1-how-it-works-and-what-it-can-do/

Deep Dive Into DeepSeek-R1: How It Works and What It Can Do - The New Stack


Deep Dive Into DeepSeek-R1: How It Works and What It Can Do

How does OpenAI competitor DeepSeek-R1 work, what is it capable of and what are some potential flaws? We look at what's under the hood.
Feb 17th, 2025 11:30am by Kimberley Mok
Image via Pexels.

The dust is still settling after the recent release of DeepSeek-R1, a Chinese large language model that is purportedly on par with OpenAI’s o1 LLM for reasoning tasks but was reportedly trained for about $6 million, a fraction of the approximately $100 million it cost to train OpenAI’s o1.

The R1 model’s weights and inference code have been openly released on Hugging Face and GitHub, respectively, though it’s worth noting that the training code and the training data itself haven’t been published. And while DeepSeek seems to be shaping up as an open source success story, the resulting fallout in both the stock market and the broader AI industry hints at a potential paradigm shift in the LLM landscape.

So, how does DeepSeek-R1 work, what is it capable of, and what are some potential flaws? Let’s examine its model architecture, capabilities and drawbacks.

Model Architecture of DeepSeek-R1

Here’s what we know of the architecture:

  • Mixture of experts: DeepSeek-R1 uses a mixture-of-experts (MoE) model architecture, which divides the model into several “expert” sub-networks that each excel at processing subsets of input data. This means that only the relevant parts of the model are activated when performing tasks, resulting in lower computational resource consumption.
  • Gating and auxiliary-loss-free load balancing: This selective activation of DeepSeek’s 671 billion parameters is achieved through a gating mechanism that dynamically directs inputs to the appropriate experts, thus increasing computational efficiency without hindering performance or scalability. For each token, only 37 billion parameters are activated during a single forward pass. Techniques like auxiliary-loss-free load balancing help to ensure that usage is distributed evenly across all expert sub-networks to prevent bottlenecks.
  • Context length: DeepSeek-R1 is built on the base model architecture of DeepSeek-V3. Both feature a 128K context length, which is reached via a context-extension technique called YaRN (Yet another RoPE extensioN). YaRN builds on Rotary Positional Embeddings (RoPE), a type of position embedding that encodes absolute positional information using a rotation matrix, by efficiently interpolating how the rotational frequencies in that matrix scale. It’s a practical way to boost model context length and enhance generalization for longer contexts without the need for costly retraining.
  • Layers: DeepSeek-R1 features an embedding layer, as well as 61 transformer layers. Instead of the typical multi-head attention (MHA) mechanism, the transformer layers use Multi-Head Latent Attention (MLA). The first three layers pair MLA with a standard Feed Forward Network (FFN) layer.
  • Multi-head latent attention: According to the team, MLA is equipped with low-rank key-value joint compression, which requires a much smaller key-value (KV) cache during inference, reducing memory overhead to between 5 and 13 percent of that of conventional methods while offering better performance than MHA. A mixture-of-experts layer replaces the Feed Forward Network (FFN) layer in layers 4 to 61 to permit ease of scalability and efficient learning, and to reduce computational cost.
  • Multi-token prediction: This is an advanced approach to language modeling that predicts multiple future tokens in a sequence in parallel, rather than one subsequent token at a time. Initially introduced by Meta, multi-token prediction (MTP) enables the model to utilize multiple prediction pathways (also called “heads”), thus allowing for better anticipation of token representations and boosting the model’s efficiency and performance on benchmark tests.
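
To make the gating and load-balancing bullets above more concrete, here is a minimal Python sketch of top-k expert routing for a single token. It is an illustration under stated assumptions, not DeepSeek's implementation: the function names (`route_token`, `update_bias`), the softmax scoring, the step size and the toy affinity values are all hypothetical. The bias term stands in for the general idea of balancing expert load without an extra loss term: it only affects which experts are selected, not how their outputs are weighted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(affinities, top_k, bias=None):
    """Pick top_k experts for one token; return {expert_index: gate_weight}.

    affinities: hypothetical token-to-expert scores produced by a gating layer.
    bias: optional per-expert offsets used only for selection (assumption).
    """
    n = len(affinities)
    bias = bias if bias is not None else [0.0] * n
    probs = softmax(affinities)
    # Rank experts by biased score, but weight outputs by the unbiased score.
    ranked = sorted(range(n), key=lambda i: probs[i] + bias[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def update_bias(bias, load, step=0.01):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, tokens_seen in zip(bias, load):
        if tokens_seen > mean_load:
            new_bias.append(b - step)
        elif tokens_seen < mean_load:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias


# Only the selected experts run for this token; the rest stay idle.
gates = route_token([2.0, 0.5, 1.0, -1.0], top_k=2)

# Share of parameters active per token, using the figures quoted above.
active_fraction = 37e9 / 671e9
print(sorted(gates), f"{active_fraction:.1%}")
```

With the figures quoted above, roughly 5.5 percent of the model's parameters are active for any given token, which is where the savings in compute come from.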

DeepSeek-R1’s Capabilities

DeepSeek-R1 demonstrates state-of-the-art performance on a variety of reasoning benchmarks, particularly on questions involving math and related disciplines. On some math-related metrics, it was shown to outperform OpenAI’s o1. It is proficient at complex reasoning, question answering and instruction-following tasks. In particular, the combination of the features below makes R1 distinct from its competitors.

Image via adasci.org

  • Reinforcement learning with group relative policy optimization: DeepSeek-R1 was built on top of a preceding model, DeepSeek-V3-Base, using multiple stages of training with supervised fine-tuning and reinforcement learning with group relative policy optimization (GRPO). GRPO is specifically designed to enhance reasoning abilities and reduce computational overhead by eliminating the need for an external “critic” model; instead, it evaluates groups of responses relative to one another. This means that the model can incrementally shift its reasoning toward better-rewarded outputs over time, without the need for large amounts of labeled data.
  • Reward modeling: This trial-and-error approach to learning incentivizes the model toward answers that are both correct and well-reasoned. It does this by assigning feedback in the form of a “reward signal” when a task is completed, thus helping to inform how the reinforcement learning process can be further optimized.
  • Cold-start data: DeepSeek-R1 uses “cold-start” data for training, which refers to a small, minimally labeled, high-quality supervised dataset that “kickstarts” the model’s training so that it quickly attains a general understanding of tasks.
  • Chain of thought: DeepSeek-R1 uses chain of thought (CoT) prompting to tackle reasoning tasks and perform self-evaluation. This simulates human-like reasoning by instructing the model to break down complex problems in a structured way, thus permitting it to logically deduce a coherent answer, and ultimately improving the readability of its answers.
  • Rejection sampling: The model also uses rejection sampling for weeding out lower-quality data, which means that after generating different outputs, the model only selects those that meet specific criteria for further epochs of fine-tuning and training.
  • Distillation: Using a curated dataset, DeepSeek-R1 has been distilled into smaller open versions that are relatively high-performing yet cheaper to run, most notably using Qwen and Llama architectures.

Image via “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning,” research paper.
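
The group-relative scoring and rejection sampling described in the list above can be sketched in a few lines of Python. This is a simplified illustration rather than DeepSeek's training code: the function names, the use of the population standard deviation, the `eps` constant, the threshold and the toy reward values are all assumptions. The point is that each response is judged against the other responses sampled for the same prompt, so no separate critic model is needed to supply a baseline.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled response relative to its own group.

    rewards: one scalar reward per response to the same prompt.
    Returns (reward - group mean) / (group std + eps) for each response.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def rejection_sample(responses, rewards, threshold):
    """Keep only responses whose reward meets a hypothetical quality bar."""
    return [resp for resp, r in zip(responses, rewards) if r >= threshold]


# Four hypothetical responses to one prompt: two correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Keep the better responses as candidate data for later fine-tuning.
kept = rejection_sample(["a", "b", "c"], [1.0, 0.2, 0.9], threshold=0.5)
print(advantages, kept)
```

Responses that beat their group's average receive a positive advantage and are reinforced, while below-average ones are discouraged. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.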

Potential Pitfalls

With any model, there are flaws that need to be balanced with the larger picture of performance and cost. According to AI security researchers at AppSOC and Cisco, here are some of the potential drawbacks to DeepSeek-R1, which suggest that robust third-party security and safety “guardrails” may be a wise addition when deploying this model.

  • Security: DeepSeek-R1 could be vulnerable to prompt injection attacks, resulting in erroneous outputs and potentially compromised systems. When tested, DeepSeek-R1 showed that it may be capable of generating malware in the form of malicious scripts and code snippets.
  • Safety: When tested with jailbreaking techniques, DeepSeek-R1’s safety mechanisms were consistently bypassed, and the model generated harmful or restricted content, as well as responses with toxic or harmful wording. This indicates that the model is vulnerable to algorithmic jailbreaking and potential misuse.
  • Hallucinations: DeepSeek-R1 may be susceptible to generating false or fabricated answers.

Conclusion

Despite these shortcomings, DeepSeek-R1 demonstrates the potential power of the reward system underlying reinforcement learning when applied to LLMs.

During DeepSeek-R1’s training process, rewarding accurate and coherent answers gave rise to nascent model behaviors like self-reflection, self-verification, long-chain reasoning and autonomous problem-solving. These behaviors point to the possibility of emergent reasoning that is learned over time rather than overtly taught, possibly paving the way for further breakthroughs in AI research.

Kimberley Mok is a tech and design reporter who covers artificial intelligence, robotics, quantum computing, tech culture and science stories for The New Stack. Trained as an architect, she is also an illustrator and multidisciplinary designer who has been passionate...
TNS owner Insight Partners is an investor in: OpenAI.