URL: https://thenewstack.io/how-to-run-deepseek-models-locally-on-a-windows-copilot-pc/

How to Run DeepSeek Models Locally on a Windows Copilot+ PC - The New Stack


Running DeepSeek models on Microsoft's new Copilot+ PCs represents a significant advancement in local AI processing capabilities.
Mar 7th, 2025 9:00am by Janakiram MSV

With Windows 11 version 24H2, Microsoft has enabled access to the Neural Processing Unit (NPU) on Copilot+ PCs. Although Copilot+ PCs support Intel Core Ultra 200V and AMD Ryzen AI 300 series processors, the initial Copilot+ PC models came with Qualcomm Snapdragon X Elite and Snapdragon X Plus processors. These ARM-based processors feature NPUs capable of performing over 40 trillion operations per second (TOPS).

Microsoft recently enabled developers to download and run foundation models optimized for Copilot+ PCs locally. It has introduced the DeepSeek R1 models, specifically the 7B and 14B distilled versions, which are optimized for execution on devices equipped with these NPUs.

Running DeepSeek models on Copilot+ PCs represents a significant advancement in local AI processing capabilities.

DeepSeek Models on Copilot+ PCs

Model Availability: The DeepSeek R1 models, including the initial 1.5B version, are now accessible through Azure AI Foundry. The larger 7B and 14B models have been designed to run efficiently on Copilot+ PCs powered by Qualcomm Snapdragon X processors, with support for Intel Core Ultra 200V and AMD Ryzen processors expected soon.

Efficiency and Performance: These distilled models are tailored for low-bit processing, allowing them to operate effectively on consumer-grade hardware without heavily relying on cloud resources. This results in faster execution times for tasks such as natural language processing, enabling real-time responses and improved user experiences.

NPU Optimization: The NPUs in Copilot+ PCs are engineered to efficiently handle AI workloads. This allows for sustained AI compute power while minimizing impacts on battery life and thermal performance, freeing up CPU and GPU resources for other tasks.

Developer Access: Developers can easily access these models by downloading the AI Toolkit VS Code extension. The models are available in the ONNX QDQ format, which is optimized for deployment on NPUs. This toolkit facilitates experimentation with AI functionalities directly on local machines, enhancing development workflows.

Use Cases: Integrating DeepSeek models enables developers to create smarter applications across various domains, such as virtual assistants, speech recognition systems and automation tools. Because these models run locally, users can expect improved performance in tasks like document summarization and email management.
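
To make the low-bit and ONNX QDQ points above more concrete, here is a minimal Python sketch of two ideas: how much storage the weights alone need at different bit widths, and what a quantize/dequantize (QDQ) round trip does to a handful of values. It is a simplified illustration, not the scheme Microsoft uses for these models. The symmetric per-tensor quantizer, the 4-bit and 8-bit widths and the helper names are all assumptions made for the example.

```python
from typing import List, Tuple


def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric linear quantization: signed integers plus a single scale.

    Assumption: one scale for the whole list and a zero point of 0.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Inverse step: recover approximate floats from the stored integers."""
    return [x * scale for x in q]


# Illustrative values, not real model weights.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize(weights, bits=8)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

for size in (1.5e9, 7e9, 14e9):
    print(f"{size / 1e9:>4.1f}B params: "
          f"{weight_footprint_gb(size, 16):.2f} GB at 16-bit, "
          f"{weight_footprint_gb(size, 4):.2f} GB at 4-bit")
print("quantized integers:", q)
print("max round-trip error:", max_error)
```

The round-trip error stays within half a quantization step, which is the trade-off low-bit formats make: a small loss of precision in exchange for a much smaller model that fits consumer hardware.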

Running DeepSeek Models on Copilot+ PCs

On a Copilot+ PC powered by Qualcomm Snapdragon processors, you can run DeepSeek through the AI Toolkit extension for VS Code. Once you install AI Toolkit, you can access the Azure model catalog, which shows all the models available for inference.

You can filter for models specifically optimized for the NPU.

I downloaded the DeepSeek R1 Distilled 1.5B model to run locally. After the model is downloaded, you can access it from the playground.

You can now change the parameters and generate responses from the model.

The downloaded files are stored in the .aitk\models directory of your home folder.
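
If you want to check what has been downloaded without opening VS Code, a short script can walk that folder. The sketch below assumes the models live under a `.aitk/models` folder in your home directory and that each model sits in its own subfolder. Both are assumptions, and `list_downloaded_models` is a hypothetical helper, so pass a different path if your installation differs.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def list_downloaded_models(models_dir: Optional[Path] = None) -> List[Tuple[str, float]]:
    """Return (folder name, size in MB) for each model folder found.

    models_dir defaults to <home>/.aitk/models, which is an assumption
    based on the location described in the article.
    """
    root = models_dir or Path.home() / ".aitk" / "models"
    if not root.is_dir():
        return []  # nothing downloaded yet, or a different install location
    results = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            # Sum the size of every file under this model folder.
            size_bytes = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
            results.append((entry.name, size_bytes / 1e6))
    return results


for name, size_mb in list_downloaded_models():
    print(f"{name}: {size_mb:.1f} MB")
```

On a machine with no downloaded models, the function returns an empty list and the loop prints nothing.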

Accessing DeepSeek Model Through API

The AI Toolkit extension exposes an OpenAI-compatible API endpoint on localhost, port 5272. You can use the standard OpenAI libraries to access the models. You only need to set the model name to the one you downloaded from the model catalog.

Below is the code snippet to access the model programmatically:

from openai import OpenAI

# Point the standard OpenAI client at the local AI Toolkit endpoint.
client = OpenAI(
    base_url="http://127.0.0.1:5272/v1/",
    api_key="x",  # required by API but not used
)

# Use the name of the model you downloaded from the model catalog.
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "How much is 2+2?",
        }
    ],
    model="qnn-deepseek-r1-distill-qwen-1.5b",
)

print(chat_completion.choices[0].message.content)
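
Because the endpoint is OpenAI-compatible, the same call can be sketched with only the Python standard library, which shows what goes over the wire. The example below also sets `temperature` and `max_tokens`, and separates the reasoning from the final answer. Several things here are assumptions: that the local server honors those two parameters, that this model wraps its reasoning in `<think>` tags as R1-style models typically do, and the helper names `build_chat_request`, `split_reasoning` and `ask`, which are hypothetical.

```python
import json
import re
import urllib.request
from typing import Tuple

BASE_URL = "http://127.0.0.1:5272/v1"  # local AI Toolkit endpoint from the article


def build_chat_request(prompt: str, model: str,
                       temperature: float = 0.6,
                       max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # assumed to be honored by the local server
        "max_tokens": max_tokens,    # assumed to be honored by the local server
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def split_reasoning(text: str) -> Tuple[str, str]:
    """Separate a <think>...</think> block from the final answer, if present."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


def ask(prompt: str, model: str = "qnn-deepseek-r1-distill-qwen-1.5b") -> Tuple[str, str]:
    """Send the request and return (reasoning, answer). Needs the local server running."""
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=120) as resp:
        body = json.load(resp)
    return split_reasoning(body["choices"][0]["message"]["content"])
```

With AI Toolkit running and the model loaded, `reasoning, answer = ask("How much is 2+2?")` would return the two parts separately, so an application can show only the answer and log the reasoning.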

The availability of DeepSeek models on Copilot+ PCs marks a pivotal moment in making advanced AI capabilities more accessible and efficient for developers and users alike. Developers can use these models to explore and learn, and embed them directly in the applications they ship.

With the ability to run complex reasoning models locally, AI PCs and Copilot+ PCs are set to change how AI is used in everyday applications, combining the power of local computing and on-device models.
