VOOZH about

URL: https://thenewstack.io/a-developers-guide-to-nim-nvidias-ai-application-platform/

⇱ A Developers Guide to NIM, Nvidia’s AI Application Platform - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2024-08-13 08:02:58
A Developers Guide to NIM, Nvidia’s AI Application Platform
tutorial,
AI / Developer tools / Microservices

A Developers Guide to NIM, Nvidia’s AI Application Platform

By offering a flexible suite of microservices, Nvidia NIM provides a robust, scalable and secure platform for AI inference.
Aug 13th, 2024 8:02am by Janakiram MSV
👁 Featued image for: A Developers Guide to NIM, Nvidia’s AI Application Platform
Image via Unsplash+. 

In March 2024, Nvidia announced NIM (Nvidia Inference Microservices), a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across the cloud, data centers and workstations.

This series delves into NIM, exploring its key features, benefits and applications, as well as providing a comprehensive guide for developers looking to leverage this generative AI platform.

NIM is available as APIs, within the Nvidia AI Enterprise software suite and as standalone container images.

What is Nvidia NIM?

NIM stands for Nvidia Inference Microservices, which means this is an offering meant for performing inference on generative AI models. When it was announced, NIM was only available as a set of APIs for developers. NIM is also a part of Nvidia AI Enterprise, which is built on infrastructure software from VMware and Red Hat. Recently, Nvidia has started to publish and maintain container images that are deployable locally on developer workstations and servers with Nvidia GPUs.

So, NIM is available as APIs, within the Nvidia AI Enterprise software suite and as standalone container images.

Let’s take a look at each of these to understand them better.

Nvidia NIM API

The Nvidia NIM API is a set of industry-standard APIs that enables developers to deploy AI models with ease, using just a few lines of code. Available as serverless inference endpoints, the NIM API provides a secure, streamlined path for iterating and building generative AI solutions.

The NIM API is built on a robust foundation, including inference engines like Triton Inference Server, TensorRT, TensorRT-LLM and PyTorch. This architecture facilitates seamless AI inferencing at scale, allowing developers to consume state-of-the-art foundation models and fine-tuned models without worrying about the infrastructure.

The NIM API is compatible with OpenAI, allowing developers to leverage the power of OpenAI’s models and tools within their applications. Developers can use standard HTTP REST clients or OpenAI client libraries to consume the NIM API.

👁 Image

The NIM API provides several API endpoints that enable developers to interact with AI models, including:

  • Completions endpoint: This allows developers to generate text completions based on a given prompt.
  • Embeddings endpoint: This enables developers to generate text embeddings for a given input text.
  • Retrieval endpoint: This allows developers to retrieve relevant documents based on a given query.
  • Ranking endpoint: This enables developers to rank a list of passages or documents based on their relevance to a given query or prompt.

The NIM API has tight integration with popular LLM orchestration tools such as LangChain and LlamaIndex. Developers can easily build basic chatbots, AI assistants, retrieval augmented generation (RAG) applications and advanced applications based on agents.

Developers can get started with the NIM API by visiting the Nvidia API Catalog, where they can find documentation, API reference information and release notes. To use the NIM API, developers need to obtain an API key, available by joining the Nvidia Developer Program. There is a playground to explore models, prompts, parameters and responses. When developers sign up with NIM, they each receive 5,000 credits, with each credit corresponding to one inference call.

👁 Image

Nvidia NIM is quickly becoming the choice of developers to access the latest generative AI models. Recently, when Google launched Gemma 2 2B LLM, it made it available on NIM along with Hugging Face and Kaggle. Going forward, you can expect other model providers to offer their models on the Nvidia NIM inference platform.

I will explore the NIM API in detail in an upcoming tutorial in this series.

Nvidia NIM Within Nvidia AI Enterprise

Nvidia AI Enterprise is a comprehensive, cloud native software platform that accelerates data science pipelines and streamlines development and deployment of production-grade copilots and other generative AI applications. As part of this platform, Nvidia NIM is a set of easy-to-use inference microservices that enable developers to deploy foundation models on any cloud or data center while keeping their data secure.

The software layer of the Nvidia AI platform, Nvidia AI Enterprise, accelerates the data science pipeline and streamlines the development and deployment of production AI — including generative AI, computer vision, speech AI and more. With over 100 frameworks, pretrained models, development tools and microservices, Nvidia AI Enterprise is designed to accelerate enterprises to the leading edge of AI while also simplifying AI to make it accessible to every enterprise.

👁 Image

Nvidia NIM is a critical component of the Nvidia AI Enterprise platform, providing optimized model performance with enterprise-grade security, support and stability. With NIM, developers can deploy AI models with ease, using just a few lines of code. This enables them to focus on building enterprise applications while Nvidia handles the complexities of AI model deployment.

The Nvidia AI Enterprise platform can be deployed on systems like Nvidia DGX, certified hardware from Nvidia partners, and public cloud environments like AWS, Azure and GCP.

Nvidia NIM as Self-Hosted Container

For developers who do not have access to Nvidia AI Enterprise, NIM is available as a self-contained image that can be deployed using Docker or Kubernetes.

NIM abstracts model inference internals, including runtime operations and the execution engine. They are also the most efficient option available, regardless of whether they are used with TRT-LLM, vLLM or a similar inference engine.

👁 Image

NIMs are packaged as container images for each model or model family. Each NIM is its own Docker container with a specific model, such as meta/llama3-8b-instruct. These containers come with a runtime that works on any Nvidia GPU with enough memory, but some model/GPU combinations work better than others. Utilizing any local filesystem cache that is available, NIM automatically downloads the model from Nvidia’s NGC catalog. Because each NIM is built on the same base image, downloading additional NIMs is extremely fast.

To get started with Nvidia NIM, pull the NIM container from the Nvidia Docker Registry and run it using the docker run command on a GPU machine configured with Docker and the Nvidia Container Toolkit. To access the NIM API, generate an API key from the Nvidia GPU Cloud and use the docker login command to authenticate with the Nvidia Container Registry. Finally, launch the NIM container using the docker run command, specifying the container name, repository and tag.

👁 Image

Once the container is running, you can validate the deployment by executing an inference request using the curl command. Additionally, you can use the OpenAI Python API library to send requests to the NIM API. By following these steps, you can easily deploy and use Nvidia NIM on your system.

👁 Image

When NIM is first deployed, it inspects the local hardware configuration and the available optimized model in the model registry before automatically selecting the best version of the model for the available hardware. NIM downloads the optimized TensorRT (TRT) engine and runs inference using the TRT-LLM library on a subset of supported GPUs. For other GPUs, NIM downloads an unoptimized model and runs it with the vLLM library.

By offering a flexible suite of microservices through APIs, integration with Nvidia AI Enterprise and self-hosted container images, NIM provides developers with a robust, scalable and secure platform for AI inference.

What I like about NIM containers is that they are capable of running on consumer-grade GPUs such as GeForce RTX 4090, giving developers a chance to quickly prototype applications on accessible and affordable hardware. In the following parts of this series, I will explore how to deploy NIM locally and build applications that consume the API.

Summing Up

Nvidia NIM represents a significant advancement in the deployment and utilization of generative AI models. By offering a flexible suite of microservices through APIs, integration with Nvidia AI Enterprise and self-hosted container images, NIM provides developers with a robust, scalable and secure platform for AI inference. Whether leveraging cloud infrastructure or local GPU resources, NIM simplifies the complexities of AI model deployment, enabling rapid development and iteration of AI applications. As I continue this series, I will delve deeper into each aspect of Nvidia NIM, providing detailed guidance and tutorials to help developers maximize the potential of this powerful platform.

TRENDING STORIES
Janakiram MSV (Jani) is a practicing architect, research analyst, and advisor to Silicon Valley startups. He focuses on the convergence of modern infrastructure powered by cloud-native technology and machine intelligence driven by generative AI. Before becoming an entrepreneur, he spent...
Read more from Janakiram MSV
SHARE THIS STORY
TRENDING STORIES
AWS, Docker, Google, Microsoft, Red Hat and VMware are also sponsors of The New Stack.
TNS owner Insight Partners is an investor in: Docker, OpenAI.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.