URL: https://thenewstack.io/deploy-nvidia-triton-inference-server-with-minio-as-model-store/

Deploy Nvidia Triton Inference Server with MinIO as Model Store - The New Stack



Deploy Nvidia Triton Inference Server with MinIO as Model Store

This tutorial shows how to set up the Nvidia Triton Inference Server that treats the MinIO tenant as a model store.
Dec 3rd, 2021 6:00am by Janakiram MSV

This tutorial is the latest part of a series in which we build an end-to-end stack to perform machine learning inference at the edge. In the previous part of the series, we installed the MinIO object storage service on SUSE Rancher’s RKE2 Kubernetes distribution. We will extend that use case by deploying the Nvidia Triton Inference Server, which will treat the MinIO tenant as its model store.

[Figure: AI inference cluster illustration]

By the end of this tutorial, we will have a fully configured model server and registry ready for inference.

Step 1 — Populate the MinIO Model Store with Sample Models

Before deploying the model server, we need to have the model store or repository populated with a few models.

Start by cloning the Triton Inference Server GitHub repository.

git clone https://github.com/triton-inference-server/server.git

We will now run a shell script to download the models to the local filesystem, after which we will upload them to a MinIO bucket.

Run the ./fetch_models.sh script from the server/docs/examples directory.

Wait for all the models to be downloaded into the model_repository directory. This may take a few minutes, depending on your internet connection.

[Figure: Model repository]
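
Before uploading, it can help to confirm that each model directory has the shape Triton generally expects: a config.pbtxt file next to one or more numbered version directories. The following is a minimal sketch and not part of the original tutorial. The function names list_models and missing_pieces are hypothetical, and a missing config.pbtxt is not always an error, because Triton can derive the configuration for some model formats.

```python
from pathlib import Path


def list_models(repo_root):
    """Return {model_name: [version, ...]} for a Triton-style repository.

    Assumes the usual layout: <repo>/<model>/config.pbtxt plus one
    numbered sub-directory per model version.
    """
    models = {}
    for model_dir in sorted(Path(repo_root).iterdir()):
        if not model_dir.is_dir():
            continue
        versions = sorted(
            int(p.name) for p in model_dir.iterdir()
            if p.is_dir() and p.name.isdigit()
        )
        models[model_dir.name] = versions
    return models


def missing_pieces(repo_root):
    """List human-readable gaps that may stop a model from loading."""
    problems = []
    for name, versions in list_models(repo_root).items():
        if not (Path(repo_root) / name / "config.pbtxt").is_file():
            problems.append(f"{name}: no config.pbtxt")
        if not versions:
            problems.append(f"{name}: no numbered version directory")
    return problems
```

Calling missing_pieces("model_repository") from the server/docs/examples directory returns a list of gaps; an empty list means none of these two checks failed.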

Let’s use the MinIO CLI to upload the models from the model_repository directory to the models bucket. The bucket was created within the model-registry tenant created in the last tutorial.

Run the command from the model_repository directory to copy the files to the bucket.

mc --insecure cp --recursive . model-registry/models

Check the uploads by visiting the MinIO Console. You should be able to see the directories copied to the models bucket.

[Figure: MinIO Console for building models]
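
For readers who prefer to script the upload, the same copy can be approximated with boto3, which appears in the client requirements later in this tutorial. This is a sketch under assumptions: the endpoint URL and credentials are whatever your MinIO tenant uses, the function names are hypothetical, and verify=False skips TLS certificate verification in a way comparable to mc --insecure.

```python
from pathlib import Path


def object_keys(local_root):
    """Yield (local_path, object_key) pairs for every file under local_root.

    Keys are relative to local_root and use forward slashes, so
    model_repository/inception_graphdef/config.pbtxt becomes
    inception_graphdef/config.pbtxt inside the bucket.
    """
    root = Path(local_root)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield path, path.relative_to(root).as_posix()


def upload_repository(local_root, endpoint_url, access_key, secret_key,
                      bucket="models"):
    """Upload every file to the bucket and return the number of objects sent."""
    import boto3  # imported lazily so object_keys works without boto3

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,          # assumption: your MinIO tenant URL
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        verify=False,                       # skip TLS verification, like --insecure
    )
    count = 0
    for path, key in object_keys(local_root):
        s3.upload_file(str(path), bucket, key)
        count += 1
    return count
```

Listing the pairs from object_keys(".") first is a cheap way to preview which object names will appear in the models bucket before anything is sent.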

We are now ready to point NVIDIA Triton Inference Server to MinIO.

Step 2 — Deploy Triton Inference Server on RKE2

Triton will reach MinIO through its Amazon S3-compatible API, so it treats the bucket as an S3 model store. To access the bucket, it needs a secret holding AWS-style credentials.

In our case, these credentials are the MinIO tenant credentials saved from the last tutorial.

Create a namespace, then create the secret inside it.

kubectl create ns model-server

kubectl create secret generic aws-credentials --from-literal=AWS_ACCESS_KEY_ID=admin --from-literal=AWS_SECRET_ACCESS_KEY=7c5c084d-9e8e-477b-9a2c-52bbf22db9af -n model-server

Don’t forget to replace the credentials with your values.
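
If you would rather keep the secret as a manifest than create it imperatively, the sketch below builds an equivalent Secret object and prints it as JSON, which kubectl apply -f also accepts. It is an illustration only: the helper name secret_manifest is hypothetical and REPLACE_ME is a placeholder. The point it shows is that values under data are base64-encoded, not encrypted.

```python
import base64
import json


def secret_manifest(name, namespace, literals):
    """Build a generic (Opaque) Secret with base64-encoded values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in literals.items()
        },
    }


manifest = secret_manifest(
    "aws-credentials",
    "model-server",
    {
        "AWS_ACCESS_KEY_ID": "admin",
        # Placeholder: substitute the secret key of your own MinIO tenant.
        "AWS_SECRET_ACCESS_KEY": "REPLACE_ME",
    },
)
print(json.dumps(manifest, indent=2))
```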

Now, create the deployment and service manifests and apply them.

kubectl apply -f triton-deploy.yaml
kubectl apply -f triton-service.yaml

[Figure: Output of the kubectl get pods command]

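To let the Triton pod access the MinIO service over HTTPS, we fixed the certificate issue by adding the Kubernetes service account CA certificate to the container's trusted certificates with the command below: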

cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt /usr/local/share/ca-certificates && update-ca-certificates

We passed the MinIO bucket to Triton as the model repository path, using an S3-style path with the endpoint's scheme, host and port embedded in it: s3://https://minio.model-registry.svc.cluster.local:443/models/
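
The contents of triton-deploy.yaml are not listed in this tutorial. The sketch below is one guess at how the pieces described so far could fit together in a Deployment: the aws-credentials secret exposed as environment variables, the certificate command run at startup, and the model repository path passed to tritonserver. The labels, container name and the TRITON_IMAGE placeholder are hypothetical, GPU resource requests are left out, and the ports are Triton's default HTTP, gRPC and metrics ports.

```python
import json


def model_repository_url(host, port, bucket, scheme="https"):
    """Format an S3 path that embeds the endpoint, as used in this tutorial."""
    return f"s3://{scheme}://{host}:{port}/{bucket}/"


def triton_deployment(image, repo_url, namespace="model-server",
                      secret_name="aws-credentials"):
    """Return a Deployment manifest as a plain dict (hypothetical layout)."""
    startup = (
        "cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt "
        "/usr/local/share/ca-certificates && update-ca-certificates && "
        f"tritonserver --model-repository={repo_url}"
    )
    container = {
        "name": "triton",
        "image": image,
        "command": ["/bin/sh", "-c", startup],
        # Expose every key of the Secret as an environment variable.
        "envFrom": [{"secretRef": {"name": secret_name}}],
        "ports": [
            {"name": "http", "containerPort": 8000},
            {"name": "grpc", "containerPort": 8001},
            {"name": "metrics", "containerPort": 8002},
        ],
    }
    labels = {"app": "triton"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "triton", "namespace": namespace},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


url = model_repository_url("minio.model-registry.svc.cluster.local", 443, "models")
# TRITON_IMAGE is a placeholder for the Triton container image you use.
print(json.dumps(triton_deployment("TRITON_IMAGE", url), indent=2))
```

Generating the manifest from a function keeps the repository path in one place, so pointing Triton at a different bucket or tenant only changes the arguments.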

Finally, check the logs of the Triton pod and make sure everything is working properly. Replace the pod name below with the name of your own pod.

kubectl logs triton-59994bb95c-7hgt7 -n model-server

[Figure: Output of the kubectl command for fetching Triton logs]

If you see the above in the output, it means that Triton is able to download the models from the model store and serve them through the HTTP and gRPC endpoints.
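
Besides reading the logs, readiness can be probed over HTTP: Triton's HTTP endpoint answers GET /v2/health/ready with status 200 once the server is ready. The sketch below uses only the Python standard library. The function names are hypothetical, and the endpoint argument is assumed to be a host:port pair that reaches Triton's HTTP port, for example a node address and the NodePort of the service.

```python
import urllib.request


def health_url(endpoint):
    """Build the readiness URL from a host:port string."""
    return f"http://{endpoint}/v2/health/ready"


def is_ready(endpoint, timeout=5.0):
    """Return True when the readiness probe is answered with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP error statuses.
        return False
```

A False result only says the probe did not succeed; the pod logs remain the place to look for the reason.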

Step 3 — Run Inference Client against Triton

Start by cloning the client repository to get the code for inference.

git clone https://github.com/triton-inference-server/client.git

Create a requirements.txt file that lists the client dependencies.

cat <<EOF > requirements.txt
pillow
numpy
attrdict
tritonclient
google-api-python-client
grpcio
geventhttpclient
boto3
EOF

pip3 install -r requirements.txt

Navigate to the client/src/python/examples directory and execute the following command:

python3 image_client.py \
-u TRITON_HTTP_ENDPOINT \
-m inception_graphdef \
-s INCEPTION \
-x 1 \
-c 1 \
car.jpg

Replace TRITON_HTTP_ENDPOINT with the host and NodePort of the Triton service, in host:port form. Send an image of a car and you should see output like the following:

[Figure: Inference output]

The client has invoked the Triton inference endpoint with a request to the Inception model already available in the model store. Triton has performed the inference, and the client has printed the labels returned by the classification.
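
Two of the flags passed to image_client.py are worth unpacking: -s INCEPTION selects how pixel values are scaled before they are sent, and -c 1 asks for the single best class. The sketch below shows both ideas in plain Python, as we understand the example client: 8-bit channel values are mapped to the range -1 to 1, and each classification result arrives as a string of the form score:index:label. The function names and the sample string in the usage note are made up for illustration.

```python
def inception_scale(pixel):
    """Map an 8-bit channel value (0..255) to the range [-1.0, 1.0]."""
    return pixel / 127.5 - 1.0


def parse_classification(entry):
    """Split one '<score>:<class index>:<label>' result string.

    The label part is empty when the model has no labels file.
    """
    if isinstance(entry, bytes):
        entry = entry.decode("utf-8")
    parts = entry.split(":", 2)
    label = parts[2] if len(parts) > 2 else ""
    return float(parts[0]), int(parts[1]), label
```

For example, parse_classification(b"0.82:817:SPORTS CAR") yields the score, the class index and the label as separate values, which is handy if you want to post-process results instead of printing them.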

Congratulations! You have successfully deployed and configured the model server backed by a model store running at the edge.

Janakiram MSV (Jani) is a practicing architect, research analyst, and advisor to Silicon Valley startups. He focuses on the convergence of modern infrastructure powered by cloud-native technology and machine intelligence driven by generative AI. Before becoming an entrepreneur, he spent...