URL: https://thenewstack.io/serve-tensorflow-models-with-kserve-on-google-kubernetes-engine/

Serve TensorFlow Models with KServe on Google Kubernetes Engine

This tutorial will walk you through all the steps required to install and configure KServe on a Google Kubernetes Engine cluster powered by Nvidia T4 GPUs.
Mar 25th, 2022 8:29am by Janakiram MSV
Feature Image by Rudy and Peter Skitterians from Pixabay.

I introduced KServe as a scalable, cloud native, open source model server in the previous article. This tutorial will walk you through all the steps required to install and configure KServe on a Google Kubernetes Engine cluster powered by Nvidia T4 GPUs. We will then deploy a TensorFlow model to perform inference.

Step 1 – Launch a GKE Cluster with T4 GPU Node

Assuming you have access to Google Cloud Platform, run the following command to launch a three-node cluster in which each node has one Nvidia T4 GPU attached. Replace the project, zone, and other values to reflect your environment.

gcloud beta container clusters create "tns-kserve" \
--project "janakiramm-sandbox" \
--zone "asia-southeast1-c" \
--no-enable-basic-auth \
--cluster-version "1.22.4-gke.1501" \
--machine-type "n1-standard-4" \
--accelerator "type=nvidia-tesla-t4,count=1" \
--num-nodes "3" \
--image-type "UBUNTU_CONTAINERD" \
--disk-type "pd-standard" \
--disk-size "100" \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append"

Add a cluster-admin role for the GCP user.

kubectl create clusterrolebinding cluster-admin-binding \
 --clusterrole=cluster-admin \
 --user=$(gcloud config get-value core/account)

Install the Nvidia GPU driver on the nodes with the driver installer DaemonSet, then validate that the GPU device plugin pods are running.

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml
kubectl get pods -n kube-system -l k8s-app=nvidia-gpu-device-plugin

Create a pod based on the Nvidia CUDA image to test GPU access.

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      limits:
        nvidia.com/gpu: 1

Save the manifest as gpu-pod.yaml and apply it.

kubectl apply -f gpu-pod.yaml

Run the nvidia-smi command inside the pod to confirm that the GPU is visible.

kubectl exec -it my-gpu-pod -- nvidia-smi

With the infrastructure in place, let’s proceed with KServe installation.

Step 2 – Installing Istio

Istio is an essential prerequisite for KServe. Knative Serving relies on Istio ingress to expose KServe API endpoints. For version compatibility, check the documentation.

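Download the Istio release to your local workstation, and run the CLI to install it. The download script unpacks the release into an istio-<version> directory; add its bin directory to your PATH so that istioctl is available.

curl -L https://istio.io/downloadIstio | sh -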
istioctl install --set profile=demo -y

Verify with kubectl get pods -n istio-system that all pods in the istio-system namespace are in the Running state.

Step 3 – Installing Knative Serving

Install Knative CRDs and core services.

kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.2.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.2.0/serving-core.yaml

To integrate Knative with Istio Ingress, run the below commands.

kubectl apply -l knative.dev/crd-install=true -f https://github.com/knative/net-istio/releases/download/knative-v1.2.0/istio.yaml
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.2.0/istio.yaml

kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.2.0/net-istio.yaml

Finally, configure Knative's default DNS to use the sslip.io domain, so that services get a resolvable hostname without any DNS records of your own.

kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.2.0/serving-default-domain.yaml
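
sslip.io is a public DNS service that resolves any hostname containing an embedded IP address to that address. The sketch below illustrates how such a hostname is put together and how the address can be read back out of it. The helper names are hypothetical, the service.namespace.IP.sslip.io layout is inferred from the URL used at the end of this tutorial, and the IP address is the one from that URL rather than yours.

```python
import ipaddress

SUFFIX = ".sslip.io"


def sslip_hostname(service: str, namespace: str, ingress_ip: str) -> str:
    """Build a <service>.<namespace>.<ip>.sslip.io hostname (hypothetical helper)."""
    ipaddress.IPv4Address(ingress_ip)  # raises ValueError on a malformed address
    return f"{service}.{namespace}.{ingress_ip}{SUFFIX}"


def embedded_ip(hostname: str) -> str:
    """Return the IPv4 address embedded in an sslip.io hostname."""
    if not hostname.endswith(SUFFIX):
        raise ValueError("not an sslip.io hostname: " + hostname)
    labels = hostname[: -len(SUFFIX)].split(".")
    # The last four labels before the suffix form the dotted IPv4 address.
    return str(ipaddress.IPv4Address(".".join(labels[-4:])))


# Values taken from the example URL in this tutorial; substitute your own.
host = sslip_hostname("dogs-vs-cats", "default", "34.126.156.171")
print(host)
print(embedded_ip(host))
```

Because the address is part of the name itself, the hostname changes whenever the external IP of the Istio ingress gateway changes.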

Make sure that Knative Serving is running by checking with kubectl get pods -n knative-serving that all of its pods are in the Running state.

Step 4 – Installing Certificate Manager

Install cert manager with the following command:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.1/cert-manager.yaml

Step 5 – Install KServe Model Server

We are now ready to install the KServe model server on the GKE Cluster.

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.7.0/kserve.yaml
kubectl get pods -n kserve

KServe also installs a set of custom resource definitions (CRDs). List them with the command below:

kubectl get crd | grep "kserve"

Step 6 – Configuring Google Cloud Storage Bucket and Uploading a TensorFlow Model

KServe can pull models from a Google Cloud Storage (GCS) bucket to serve them for inference. Let’s create the bucket and upload the model.

For this scenario, we will use the model from one of my previous tutorials, which trained a CNN to classify dogs and cats. You can download the pre-trained TensorFlow model from here. Unzip the file and run the commands below to create the GCS bucket and upload the model artifacts.

gsutil mb gs://tns-kserve
gsutil iam ch allUsers:objectViewer gs://tns-kserve
gsutil cp -R model/ gs://tns-kserve

For simplicity, we enabled public access to the bucket. Outside of a demo, you may want to keep the bucket private and add the service account key as a secret so that KServe can access it.

Step 7 – Creating and Deploying the TensorFlow Inference Service

Let’s go ahead and create an inference service pointing to the model uploaded to the GCS bucket. Notice that the predictor requests one GPU through its resource limits, which ensures that the pod is scheduled on a GPU node and can use the GPU for acceleration.

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "dogs-vs-cats"
spec:
  predictor:
    tensorflow:
      storageUri: "gs://tns-kserve/model"
      resources:
        limits:
          nvidia.com/gpu: 1
        requests:
          nvidia.com/gpu: 1

Apply the manifest with kubectl apply -f, then wait for KServe to generate the endpoint for the inference service.

kubectl get inferenceservice

Step 8 – Performing Inference with KServe and TensorFlow

Install the below Python modules in a virtual environment:

pip install pillow \
	h5py \
	tensorflow \
	requests \
	numpy

Save the client code below as infer.py, then execute it with sample images of dogs and cats to see the inference in action.

import argparse

import numpy as np
import requests
from tensorflow.keras.preprocessing import image  # load_img requires Pillow to be installed

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path of the image")
ap.add_argument("-u", "--uri", required=True,
                help="URI of model server")

args = vars(ap.parse_args())

image_path = args['image']
uri = args['uri']

# Resize the image to 128x128 and scale the pixel values to the range [0, 1].
img = image.img_to_array(image.load_img(image_path, target_size=(128, 128))) / 255.

# One instance, keyed by the name of the model's input layer.
payload = {
    "instances": [{'conv2d_input': img.tolist()}]
}

r = requests.post(uri + '/v1/models/dogs-vs-cats:predict', json=payload)
r.raise_for_status()  # stop on an HTTP error instead of failing later on a missing key
pred = r.json()

# Pick the highest-scoring class for the single image that was sent.
predict = np.asarray(pred['predictions']).argmax(axis=1)[0]
print("Dog" if predict == 1 else "Cat")
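
The request and response handling in the client can also be exercised without TensorFlow or a running model server, which helps when debugging the payload format. The following is a minimal sketch using only the standard library. The function names and the sample values are made up for illustration, and it assumes, as the client code implies, that the model returns one row of two class scores per image, with index 1 meaning dog.

```python
import json


def build_payload(pixels, input_name="conv2d_input"):
    """Scale 0-255 pixel values to [0, 1] and wrap them in a predict request body."""
    scaled = [[[channel / 255.0 for channel in px] for px in row] for row in pixels]
    return {"instances": [{input_name: scaled}]}


def label_from_response(body: str) -> str:
    """Map a predict response body to "Dog" or "Cat"."""
    scores = json.loads(body)["predictions"][0]  # the client sends a single image
    # Pure-Python argmax: index of the highest score in the row.
    best = max(range(len(scores)), key=scores.__getitem__)
    return "Dog" if best == 1 else "Cat"


# A 1x1 "image" with one RGB pixel, just to show the nesting of the payload.
print(build_payload([[[255, 0, 255]]]))

# Made-up scores standing in for a real response from the model server.
print(label_from_response(json.dumps({"predictions": [[0.08, 0.92]]})))
```

A real request carries a 128x128x3 nested list under the conv2d_input key, so the body is large; printing only its shape is usually enough when inspecting it.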

python infer.py \
-u http://dogs-vs-cats.default.34.126.156.171.sslip.io \
-i sample1.jpg

python infer.py \
-u http://dogs-vs-cats.default.34.126.156.171.sslip.io \
-i sample2.jpg
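
The value passed with -u is the URL that KServe reports for the inference service, so it will differ in your cluster. As a rough sketch, it can be read from the JSON form of the resource instead of being copied by hand. The function name is hypothetical, the sample document is a trimmed, hand-written stand-in for real output, and the status.url and Ready condition fields are assumed to follow the layout that kubectl get inferenceservice dogs-vs-cats -o json prints.

```python
import json
from typing import Optional


def ready_url(isvc: dict) -> Optional[str]:
    """Return status.url once the Ready condition is "True", otherwise None."""
    status = isvc.get("status", {})
    for cond in status.get("conditions", []):
        if cond.get("type") == "Ready" and cond.get("status") == "True":
            return status.get("url")
    return None


# Hand-written stand-in for the kubectl JSON output; real output has more fields.
sample = json.loads("""
{
  "metadata": {"name": "dogs-vs-cats"},
  "status": {
    "url": "http://dogs-vs-cats.default.34.126.156.171.sslip.io",
    "conditions": [{"type": "Ready", "status": "True"}]
  }
}
""")

print(ready_url(sample))
```

To use it against a live cluster, pipe the kubectl output into json.load(sys.stdin) in place of the sample document. A result of None means the service is not ready yet.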

This concludes the end-to-end tutorial on KServe, which covered the steps needed to install the model server on GKE and serve a TensorFlow model with it.

Janakiram MSV (Jani) is a practicing architect, research analyst, and advisor to Silicon Valley startups. He focuses on the convergence of modern infrastructure powered by cloud-native technology and machine intelligence driven by generative AI. Before becoming an entrepreneur, he spent...