Train a model using Vertex AI and the Python SDK

Last Updated : 23 Jul, 2025

Vertex AI is an end-to-end, fully managed platform for machine learning and data science. It lets you use Google Cloud infrastructure and services to build, train, deploy, and manage machine learning models. The Vertex AI SDK for Python is a high-level library that helps you automate data ingestion, model training, and prediction on Vertex AI. Most tasks that you can perform in the Google Cloud console can also be done programmatically from Python code that calls the Vertex AI API. In this article, we will learn how to use the Vertex AI SDK for Python to train a model on Vertex AI.

We will cover the following topics:

What are the main components and concepts of Vertex AI
How to install and import the Vertex AI SDK for Python
How to create a dataset and upload data to Vertex AI
How to define an AutoML training job and run it on Vertex AI
How to deploy the trained model and get predictions on Vertex AI

What are the main components of Vertex AI?

Before using Vertex AI, it helps to understand its main components and concepts. Here are a few of the important ones:

Project: A Google Cloud project is a container for all of your settings and resources. Before using Vertex AI, you must create a project and enable the Vertex AI API.
Dataset: A dataset is a collection of data used for training or prediction. On Vertex AI, you can create several types of datasets, including tabular, image, text, video, and custom datasets. You can also import data from a variety of sources, including local files, BigQuery, and Google Cloud Storage.
Training job: A training job is a process that uses your dataset to train a machine learning model. On Vertex AI, you can create several kinds of training jobs, including custom training, hyperparameter tuning, and AutoML. You can also configure options for a training job, such as machine type, region, and budget.
Model: A model is the output of a training job. It represents the patterns and rules learned from your data. You can apply a model to new data to make predictions, or evaluate it.
Endpoint: An endpoint is a service that hosts one or more models for prediction. You deploy models to an endpoint and then send requests to it to obtain predictions. Vertex AI also lets you manage and monitor your endpoints.
Prediction: A prediction is the output produced when a model is applied to an instance of input data. Models can serve predictions online or offline. Online prediction means sending requests to an endpoint and receiving results in real time. Offline (batch) prediction means processing large volumes of data in bulk and storing the results in files.
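To make the vocabulary above a little more concrete, here is a small, informal lookup table that pairs each concept with an entry point in the google-cloud-aiplatform package. It is only a reading aid: the dictionary and the describe helper are hypothetical names introduced for this sketch, and the table lists one representative class per concept rather than every option.

```python
# Informal map from the concepts above to entry points in the
# google-cloud-aiplatform package (imported as google.cloud.aiplatform).
# The values are plain strings, so this runs without the SDK installed.
CONCEPT_TO_SDK = {
    "project": "aiplatform.init(project=..., location=...)",
    "dataset": "aiplatform.TabularDataset",
    "training job": "aiplatform.AutoMLTabularTrainingJob",
    "model": "aiplatform.Model",
    "endpoint": "aiplatform.Endpoint",
    "online prediction": "aiplatform.Endpoint.predict",
    "batch prediction": "aiplatform.Model.batch_predict",
}


def describe(concept):
    """Return a one-line note on where a concept shows up in the SDK."""
    return f"{concept} -> {CONCEPT_TO_SDK[concept.lower()]}"


for name in CONCEPT_TO_SDK:
    print(describe(name))
```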

Step 1: Install the Vertex AI SDK for Python

To use the Vertex AI SDK for Python, you need to install the google-cloud-aiplatform package. This package contains both the Vertex AI SDK for Python and the Vertex AI Python client library. The client library is a lower-level library that offers finer-grained control over Vertex AI API calls. If necessary, you can use both libraries together.

In your virtual environment, run the following command to install the google-cloud-aiplatform package:
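As a rough sketch of the install step, the snippet below builds and runs the pip command from Python, so it works the same way in a script or a notebook cell. The helper names build_pip_command and install_sdk are assumptions made for this example; the package name google-cloud-aiplatform and pip's --upgrade option are the only real identifiers it relies on.

```python
import subprocess
import sys


def build_pip_command(package="google-cloud-aiplatform", upgrade=False):
    """Return the pip command that installs `package` for this interpreter."""
    # Using sys.executable keeps the install inside the active virtual environment.
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")  # move to the latest released version
    command.append(package)
    return command


def install_sdk(upgrade=False):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.check_call(build_pip_command(upgrade=upgrade))


# Example call (commented out so that nothing is installed by accident):
# install_sdk(upgrade=True)
```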

Optional, and only if you are using a Vertex AI Workbench notebook: upgrade to the latest version of the Vertex AI client library by running the same pip install in your notebook environment with the --upgrade option.

Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.
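One common way to restart the kernel from code, sketched below, is to ask the running IPython application to shut its kernel down with the restart flag set. The function name restart_kernel is hypothetical, and the guard for running outside a notebook is an addition for safety rather than part of the original steps.

```python
def restart_kernel():
    """Restart the current notebook kernel; return False if there is none."""
    # Imported lazily so this function can be defined without IPython installed.
    import IPython

    app = IPython.Application.instance()
    # Outside a notebook the application has no kernel attached.
    kernel = getattr(app, "kernel", None)
    if kernel is None:
        return False
    kernel.do_shutdown(True)  # True requests a restart, not just a shutdown
    return True


# Example call from a notebook cell (commented out on purpose):
# restart_kernel()
```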

Step 2: Setting Up the Environment

Before we dive into the code, we need to set up our GCP project, create a Cloud Storage bucket, and install the necessary Python libraries. Make sure you have the Google Cloud SDK (gcloud) installed and configured with your GCP project.

If you don't know your project ID, you can usually retrieve it with gcloud.
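A minimal sketch of that lookup, assuming the gcloud CLI is installed and on your PATH: it runs gcloud config list with a format that prints only the core project value. The helper names parse_project_id and get_project_id are made up for this example.

```python
import subprocess


def parse_project_id(gcloud_output):
    """Return the project ID from gcloud's output, or None if nothing is set."""
    project_id = gcloud_output.strip()
    # Treat an empty value, or a literal "(unset)", as "no default project".
    if not project_id or project_id == "(unset)":
        return None
    return project_id


def get_project_id():
    """Ask the gcloud CLI for the currently configured project."""
    result = subprocess.run(
        ["gcloud", "config", "list", "--format", "value(core.project)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_project_id(result.stdout)


# Example call (requires gcloud, so it is left commented out):
# print("Project ID:", get_project_id())
```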

Running the command prints your project ID, which looks like this:

Project ID: qwiklabs-gcp-04-c846c60XXXX

Copy the project ID and set it as an environment variable.
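One way to do this from Python is sketched below. The environment variable name PROJECT_ID and the helper set_project_id are assumptions for illustration; the check simply guards against leaving a bracketed placeholder in place.

```python
import os


def set_project_id(project_id):
    """Store the project ID in the PROJECT_ID environment variable."""
    if not project_id or project_id.startswith("["):
        # Catches an empty value or a "[your-project-id]" style placeholder.
        raise ValueError("Set a real project ID before continuing.")
    os.environ["PROJECT_ID"] = project_id
    return project_id


# Example call with a placeholder you must replace (commented out):
# PROJECT_ID = set_project_id("[your-project-id]")
```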

Create a timestamp for uniqueness:
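A simple way to get such a timestamp is to format the current time down to the second, as in this sketch. The function name make_timestamp is hypothetical; the resulting string can be appended to resource names so that repeated runs do not collide.

```python
from datetime import datetime


def make_timestamp(now=None):
    """Return the time as a YYYYMMDDHHMMSS string for use in resource names."""
    now = now or datetime.now()
    return now.strftime("%Y%m%d%H%M%S")


TIMESTAMP = make_timestamp()
print(TIMESTAMP)
```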

Create a Cloud Storage bucket:

Replace REGION and BUCKET_NAME with values that suit your project.
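The bucket can be created with the gsutil mb command; the sketch below wraps it in Python. The naming pattern (project ID followed by "aip-" and the timestamp) is inferred from the sample output in this article, and the helper names are assumptions. It presumes gsutil is installed alongside gcloud.

```python
import subprocess


def bucket_uri(project_id, timestamp):
    """Build a bucket URI of the form gs://<project>aip-<timestamp>."""
    return f"gs://{project_id}aip-{timestamp}"


def make_bucket_command(region, uri):
    """Return the gsutil command that creates a bucket in the given region."""
    return ["gsutil", "mb", "-l", region, uri]


def create_bucket(region, uri):
    """Create the bucket; raises CalledProcessError if gsutil fails."""
    subprocess.run(make_bucket_command(region, uri), check=True)


# Example call with placeholder values (commented out):
# BUCKET_NAME = bucket_uri("[your-project-id]", "20210826051607")
# create_bucket("us-central1", BUCKET_NAME)
```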

Output:

Creating gs://qwiklabs-gcp-06-c846b60794346aip-20210826051667/...

Finally, validate access to your Cloud Storage bucket by examining its contents:
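A quick access check, sketched here under the same assumption that gsutil is available, is to list the bucket and look at what comes back. The helper names are hypothetical. A freshly created bucket lists as empty, so an empty result with no error still means access works.

```python
import subprocess


def list_bucket_command(uri):
    """Return the gsutil command for a long listing that includes all versions."""
    return ["gsutil", "ls", "-al", uri]


def list_bucket(uri):
    """Return the listing lines for the bucket; raises if access is denied."""
    result = subprocess.run(
        list_bucket_command(uri), capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# Example call with a placeholder bucket (commented out):
# for line in list_bucket("gs://[your-bucket-name]"):
#     print(line)
```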

Step 3: Copying Dataset into Cloud Storage

In this step, we'll copy the dataset from a source location to our Cloud Storage bucket. Replace [your-bucket-name] with your bucket name and [your-dataset-source] with the source URL of your dataset.
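The copy itself can be done with gsutil cp, as in the sketch below. The data/ folder inside the bucket and the helper names are assumptions for this example; the bracketed placeholders are the ones described above and must be replaced with your own values.

```python
import subprocess


def copy_command(source, bucket_name, prefix="data"):
    """Return the gsutil command that copies `source` into the bucket."""
    destination = f"gs://{bucket_name}/{prefix}/"
    return ["gsutil", "cp", source, destination]


def copy_dataset(source, bucket_name, prefix="data"):
    """Copy the dataset; raises CalledProcessError if gsutil fails."""
    subprocess.run(copy_command(source, bucket_name, prefix), check=True)


# Example call using the placeholders from the text (commented out):
# copy_dataset("[your-dataset-source]", "[your-bucket-name]")
```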

Step 4: Importing the Vertex SDK for Python

We need to import the Vertex SDK and initialize it using our project ID and location:
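A minimal sketch of this step is shown below. The wrapper function init_vertex_ai is a hypothetical convenience; the calls it makes, importing aiplatform from google.cloud and calling aiplatform.init with project and location, are the SDK's own. The us-central1 default matches the region that appears in this article's log output, and the staging bucket argument is optional.

```python
def init_vertex_ai(project_id, region="us-central1", staging_bucket=None):
    """Import the Vertex SDK and point it at a project and region."""
    # Imported inside the function so the file loads even before the SDK
    # is installed; in a notebook you would normally import at the top.
    from google.cloud import aiplatform

    aiplatform.init(
        project=project_id,
        location=region,
        staging_bucket=staging_bucket,
    )
    return aiplatform


# Example call with a placeholder project (commented out):
# aiplatform = init_vertex_ai("[your-project-id]")
```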

Step 5: Creating a Managed Tabular Dataset

To create a dataset from a CSV file stored in Cloud Storage, use the Vertex SDK:
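The sketch below shows one way this might look, using TabularDataset.create with a display name and a gs:// path. The helper names and the small URI check are additions for this example, and the display name and file path in the usage comment are placeholders.

```python
def validate_csv_uri(gcs_source):
    """Check that the source looks like a CSV file in Cloud Storage."""
    if not (gcs_source.startswith("gs://") and gcs_source.endswith(".csv")):
        raise ValueError(f"Expected a gs://...csv path, got: {gcs_source}")
    return gcs_source


def create_tabular_dataset(display_name, gcs_source):
    """Create a managed tabular dataset from a CSV file in Cloud Storage."""
    source = validate_csv_uri(gcs_source)
    # Imported lazily; aiplatform.init(...) should already have been called.
    from google.cloud import aiplatform

    dataset = aiplatform.TabularDataset.create(
        display_name=display_name,
        gcs_source=source,
    )
    return dataset


# Example call with placeholder values (commented out):
# ds = create_tabular_dataset("my-dataset", "gs://[your-bucket-name]/data/[file].csv")
# ds.resource_name
```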

Output:


INFO:google.cloud.aiplatform.datasets.dataset:Creating TabularDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create TabularDataset backing LRO: projects/1075205415941/locations/us-central1/datasets/1945247175768276992/operations/1110822578768838656
INFO:google.cloud.aiplatform.datasets.dataset:TabularDataset created. Resource name: projects/1075205415941/locations/us-central1/datasets/1945247175768276992
INFO:google.cloud.aiplatform.datasets.dataset:To use this TabularDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.TabularDataset('projects/1075205415941/locations/us-central1/datasets/1945247175768276992')
'projects/1075205415941/locations/us-central1/datasets/1945247175768276992'

This will create a dataset from a CSV file stored on your GCS bucket.

Step 6: Launching a Training Job

Now, we are ready to create and train our AutoML tabular model:
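A hedged sketch of the training step follows. It uses AutoMLTabularTrainingJob with column_specs, which the deprecation warning in the log output recommends over column_transformations. The feature column names in the usage comment (Type, Age, Fee) are hypothetical stand-ins for your own columns, the display names are placeholders, and the 80/10/10 split and one-node-hour budget are illustrative choices rather than the article's settings. Only the target column name Adopted comes from the article.

```python
def build_column_specs(categorical=(), numeric=()):
    """Map each feature column to the transformation AutoML should apply."""
    specs = {name: "categorical" for name in categorical}
    specs.update({name: "numeric" for name in numeric})
    return specs


def train_automl_classifier(dataset, target_column, column_specs,
                            budget_milli_node_hours=1000):
    """Create and run an AutoML tabular classification job."""
    # Imported lazily; aiplatform.init(...) should already have been called.
    from google.cloud import aiplatform

    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="train-automl",  # placeholder display name
        optimization_prediction_type="classification",
        column_specs=column_specs,  # feature columns only, not the target
    )
    # run() blocks until training finishes and returns the trained Model.
    model = job.run(
        dataset=dataset,
        target_column=target_column,
        training_fraction_split=0.8,
        validation_fraction_split=0.1,
        test_fraction_split=0.1,
        budget_milli_node_hours=budget_milli_node_hours,  # 1000 = one node hour
        model_display_name="automl-tabular-model",  # placeholder display name
        disable_early_stopping=False,
    )
    return job, model


# Example call with hypothetical feature columns (commented out):
# specs = build_column_specs(categorical=["Type"], numeric=["Age", "Fee"])
# job, model = train_automl_classifier(ds, "Adopted", specs)
```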

Output:

/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
 and should_run_async(code)
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:16: DeprecationWarning: consider using column_specs instead. column_transformations will be deprecated in the future.
 app.launch_new_instance()
INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/1715908841423503360?project=1075205415941
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING

It takes more than 2 hours to complete the training.

Step 7: Deploying the Model

Before making predictions, we need to deploy the model to an endpoint:
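Deployment can be sketched as a single call to the model's deploy method, which creates an endpoint when none is passed in and returns it. The wrapper name deploy_model and the n1-standard-4 machine type are assumptions for this example; choose a machine type that fits your workload and budget.

```python
def deploy_model(model, machine_type="n1-standard-4"):
    """Deploy a trained model and return the endpoint that serves it."""
    # With no endpoint argument, deploy() creates a new Endpoint resource.
    endpoint = model.deploy(machine_type=machine_type)
    return endpoint


# Example call, where `model` is the object returned by training (commented out):
# endpoint = deploy_model(model)
```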

Output:

/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
 and should_run_async(code)
INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936/operations/7965582686603444224
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/1075205415941/locations/us-central1/endpoints/7467372802459303936')
INFO:google.cloud.aiplatform.models:Deploying model to Endpoint : projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936/operations/2903536705439006720
INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936

Step 8: Making Predictions

With the model deployed, you can now make predictions. Here's an example of how to send data for prediction. This sample instance is taken from an observation in which Adopted = Yes.

Note that the values are all strings. Because the original data was in CSV format, everything is treated as a string. The transformations you defined when creating your AutoMLTabularTrainingJob tell Vertex AI to convert the inputs to their defined types.
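The sketch below shows one way to send an instance and read the result. It converts every value to a string, in line with the note about CSV data, calls the endpoint's predict method, and picks the class with the highest score from a response shaped like the sample output in this article. The helper names and the feature names in the usage comment are hypothetical.

```python
def stringify_instance(instance):
    """Convert every feature value to a string, as the CSV-based model expects."""
    return {key: str(value) for key, value in instance.items()}


def top_class(prediction_row):
    """Return the (class, score) pair with the highest score."""
    classes = prediction_row["classes"]
    scores = prediction_row["scores"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best], scores[best]


def predict_one(endpoint, instance):
    """Send one instance to the endpoint and return its most likely class."""
    response = endpoint.predict(instances=[stringify_instance(instance)])
    return top_class(response.predictions[0])


# Example call with hypothetical feature names (commented out):
# label, score = predict_one(endpoint, {"Type": "Cat", "Age": 3, "Fee": 100})
```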

Output:

Prediction(predictions=[{'classes': ['Yes', 'No'], 'scores': [0.527707576751709, 0.4722923934459686]}], deployed_model_id='3521401492231684096', explanations=None)

Step 9: (Optional) Undeploy the model
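A deployed model keeps using serving resources for as long as it stays on the endpoint, so undeploying it when you are done is usually worthwhile. A minimal sketch, assuming endpoint is the object returned by the deployment step; the wrapper name undeploy_models is hypothetical, while undeploy_all is the SDK's Endpoint method.

```python
def undeploy_models(endpoint):
    """Remove every deployed model from the endpoint and return the endpoint."""
    # The endpoint resource itself remains and can be reused or deleted later.
    endpoint.undeploy_all()
    return endpoint


# Example call (commented out):
# undeploy_models(endpoint)
```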

Step 10: (Optional) Cleaning up
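To avoid leaving resources behind, you can delete what this tutorial created. The sketch below is one possible order: the endpoint first (with force=True so any remaining models are undeployed), then the model, the training job, and the dataset. The function name clean_up is hypothetical, each argument is optional, and the Cloud Storage bucket is not touched here.

```python
def clean_up(endpoint=None, model=None, training_job=None, dataset=None):
    """Delete the given Vertex AI resources and return the names deleted."""
    deleted = []
    if endpoint is not None:
        # force=True undeploys any models still on the endpoint first.
        endpoint.delete(force=True)
        deleted.append("endpoint")
    # The remaining resources are deleted with a plain delete() call.
    for name, resource in (
        ("model", model),
        ("training_job", training_job),
        ("dataset", dataset),
    ):
        if resource is not None:
            resource.delete()
            deleted.append(name)
    return deleted


# Example call using the objects from the earlier steps (commented out):
# clean_up(endpoint=endpoint, model=model, training_job=job, dataset=ds)
```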

Conclusion

In this article, we learned how to use the Vertex AI SDK for Python to train a model on Vertex AI. We covered the following steps:

How to create a dataset and upload data to Vertex AI
How to define an AutoML training job and run it on Vertex AI
How to deploy the trained model and get predictions on Vertex AI

We also learned about some of the main components and concepts of Vertex AI, such as project, dataset, training job, model, endpoint, and prediction. We used a tabular classification example to demonstrate how to use the Vertex AI SDK for Python, but you can apply the same steps to other types of datasets and models as well.


URL: https://www.geeksforgeeks.org/data-science/train-a-model-using-vertex-ai-and-the-python-sdk/