Vertex AI is an end-to-end, fully managed platform for machine learning and data science. It lets you use Google Cloud infrastructure and services to build, train, deploy, and manage machine learning models. The Vertex AI SDK for Python is a high-level library that helps you automate data ingestion, model training, and prediction on Vertex AI. Most tasks that you can perform in the Google Cloud console can also be done programmatically from Python code that calls the Vertex AI API. In this article, we will learn how to use the Vertex AI SDK for Python to train a model on Vertex AI.
We will cover the following topics:
Before using Vertex AI, you should understand its main components and concepts. Here are a few of the important ones:
To use the Vertex AI SDK for Python, install the google-cloud-aiplatform package. This package contains both the Vertex AI SDK for Python and the Vertex AI Python client library. The client library is a lower-level library that offers finer-grained control over Vertex AI API calls. If necessary, you can use both libraries together.
In your virtual environment, execute the following command to install the google-cloud-aiplatform package:
Optional, only if you are using a Vertex AI Workbench notebook:
Install the latest version of the Vertex AI client library.
Run the following command in your virtual environment to install the Vertex SDK for Python:
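The installation can also be driven from Python by calling pip as a subprocess. The following is a minimal sketch rather than the only way to do it: the helper names `pip_install_command` and `install_vertex_sdk` are hypothetical, and the `--user` flag is only added on request, on the assumption that a Workbench notebook needs a per-user install.

```python
import subprocess
import sys

# The Vertex AI SDK for Python is distributed in this package.
PACKAGE = "google-cloud-aiplatform"


def pip_install_command(package=PACKAGE, upgrade=True, user=False):
    """Build the pip command line as a list of arguments."""
    # Using sys.executable targets the interpreter of the active environment.
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    if user:
        cmd.append("--user")
    cmd.append(package)
    return cmd


def install_vertex_sdk(user=False):
    """Run pip for the current interpreter; raises CalledProcessError on failure."""
    subprocess.run(pip_install_command(user=user), check=True)
```

Calling `install_vertex_sdk()` performs the install; `pip_install_command()` alone is useful for checking what would be run.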
Restart the kernel
After you install the additional packages, you need to restart the notebook kernel so it can find the packages.
Before we dive into the code, we need to set up our GCP project, create a Cloud Storage bucket, and install the necessary Python libraries. Make sure you have the Google Cloud SDK (gcloud) installed and configured with your GCP project.
If you don't know your project ID, you may be able to retrieve it using gcloud.
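One way to do this from Python is to ask gcloud for the `core.project` value of the active configuration. This is a sketch under the assumption that gcloud is installed and already configured; the function name `get_project_id` is hypothetical, and it returns `None` rather than failing when gcloud is missing or no project is set.

```python
import shutil
import subprocess

# Arguments that make gcloud print only the active project ID.
GCLOUD_PROJECT_ARGS = ["config", "list", "--format", "value(core.project)"]


def get_project_id():
    """Return the active gcloud project ID, or None if it cannot be determined."""
    gcloud = shutil.which("gcloud")
    if gcloud is None:
        return None  # gcloud is not installed or not on PATH
    result = subprocess.run(
        [gcloud] + GCLOUD_PROJECT_ARGS, capture_output=True, text=True
    )
    project_id = result.stdout.strip()
    if result.returncode != 0 or not project_id:
        return None
    return project_id


print("Project ID:", get_project_id())
```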
When you run the above command, it shows your project ID, which will look like this:
Project ID: qwiklabs-gcp-04-c846c60XXXX
Copy the project ID and set it as an environment variable.
Create a timestamp for uniqueness:
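A timestamp suffix keeps resource names from colliding when several people share a project. A minimal sketch, assuming a `YYYYMMDDHHMMSS` format is acceptable; the helper name `make_timestamp` is hypothetical.

```python
from datetime import datetime


def make_timestamp(now=None):
    """Format a datetime as YYYYMMDDHHMMSS; defaults to the current local time."""
    if now is None:
        now = datetime.now()
    return now.strftime("%Y%m%d%H%M%S")


# Computed once so every resource created in this session shares the same suffix.
TIMESTAMP = make_timestamp()
```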
Create a Cloud Storage bucket:
Replace REGION and BUCKET_NAME as per your project requirement.
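The bucket can be created by calling `gsutil mb` from Python. The sketch below assumes gsutil is on your PATH and authenticated. The helper names are hypothetical, and the `<project>aip-<timestamp>` naming pattern is only one possible choice, not a requirement.

```python
import subprocess


def default_bucket_uri(project_id, timestamp):
    """Compose a bucket URI from the project ID and a timestamp suffix."""
    # Assumption: this naming pattern is illustrative; any globally unique name works.
    return f"gs://{project_id}aip-{timestamp}"


def make_bucket_command(region, bucket_uri):
    """Build the gsutil command that creates a bucket in the given region."""
    if not bucket_uri.startswith("gs://"):
        raise ValueError("bucket URI must start with gs://")
    return ["gsutil", "mb", "-l", region, bucket_uri]


def create_bucket(region, bucket_uri):
    """Create the bucket; raises CalledProcessError if gsutil reports an error."""
    subprocess.run(make_bucket_command(region, bucket_uri), check=True)
```

For example, `create_bucket("us-central1", default_bucket_uri(PROJECT_ID, TIMESTAMP))` would create a bucket in `us-central1`, where `PROJECT_ID` and `TIMESTAMP` are the values set in the earlier steps.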
Output:
Creating gs://qwiklabs-gcp-06-c846b60794346aip-20210826051667/...
Finally, validate access to your Cloud Storage bucket by examining its contents:
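Listing the bucket is a quick way to confirm that your credentials can reach it. This sketch assumes gsutil is available; the helper names `list_bucket_command` and `bucket_is_accessible` are hypothetical.

```python
import subprocess


def list_bucket_command(bucket_uri):
    """Build the gsutil command that lists the bucket contents in long form."""
    # -a includes non-current object versions, -l prints size and timestamp.
    return ["gsutil", "ls", "-al", bucket_uri]


def bucket_is_accessible(bucket_uri):
    """Return True if gsutil can list the bucket, False otherwise."""
    try:
        result = subprocess.run(
            list_bucket_command(bucket_uri), capture_output=True, text=True
        )
    except FileNotFoundError:
        return False  # gsutil is not installed or not on PATH
    return result.returncode == 0
```

A newly created bucket is empty, so a successful listing may print nothing.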
In this step, we'll copy the dataset from a source location to our Cloud Storage bucket. Replace [your-bucket-name] with your bucket name and [your-dataset-source] with the source URL of your dataset.
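The copy can be expressed as a `gsutil cp` call. In this sketch, the `data/` destination folder and the helper names are assumptions made for illustration; the source and bucket values stay as placeholders that you fill in.

```python
import subprocess


def copy_dataset_command(source_uri, bucket_uri, dest_dir="data"):
    """Build the gsutil command that copies a file into <bucket>/<dest_dir>/."""
    destination = f"{bucket_uri.rstrip('/')}/{dest_dir}/"
    return ["gsutil", "cp", source_uri, destination]


def copy_dataset(source_uri, bucket_uri, dest_dir="data"):
    """Run the copy; raises CalledProcessError if gsutil reports an error."""
    subprocess.run(copy_dataset_command(source_uri, bucket_uri, dest_dir), check=True)
```

You would call `copy_dataset("[your-dataset-source]", "gs://[your-bucket-name]")` after substituting both placeholders.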
We need to import the Vertex SDK and initialize it using our project ID and location:
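Initialization is a single call to `aiplatform.init`, which stores defaults that later SDK calls reuse. The wrapper below is a sketch: `init_vertex` and `vertex_init_kwargs` are hypothetical helper names, and passing a staging bucket is optional.

```python
def vertex_init_kwargs(project_id, region, staging_bucket=None):
    """Collect the keyword arguments passed to aiplatform.init."""
    kwargs = {"project": project_id, "location": region}
    if staging_bucket:
        kwargs["staging_bucket"] = staging_bucket
    return kwargs


def init_vertex(project_id, region, staging_bucket=None):
    """Initialize the Vertex AI SDK and return the aiplatform module."""
    # Imported inside the function so this file loads even without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(**vertex_init_kwargs(project_id, region, staging_bucket))
    return aiplatform
```

For example, `aiplatform = init_vertex(PROJECT_ID, "us-central1")`, where `PROJECT_ID` is the value set earlier.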
To create a dataset from a CSV file stored in Cloud Storage, use the Vertex SDK:
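A tabular dataset is created with `aiplatform.TabularDataset.create`, pointing `gcs_source` at the CSV file. The sketch below assumes the SDK has already been initialized; the function name and the up-front URI check are illustrative additions.

```python
def create_tabular_dataset(display_name, gcs_csv_uri):
    """Create a Vertex AI TabularDataset from a CSV file in Cloud Storage."""
    # Fail early on an obviously wrong path, before any API call is made.
    if not (gcs_csv_uri.startswith("gs://") and gcs_csv_uri.endswith(".csv")):
        raise ValueError("expected a gs:// URI ending in .csv")

    # Imported inside the function so this file loads even without the SDK installed.
    from google.cloud import aiplatform

    dataset = aiplatform.TabularDataset.create(
        display_name=display_name,
        gcs_source=gcs_csv_uri,
    )
    return dataset
```

The returned object exposes `dataset.resource_name`, which is the full `projects/.../datasets/...` path you can use to load the dataset in another session.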
Output:
INFO:google.cloud.aiplatform.datasets.dataset:Creating TabularDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create TabularDataset backing LRO: projects/1075205415941/locations/us-central1/datasets/1945247175768276992/operations/1110822578768838656
INFO:google.cloud.aiplatform.datasets.dataset:TabularDataset created. Resource name: projects/1075205415941/locations/us-central1/datasets/1945247175768276992
INFO:google.cloud.aiplatform.datasets.dataset:To use this TabularDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.TabularDataset('projects/1075205415941/locations/us-central1/datasets/1945247175768276992')
'projects/1075205415941/locations/us-central1/datasets/1945247175768276992'
This creates a tabular dataset from the CSV file stored in your Cloud Storage bucket.
Now, we are ready to create and train our AutoML tabular model:
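Training involves two calls: constructing an `AutoMLTabularTrainingJob` and then calling `run` on it. The following is a sketch, not the exact job used in this article. The helper names, the 80/10/10 split, and the one node hour budget are assumptions, and the column names you pass in must match your own CSV. It uses `column_transformations`; the SDK also accepts `column_specs` as the newer alternative.

```python
def build_column_transformations(column_types):
    """Turn {"column": "type"} into the list-of-dicts form the job expects."""
    return [{kind: {"column_name": name}} for name, kind in column_types.items()]


def train_automl_classifier(dataset, target_column, column_types,
                            display_name="automl-tabular-job"):
    """Create and run an AutoML tabular classification job; returns the model."""
    # Imported inside the function so this file loads even without the SDK installed.
    from google.cloud import aiplatform

    job = aiplatform.AutoMLTabularTrainingJob(
        display_name=display_name,
        optimization_prediction_type="classification",
        column_transformations=build_column_transformations(column_types),
    )
    # run() blocks until training finishes and logs the pipeline state meanwhile.
    model = job.run(
        dataset=dataset,
        target_column=target_column,
        training_fraction_split=0.8,
        validation_fraction_split=0.1,
        test_fraction_split=0.1,
        budget_milli_node_hours=1000,  # 1000 milli node hours = 1 node hour
        model_display_name=display_name + "-model",
    )
    return model
```

For instance, `build_column_transformations({"Type": "categorical", "Age": "numeric"})` produces the transformation list for two hypothetical columns named `Type` and `Age`.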
Output:
/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:16: DeprecationWarning: consider using column_specs instead. column_transformations will be deprecated in the future.
app.launch_new_instance()
INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/1715908841423503360?project=1075205415941
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
Training can take more than 2 hours to complete.
Before making predictions, we need to deploy the model to an endpoint:
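Deployment is done by calling `deploy` on the trained model, which creates an endpoint and attaches the model to it. A minimal sketch, assuming `model` is the object returned by the training job; the `n1-standard-4` machine type and the helper name are illustrative choices.

```python
def deploy_model(model, machine_type="n1-standard-4"):
    """Deploy a trained model to a new endpoint and return that endpoint."""
    # With no endpoint argument, deploy() creates a fresh endpoint first.
    return model.deploy(machine_type=machine_type)
```

A deployed endpoint keeps running, and incurring cost, until you undeploy the model or delete the endpoint.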
Output:
/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936/operations/7965582686603444224
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/1075205415941/locations/us-central1/endpoints/7467372802459303936')
INFO:google.cloud.aiplatform.models:Deploying model to Endpoint : projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936/operations/2903536705439006720
INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
With the model deployed, you can now make predictions. Here's an example of how to send data for prediction. The sample instance is taken from an observation in which Adopted = Yes.
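A prediction request is a call to `endpoint.predict` with a list of instances, each a dictionary of column name to value. The sketch below uses hypothetical helper names and assumes `endpoint` is the object returned by deployment. It converts every value to a string before sending, and picks the highest scoring class from a classification response.

```python
def to_string_instance(row):
    """Convert every value to str, matching how CSV-sourced columns are read."""
    return {key: str(value) for key, value in row.items()}


def predict_one(endpoint, row):
    """Send a single instance to the endpoint and return the Prediction object."""
    return endpoint.predict(instances=[to_string_instance(row)])


def top_class(prediction):
    """Return the (class, score) pair with the highest score for the first instance."""
    result = prediction.predictions[0]
    return max(zip(result["classes"], result["scores"]), key=lambda pair: pair[1])
```

Here `row` holds the feature columns of one observation, for example `{"Age": 3}` for a hypothetical `Age` column, and `top_class(predict_one(endpoint, row))` returns the most likely label with its score.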
Note that the values are all strings. Since the original data was in CSV format, everything is treated as a string. The transformations you defined when creating your AutoMLTabularTrainingJob tell Vertex AI to convert the inputs to their defined types.
Output:
Prediction(predictions=[{'classes': ['Yes', 'No'], 'scores': [0.527707576751709, 0.4722923934459686]}], deployed_model_id='3521401492231684096', explanations=None)
In this article, we learned how to use the Vertex AI SDK for Python to train a model on Vertex AI. We covered the following steps:
We also learned about some of the main components and concepts of Vertex AI, such as project, dataset, training job, model, endpoint, and prediction. We used a tabular classification example to demonstrate how to use the Vertex AI SDK for Python, but you can apply the same steps to other types of datasets and models as well.