The term "Automated Machine Learning," or "AutoML," refers to a set of tools and methods used to speed up the creation of machine learning models. It automates a variety of processes, including data preparation, feature selection, hyperparameter tuning, and model evaluation. By automating the intricate and time-consuming steps involved in model creation, AutoML platforms aim to make machine learning accessible to people and businesses without a strong background in data science.
Before you begin, make sure you have already created a Project and a Bucket.
Step 1: Dataset
First, inside Vertex AI, go to Dataset and click Create. Here we have used Titanic.csv. This step is mandatory: the data must be kept in the Dataset section before a model can be trained on it.
For a classification or regression problem, go to Tabular -> Classification/Regression. For a forecasting problem, go to Tabular -> Forecasting.
Upload the dataset either from your local computer (chargeable) or, if it is already stored in BigQuery, select it directly from there.
After the upload completes successfully, click Analyze. In one click, Vertex AI reports the unique values and the missing values in the dataset.
Please note: AutoML does not work well on complex, messy datasets. It needs clean, preprocessed data, so all preprocessing must be done before uploading the dataset to Vertex AI.
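To get a feel for what the Analyze step reports, the same two numbers can be computed locally before uploading. The following is a minimal sketch using only the Python standard library; the sample rows and the function name `profile_columns` are made up for illustration and are not part of Vertex AI.

```python
import csv
import io

# Tiny stand-in for Titanic.csv; these rows are invented for illustration.
SAMPLE_CSV = """PassengerId,Survived,Sex,Age,Embarked
1,0,male,22,S
2,1,female,38,C
3,1,female,,S
4,0,male,35,
"""


def profile_columns(csv_text):
    """Return {column: {"distinct": n, "missing": m}} for a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in reader.fieldnames}
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in seen:
            value = (row.get(name) or "").strip()
            if value == "":
                missing[name] += 1   # empty cell counts as a missing value
            else:
                seen[name].add(value)
    return {
        name: {"distinct": len(seen[name]), "missing": missing[name]}
        for name in seen
    }


profile = profile_columns(SAMPLE_CSV)
for name, stats in profile.items():
    print(f"{name}: {stats['distinct']} distinct, {stats['missing']} missing")
```

Columns with many missing values or a single distinct value are good candidates to clean up or drop before the dataset goes into Vertex AI.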
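As one possible way to do that preprocessing, the sketch below fills missing numeric values with the column median and missing categorical values with the most frequent category, then writes a clean CSV ready for upload. The imputation strategy, the sample rows, and the helper names `clean_rows` and `to_csv` are assumptions chosen for illustration; the right cleaning steps depend on your data.

```python
import csv
import io
import statistics
from collections import Counter

# Invented sample rows with two gaps: one missing Age, one missing Embarked.
RAW_CSV = """Survived,Sex,Age,Embarked
0,male,22,S
1,female,38,C
1,female,,S
0,male,35,
"""


def clean_rows(csv_text, numeric_cols, categorical_cols):
    """Fill gaps: median for numeric columns, mode for categorical ones."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for col in numeric_cols:
        present = [float(r[col]) for r in rows if r[col].strip()]
        median = statistics.median(present)
        for r in rows:
            if not r[col].strip():
                r[col] = str(median)
    for col in categorical_cols:
        present = [r[col].strip() for r in rows if r[col].strip()]
        most_common = Counter(present).most_common(1)[0][0]
        for r in rows:
            r[col] = r[col].strip() or most_common
    return rows


def to_csv(rows):
    """Serialize the cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


cleaned = clean_rows(RAW_CSV, numeric_cols=["Age"], categorical_cols=["Embarked"])
print(to_csv(cleaned))
```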
Select the region where you want to run and store the training job.
Join Featurestore is optional, which means we can skip it.
A feature store is used to make features reusable across models, so this time we are not adding any features from the feature store.
Step 4: Training options
In the training options, we can exclude any unnecessary independent variables. If required, just click the minus button next to the column.
Click Advanced options. There is a weight column setting, which lets us give more importance to certain rows of the training data if required (the weights apply to rows, not to individual features). Choose the optimization objective that suits the problem. Here, I used log loss.
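To make the log loss objective concrete, here is a minimal sketch of how it is computed for a binary classifier, with an optional per-row weight. It assumes the weight column holds one weight per training row; the function name and the numbers are illustrative, and this is not Vertex AI's internal implementation.

```python
import math


def log_loss(y_true, y_prob, weights=None, eps=1e-15):
    """Weighted binary log loss; lower is better."""
    if weights is None:
        weights = [1.0] * len(y_true)
    total = 0.0
    for y, p, w in zip(y_true, y_prob, weights):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)


labels = [1, 0, 1, 0]
confident_and_right = [0.9, 0.1, 0.8, 0.2]
confident_and_wrong = [0.1, 0.9, 0.2, 0.8]

print(log_loss(labels, confident_and_right))
print(log_loss(labels, confident_and_wrong))
# Giving the last row three times the weight makes its error count more.
print(log_loss(labels, confident_and_right, weights=[1, 1, 1, 3]))
```

Log loss penalizes confident wrong predictions heavily, which is why the second call returns a much larger value than the first.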
An endpoint refers to an API (Application Programming Interface) that allows you to interact with your deployed machine learning model. It provides a way for external applications, services, or users to send data to the model for inference (making predictions or classifications) and receive the model's responses.
Set the Traffic split. Remember that the value is a percentage out of 100. (Traffic split is explained below.)
The minimum number of compute nodes is the number of nodes that keep running continuously, even when there is no traffic. This can increase cost, but it avoids dropped requests caused by node initialization.
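The cost side of this trade-off is simple arithmetic: nodes that are always on are billed for every hour of the month. The sketch below shows the calculation; the hourly rate is a placeholder, not a real Vertex AI price, and 730 is used as an approximate number of hours in a month.

```python
def monthly_idle_cost(min_nodes, hourly_rate_per_node, hours_per_month=730):
    """Baseline monthly cost of the always-on nodes, before any traffic."""
    return min_nodes * hourly_rate_per_node * hours_per_month


# PLACEHOLDER rate for illustration only; look up the real price for your
# machine type and region.
HYPOTHETICAL_RATE = 0.25

for nodes in (1, 2, 4):
    cost = monthly_idle_cost(nodes, HYPOTHETICAL_RATE)
    print(f"{nodes} node(s): {cost:.2f} per month")
```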
What is Traffic split in Vertex AI during Model deployment?
Traffic split refers to the distribution of inference requests (also known as traffic) across different versions of a deployed machine learning model. When you deploy multiple versions of a model, you can control how much traffic each version receives. For example, you might direct 80% of the traffic to the current production version and 20% to a new experimental version. In short, if you deploy multiple versions, divide the traffic between them accordingly, making sure the percentages add up to 100.
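The 80/20 example can be simulated in a few lines. This is a minimal sketch of the idea, not how Vertex AI routes requests internally; the version names and both helper functions are hypothetical.

```python
import random


def validate_traffic_split(split):
    """Check that percentages are non-negative integers summing to 100."""
    for version, percent in split.items():
        if not isinstance(percent, int) or percent < 0:
            raise ValueError(f"invalid percentage for {version}: {percent!r}")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic split must sum to 100, got {total}")
    return split


def route_request(split, rng):
    """Pick a model version for one request, proportionally to its share."""
    versions = list(split)
    weights = [split[v] for v in versions]
    return rng.choices(versions, weights=weights, k=1)[0]


split = validate_traffic_split({"production-v1": 80, "experimental-v2": 20})

rng = random.Random(0)  # seeded so the simulation is repeatable
counts = {version: 0 for version in split}
for _ in range(10_000):
    counts[route_request(split, rng)] += 1
print(counts)
```

Over many requests, the counts settle close to the configured shares, while a split such as 70/20 is rejected up front because it does not add up to 100.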
Explainability options are particularly important when dealing with complex models, such as deep neural networks, that might be considered "black-box" models due to their intricate internal workings. Explainability helps data scientists, developers, and stakeholders gain confidence in the model's decisions. It is recommended to turn it on so that we can test it after deployment.
Once the model is in production, it requires continuous monitoring to ensure that it keeps performing as expected. Model monitoring sends an email report to the given email address at an interval of x days.
Click on Deploy.
After successful deployment, we can test the model.
To test the model, provide the input values and check the predicted output.
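Outside the console, an application tests the endpoint by sending the same inputs as a JSON request. The sketch below only builds the request body; it assumes the endpoint accepts a JSON object with an `instances` list, and casting every value to a string is an assumption about the model's input schema that should be checked against your deployed model. The feature names are illustrative Titanic columns, and `build_predict_request` is a hypothetical helper.

```python
import json


def build_predict_request(rows):
    """Wrap feature rows in an {"instances": [...]} request body."""
    # Assumption: the deployed model expects every feature value as a string.
    instances = [
        {name: str(value) for name, value in row.items()}
        for row in rows
    ]
    return json.dumps({"instances": instances}, indent=2)


# Illustrative passenger; use the feature columns your model was trained on.
body = build_predict_request([{"Pclass": 3, "Sex": "male", "Age": 22}])
print(body)
```

The printed body is what would be sent to the endpoint; the response then contains the model's prediction for each instance.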