Unlock the Secret to Efficient Batch Prediction Pipelines Using Python, a Feature Store and GCS

Lesson 3: Batch Prediction Pipeline. Package Python Modules with Poetry.

May 12, 2023

19 min read

THE FULL STACK 7-STEPS MLOPS FRAMEWORK

This tutorial represents lesson 3 out of a 7-lesson course that will walk you step-by-step through how to design, implement, and deploy an ML system using MLOps good practices. During the course, you will build a production-ready model to forecast energy consumption levels for the next 24 hours across multiple consumer types from Denmark.

By the end of this course, you will understand all the fundamentals of designing, coding and deploying an ML system using a batch-serving architecture.

This course targets mid/advanced machine learning engineers who want to level up their skills by building their own end-to-end projects.

Nowadays, certificates are everywhere. Building advanced end-to-end projects that you can later show off is the best way to get recognition as a professional engineer.

Course Introduction

At the end of this 7 lessons course, you will know how to:

design a batch-serving architecture
use Hopsworks as a feature store
design a feature engineering pipeline that reads data from an API
build a training pipeline with hyper-parameter tunning
use W&B as an ML Platform to track your experiments, models, and metadata
implement a batch prediction pipeline
use Poetry to build your own Python packages
deploy your own private PyPi server
orchestrate everything with Airflow
use the predictions to code a web app using FastAPI and Streamlit
use Docker to containerize your code
use Great Expectations to ensure data validation and integrity
monitor the performance of the predictions over time
deploy everything to GCP
build a CI/CD pipeline using GitHub Actions

If that sounds like a lot, don’t worry. After you cover this course, you will understand everything I said before. Most importantly, you will know WHY I used all these tools and how they work together as a system.

If you want to get the most out of this course, I suggest you access the GitHub repository containing all the lessons’ code. This course is designed to read and replicate the code along the articles quickly.

By the end of the course, you will know how to implement the diagram below. Don’t worry if something doesn’t make sense to you. I will explain everything in detail.

👁 Diagram of the architecture you will build during the course [Image by the Author].

Diagram of the architecture you will build during the course [Image by the Author].

By the end of Lesson 3, you will know how to implement and integrate the batch prediction pipeline and package all the Python modules using Poetry.

Course Lessons:

If you want to grasp this lesson fully, we recommend you check out our previous lesson, which talks about designing a training pipeline that uses a feature store and an ML platform:

A Guide to Building Effective Training Pipelines for Maximum Results

Data Source

We used a free & open API that provides hourly energy consumption values for all the energy consumer types within Denmark [1].

They provide an intuitive interface where you can easily query and visualize the data. You can access the data here [1].

The data has 4 main attributes:

Hour UTC: the UTC datetime when the data point was observed.
Price Area: Denmark is divided into two price areas: DK1 and DK2 – divided by the Great Belt. DK1 is west of the Great Belt, and DK2 is east of the Great Belt.
Consumer Type: The consumer type is the Industry Code DE35, owned and maintained by Danish Energy.
Total Consumption: Total electricity consumption in kWh

Note: The observations have a lag of 15 days! But for our demo use case, that is not a problem, as we can simulate the same steps as it would in real-time.

👁 A screenshot from our web app showing how we forecasted the energy consumption for area = 1 and consumer_type = 212 [Image by the Author].

A screenshot from our web app showing how we forecasted the energy consumption for area = 1 and consumer_type = 212 [Image by the Author].

The data points have an hourly resolution. For example: "2023–04–15 21:00Z", "2023–04–15 20:00Z", "2023–04–15 19:00Z", etc.

We will model the data as multiple time series. Each unique price area and consumer type tuple represents its unique time series.

Thus, we will build a model that independently forecasts the energy consumption for the next 24 hours for every time series.

Check out the video below to better understand what the data looks like 👇

Lesson 3: Batch Prediction Pipeline. Package Python Modules with Poetry.

The Goal of Lesson 3

This lesson will teach you how to build the batch prediction pipeline. Also, it will show you how to package into Python PyPi modules, using Poetry, all the code from the pipelines we have done so far in Lessons 1, 2, and 3. 👇

Note: In the next lesson, we will upload these Python modules into our own private PyPi server and install them from Airflow.

👁 Diagram of the final architecture with the Lesson 3 components highlighted in blue [Image by the Author].

Diagram of the final architecture with the Lesson 3 components highlighted in blue [Image by the Author].

If you recall from Lesson 1, a model can be deployed in the following ways:

batch mode
request-response (e.g., RESTful API or gRPC)
streaming mode
embedded

This course will deploy the model in batch mode.

We will discuss strategies to transition from batch to other methods when building the web app. You will see how natural it is.

But, if you are eager to compare the batch mode with a request-response serving mode, check out my 5-minute article that explains how to serve a model using the request-response methodology.

What are the main steps of deploying a model in batch mode, aka building a batch prediction pipeline?

Step 1: You will load the features from the feature store in batch mode.

Step 2: You will load the trained model from the model registry (in our case, we use Hopsworks as a model registry).

Step 3: You will forecast the energy consumption levels for the next 24 hours.

Step 4: You will save the predictions in a GCP bucket.

After, various consumers will read the predictions from the GCP bucket and use them accordingly. In our case, we implemented a dashboard using FastAPI and Streamlit.

Often, your initial deployment strategy will be in batch mode.

Why?

Because doing so, you don’t have to focus on restrictions such as latency and throughput. By saving your predictions into some storage, you can quickly make your model online.

Thus, batch mode is the easiest and fastest way of deploying your model while preserving a good experience for the end user of the applications.

A model is online when an application can access the predictions in real-time.

Note that the predictions are not made in real-time, only accessed in real-time (e.g., read from the storage).

The biggest downside of using this method is that your predictions will have a degree of lag. For example, in our use case, you make and save the predictions for the next 24 hours. Let’s assume that 2 hours pass without any new predictions. Now, you have predictions only for the next 22 hours.

Where the number of predictions that you have to store is reasonable, you can bypass this issue by making the predictions often. In our example, we will make the predictions hourly – our data has a resolution of 1 hour. Thus, we solved the lag issue by constantly making and storing new predictions.

But here comes the second problem with the batch prediction strategy. Suppose the set of predictions is large. For example, you want to predict the recommendations for 1 million users with a database of 100 million items. Then, computing the predictions very often will be highly costly.

Then you have to consider using other serving methods strongly.

But here is the catch.

Your application probably won’t start with a database of 1 million users and 100 million items. That means you can safely begin using a batch mode architecture and gradually shift to other methodologies when it makes sense.

That is what most people do!

To get an intuition on how to shift to other methods, check out this article to learn about a standardized ML architecture suggested by Google Cloud.

Theoretical Concepts & Tools

GCS: GCS stands for Google Cloud Storage, which is Google’s storage solution within GCP. It is similar to AWS S3 if you are more familiar with it.

You can write to GCS any file. In our course, we will write Pandas DataFrames as parquet files.

GCS vs. Redis: We choose to write our predictions in GCS because of 4 main reasons:

Easy to setup
No maintenance
Access to the free tier
We will also use GCP to deploy the code.

Redis is a popular choice for caching your predictions to be later accessed by various clients.

Why?

Because you can access the data at low latency, improving the users’ experience.

It would have been a good choice, but we wanted to simplify things.

Also, it is good practice to write the predictions on GCS for long-term storage and cache them in Redis for real-time access.

Poetry: Poetry is my favorite Python virtual environment manager. It is similar to Conda, venv, and Pipenv. In my opinion, it is superior because:

It offers you a .lock file that reflects the versions of all your sub-dependencies. Thus, replicating code is extremely easy and safe.
You can quickly build your module directly using Poetry. No other setup is required.
You can quickly deploy your module to a PiPy server using Poetry. No other setup is required, and more…

Lesson 3: Code

You can access the GitHub repository here.

Note: All the installation instructions are in the READMEs of the repository. Here we will jump straight to the code.

All the code within Lesson 3 is located under the batch-prediction-pipeline folder.

The files under the batch-prediction-pipeline folder **** are structured as follows:

👁 A screenshot that shows the structure of the batch-prediction-pipeline folder [Image by the Author].

A screenshot that shows the structure of the batch-prediction-pipeline folder [Image by the Author].

All the code is located under the batch_prediction_pipeline directory (note the "_" instead of "-").

Directly storing credentials in your git repository is a huge security risk. That is why you will inject sensitive information using a .env file.

The .env.default is an example of all the variables you must configure. It is also helpful to store default values for attributes that are not sensitive (e.g., project name).

👁 A screenshot of the .env.default file [Image by the Author].

A screenshot of the .env.default file [Image by the Author].

Prepare Credentials

First of all, you have to create a .env file **** where you will add all our credentials.

I already showed you in [Lesson 1](https://towardsdatascience.com/a-framework-for-building-a-production-ready-feature-engineering-pipeline-f0b29609b20f) how to set up your .env file. Also, I explained in Lesson 1 how the variables from the .env file are loaded from your ML_PIPELINE_ROOT_DIR directory into a SETTINGS Python dictionary to be used throughout your code.

Thus, if you want to replicate what I have done, I strongly recommend checking out Lesson 1.

If you only want a light read, you can completely skip the "Prepare Credentials" step.

In Lesson 3, you will use two services:

Hopsworks (free)

We already showed you in Lesson 1 how to set up the credentials for Hopsworks. Please visit the "Prepare Credentials" section from Lesson 1, where we showed you in detail how to set up the API KEY for Hopsworks.

GCP – Cloud Storage (free)

While replicating this course, you will stick to the GCP – Cloud Storage free tier. You can store up to 5GB for free in GCP – Cloud Storage, which is far more than enough for our use case.

This configuration step will be longer, but I promise that it is not complicated. By the way, you will learn the basics of using a cloud vendor such as GCP.

First, go to GCP and create a project called "energy_consumption" (or any other name). Afterward, go to your GCP project’s "Cloud Storage" section and create a non-public bucket called "hourly-batch-predictions". Pick any region, but just be aware of it—official docs about creating a bucket on GCP [2].

NOTE: You might need to pick different names due to constant changes to the platform’s rules. That is not an issue, just call them as you wish and change them in the .env file: _GOOGLE_CLOUDPROJECT (ours "energy_consumption") and _GOOGLE_CLOUD_BUCKET_NAME (ours "hourly-batch-predictions")_.

👁 Screenshot of the GCP - Cloud Storage view, where you must create your bucket [Image by the Author].

Screenshot of the GCP – Cloud Storage view, where you must create your bucket [Image by the Author].

Now you finished creating all your GCP resources. The last step is to create a way to have read & write access to the GCP bucket directly from your Python code.

You can easily do this using GCP service accounts. I don’t want to hijack the whole article with GCP configurations. Thus, this GCP official doc shows you how to create a service account [3].

When creating the service account, be aware of one thing!

Service accounts have attached different roles. A role is a way to configure your service account with various permissions.

Thus, you need to configure your service account to have read & write access the your "hourly-batch-predictions" bucket.

You can easily do that by choosing the "Storage Object Admin" role when creating your service account.

The final step is to find a way to authenticate with your newly created service account in your Python code.

You can easily do that by going to your service account and creating a JSON key. Again, here are the official GCP docs that will show you how to create a JSON key for your service account [4].

Again, keep in mind one thing!

When creating the JSON key, you will download a JSON file.

After you download your JSON file, put it in a safe place and go to your .env file. There, change the value of _GOOGLE_CLOUD_SERVICE_ACCOUNT_JSONPATH **** with your absolute path to the JSON file.

👁 A screenshot of the .env.default file [Image by the Author].

A screenshot of the .env.default file [Image by the Author].

NOTE: Remember to change the _GOOGLE_CLOUDPROJECT and _GOOGLE_CLOUD_BUCKETNAME variables with your names.

Congratulations! You are done configuring GCS – Cloud Storage.

Now you have created a GCP project and bucket. Also, you have read & write access using your Python code through your service account. You log in with your service account with the help of the JSON file.

If something isn’t working, let me know in the comments below or directly on LinkedIn.

Batch Prediction Pipeline – Main Function

As you can see, the main function follows the 4 steps of a batch prediction pipeline:

Loads data from the Feature Store in batch mode.
Loads the model from the model registry.
Makes the predictions.
It saves the predictions to the GCS bucket.

Most of the function is log lines 😆

Along these 4 main steps, you must load all the parameters from the metadata generated by previous steps, such as the feature_view_version and model_version.

Also, you have to get a reference to the Hopsworks Feature store.

After, you go straight to the 4 main steps that we will detail later in the tutorial 👇

Step 1: Loading Data From the Feature Store In Batch Mode

This step is similar to what we have done in Lesson 2 when loading data for training.

But this time, instead of downloading the data from a training dataset, we directly ask for a batch of data between a datetime range, using the get_batch_data() method.

Doing so allows us to time travel to our desired datetime range and ask for the features we need. This method makes batch inference extremely easy.

The last step is to prepare the indexes of the DataFrame as expected by sktime and to split it between X and y.

Note: This is an autoregressive process: we learn from past values of y to predict future values of y ( y = energy consumption levels). Thus, we will use only X as input to the model. We will use y only for visualization purposes.

Step 2: Loading the Model From the Model Registry

Loading a model from the Hopsworks model registry is extremely easy.

The function below has as a parameter a reference to the Hopsworks project and the version of the model we want to download.

Using these two variables, you get a reference to the model registry. Afterward, you get a reference to the model itself using its name. In this case, it is best_model.

Finally, you download the artifact/model and load it into memory.

The trick here is that your model is versioned. Thus, you always know what model you are using.

Note: We uploaded the best_model in the model registry using the training pipeline explained in Lesson 2. The training pipeline also provides us with a metadata dictionary that contains the latest model_version.

Step 3: Forecast Energy Consumption Levels for the Next 24 Hours

Sktime makes forecasting extremely easy. The key line from the snippet below is "predictions = model.predict(X=X_forecast)", which forecasts the energy consumption values for the next 24 hours.

The forecasting horizon of 24 hours was given when the model was trained. Thus, it already knows how many data points into the future to forecast.

Also, you have to prepare the exogenous variable X_forecast. In time series forecasting, an exogenous variable is a feature that you already know it will happen in the future. For example, a holiday. Thus, based on your training data X which contains all the area and consumer types IDs, you can generate the X_forecast variable by mapping the datetime range into the forecasting range.

Step 4: Save the Predictions to the Bucket

The last component is the function that saves everything to the GCP bucket.

This step is relatively straightforward, and the hard part was to configure your bucket and access credentials.

We get a reference to the bucket, iterate through X, y & predictions and write them to the bucket as a blob.

Note: Besides the predictions, we also save X and y to have everything in one place to quickly access everything we need and nicely render them in the web app.

To get a reference to the bucket, you have to access the settings you configured at the beginning of the tutorial.

As you can see, you create a GCS client with the project name and the JSON credentials file path. Afterward, you can quickly get a reference to your given bucket.

Writing a blob to a bucket is highly similar to writing a regular file.

You get a reference to the blob you want to write and open the resource with "with blob.open("wb") as f".

Note that you opened the blob in binary format.

You are writing the data in parquet format, as it is an excellent trade-off between storage size and writing & reading performance.

Package Python Modules with Poetry

Poetry makes the building process extremely easy.

The first obvious step is to use Poetry as your virtual environment manager. That means you already have the "pyproject.toml" and "poetry.lock" files – we already provided these files for you.

Now, all you have to do is to go to your project at the same level as your Poetry files (the ones mentioned above – for example, go to your batch-prediction-pipeline directory) and run:

poetry build

This will create a dist folder containing your package as a wheel. Now you can directly install your package using the wheel file or deploy it to a PyPi server.

To deploy it, configure your PyPi server credentials with the following:

poetry config repositories.<my-pypi-server> <pypi server URL>
poetry config http-basic.<my-pypi-server> <username> <password>

Finally, deploy it using the following:

poetry publish -r <my-pypi-server>

And that was it. I was amazed at how easy Poetry can make this process.

Otherwise, building and deploying your Python package is a tedious and lengthy process.

In Lesson 4, you will deploy your private PyPi server and deploy all the code you have written until this point using the commands I showed you above.

Conclusion

Congratulations! You finished the third lesson from the Full Stack 7-Steps MLOps Framework course.

If you have reached this far, you know how to:

choose the right architecture
access data from the feature store in batch mode
download your model from the model registry
build an inference pipeline
save your predictions to GCS

Now that you understand the power of using and implementing a batch prediction architecture, you can quickly serve models in real-time while paving your way for other fancier serving methods.

Check out Lesson 4 to learn about hosting your own private PyPi server and orchestrating all the pipelines using Airflow.

Also, you can access the GitHub repository here.

💡 My goal is to help machine learning engineers level up in designing and productionizing ML systems. Follow me on LinkedIn or subscribe to my weekly newsletter for more insights!

🔥 If you enjoy reading articles like this and wish to support my writing, consider becoming a Medium member. By using my referral link, you can support me without any extra cost while enjoying limitless access to Medium’s rich collection of stories.

Join Medium with my referral link – Paul Iusztin

References

[1] Energy Consumption per DE35 Industry Code from Denmark API, Denmark Energy Data Service

[2] Create buckets, GCP Cloud Storage Docs

[3] Create service accounts, GCP IAM Docs

[4] Create and delete service account keys, GCP IAM Docs

Written By

Paul Iusztin

See all from Paul Iusztin

Full Stack Mlops, Hands On Tutorials, Machine Learning, Mlops, Pipeline

Share This Article

Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.

Write for TDS

URL: https://towardsdatascience.com/unlock-the-secret-to-efficient-batch-prediction-pipelines-using-python-a-feature-store-and-gcs-17a1462ca489/