Model Rollbacks Through Versioning

The Walmart Rollback isn't the only kind that can save you money

Jan 16, 2023

Using Model Rollbacks Is Fun!

There’s general consensus in the Machine Learning community that models can make, and have made, biased decisions against traditionally marginalized groups. Ethical AI researchers from Dr. Cathy O’Neil to Dr. Joy Buolamwini have gone to great lengths to establish a pattern of faulty decision making, rooted in biased and unrepresentative data, that results in serious harms. Unfortunately, our "intelligent" learning algorithms are only as smart, capable, and ethical as we make them, and we are only beginning to understand the long-term effects of biased models. Fortunately, there are many strategies already at our disposal that we can use to mitigate harms when they arise. Today, we will focus on a very powerful strategy: Model Rollbacks through Versioning.

In the past, when the average ML or AI practitioner built models, the model builder’s priorities looked a lot like this:

[Figure: Traditional Data Workflow]

You collect the Data from a cloud source, save the Data in a database, and then code several models, sending the best-performing one off to an Operations or Engineering team to integrate into their larger codebase and deploy in their Web Applications.

This framework—while streamlined and tidy—failed in many ways:

Model performance is judged by its scores or by the KPIs the company uses to define model success, not by the long-term effect on the people using the model’s output.
The model builders are detached from the model integration process, and the people deploying the model know very little about how the model makes decisions. This leads to a lack of transparency into the model building process that is exacerbated down the line, as the Engineers who integrated the model are unable to detect whether the deployed model is working as intended or potentially perpetuating harms.
Even if the model builders took the time to find representative Data and had a properly performing model that did no harm during the training process, being detached from the model deployment process means they have little to no visibility into the decisions the model makes in the face of new, and at times potentially biased, Data being ingested "in the wild".

The recognition of these problems, and more, makes the ethical case for a new type of model builder: one who recognizes the value of building performant models while understanding the unique opportunity that learning about model integration brings to improve model performance post deployment while simultaneously reducing bias. These impeccable minds can be found on MLOps, AI, and Analytical Engineering teams all across Tech. Instead of working on models as above, their process is extended to include this:

[Figure: MLOps Process]

After data is ingested from a database, multiple models are created to solve a problem at the company, and these models are containerized through a service such as Docker. An API is created that points to the host and port where the models are served, so that it can provide output to a Web Application for user interaction. All outputs are stored in the cloud through a service such as MongoDB for further analysis, and a monitoring system such as Grafana is attached to the model output to provide alerts if the model fails to generalize well to the world. This process gives better visibility into how a model is performing post deployment and makes Model Versioning techniques easier to apply.
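To make the monitoring step a little more concrete, here is a minimal sketch of the kind of check an alert could be built on. It does not use Grafana or MongoDB; it assumes the logged outputs are available as simple (group, approved) pairs, and the function names and the 0.2 gap tolerance are hypothetical choices made only for illustration.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Compute the approval rate per group from logged (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / totals[group] for group in totals}


def needs_alert(decisions, max_gap=0.2):
    """Flag the model when approval rates across groups differ by more than max_gap.

    max_gap is an illustrative tolerance, not a recommended value.
    """
    rates = approval_rates(decisions)
    if len(rates) < 2:
        return False
    return max(rates.values()) - min(rates.values()) > max_gap


# Made-up log entries: group "a" is always approved, group "b" half the time.
skewed_log = [("a", True), ("a", True), ("b", True), ("b", False)]
balanced_log = [("a", True), ("a", False), ("b", True), ("b", False)]
```

A real monitoring setup would run a check like this on a schedule over recent outputs and route the result to whatever alerting tool the team already uses.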

Model Versioning: What is It?

Model Versioning is a workflow that allows Engineers to track changes over time; it is a Model-centric take on Version Control. When we include model builders in the model integration process, we are able to track the models we use and adapt them based on how they perform when integrated into a larger system.

The model building process is iterative: it requires multiple changes across time. At every step, whatever is currently being used to make the model performant may be switched, changed, or adjusted to improve performance down the line. Model Versioning takes a snapshot of the changes at every step of the process and saves prior iterations, or versions, of each model for possible future use.
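One way to picture these snapshots is as an append-only history in which each iteration records what went into the model. The sketch below is only an illustration under that assumption: the class names, fields, and scores are hypothetical placeholders, and in practice a dedicated model registry tool would usually fill this role.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModelVersion:
    """One immutable snapshot of a model building iteration."""
    version: int
    params: dict       # hyperparameters used for this iteration
    features: tuple    # feature names the model was trained on
    metrics: dict      # evaluation scores recorded at training time
    created_at: str


@dataclass
class ModelHistory:
    """Append-only history: earlier versions are never overwritten."""
    name: str
    versions: list = field(default_factory=list)

    def snapshot(self, params, features, metrics):
        entry = ModelVersion(
            version=len(self.versions) + 1,
            params=dict(params),
            features=tuple(features),
            metrics=dict(metrics),
            created_at=datetime.now(timezone.utc).isoformat(),
        )
        self.versions.append(entry)
        return entry

    def get(self, version):
        """Look up a prior iteration by its 1-based version number."""
        return self.versions[version - 1]


# Placeholder values, made up purely to show the shape of a snapshot.
history = ModelHistory("example_model")
history.snapshot({"C": 1.0}, ["feature_a", "feature_b"], {"score": 0.71})
history.snapshot({"C": 0.1}, ["feature_a"], {"score": 0.69})
```

Because every snapshot is kept, `history.get(1)` still returns the first iteration after later ones have been added.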

Here’s what this may look like:

Say you originally ingest data with thousands of features across a population but, after running some feature importance algorithms during feature engineering, you reduce your dataset down to the most "important" features. Versioning can be applied to your Data to track the evolution of the features chosen for your model.
When training multiple models, you choose different parameters to tune during the hyperparameter tuning process. Versioning can track the many versions of models you try during the model building process.
When integrating the model into a larger system, you choose model 1 to deploy to the public over model 2 or 3. While model 1 is the most performant during the training process, it fails to make appropriate decisions when integrated into the larger system. Model Versioning allows you to continue working on the model while it’s deployed and then push forward a better model version, switch to a different model, or even revert to a prior model version without any disruption to your Web Application.
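The last scenario can be sketched as a small pointer that decides which saved version answers requests. This is a simplified illustration, not a production pattern: the `Deployment` class, the version labels, and the toy prediction functions are all hypothetical.

```python
class Deployment:
    """Tracks which saved model version is currently serving traffic."""

    def __init__(self):
        self._models = {}   # version label -> prediction function
        self._live = []     # promotion history, most recent last

    def register(self, label, predict_fn):
        """Save a version so it can be promoted now or later."""
        self._models[label] = predict_fn

    def promote(self, label):
        """Make a registered version the one that serves requests."""
        if label not in self._models:
            raise KeyError(f"unknown version: {label}")
        self._live.append(label)

    def rollback(self):
        """Return to the previously promoted version."""
        if len(self._live) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._live.pop()
        return self._live[-1]

    @property
    def live_version(self):
        return self._live[-1] if self._live else None

    def predict(self, x):
        if not self._live:
            raise RuntimeError("no version promoted")
        return self._models[self._live[-1]](x)


# Toy stand-ins for trained models: each maps a score to a decision.
deployment = Deployment()
deployment.register("model_1", lambda x: "approve" if x > 0.5 else "deny")
deployment.register("model_2", lambda x: "approve" if x > 0.7 else "deny")
deployment.promote("model_1")
deployment.promote("model_2")   # a newer version goes live
deployment.rollback()           # something went wrong, so revert
```

The Web Application only ever calls `deployment.predict`, so switching or reverting versions does not require changing the application itself.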

This is where Model Versioning allows for Model Rollbacks to come in.

Model Rollbacks: Another Great Way to Save

Imagine you are working as a Machine Learning Manager and your team is tasked with creating a Machine Learning model that can make loan decisions based on people’s credit scores. In the status quo, there is a simple technical solution: a threshold credit score under which customers are automatically denied and above which the credit application is sent to a credit risk analyst who makes the final decision. This process, while much better than the older system in which every application was read by a credit risk analyst before a decision was made, has problems.

There are far too many applications going to the credit risk analysts for them to be able to make decisions in a timely manner.
People with no credit who may be good candidates for a loan are being denied, and credit risk analysts never even get to see them because of the threshold technical solution.
There is a gendered and racial component: white men are more likely than anyone else to make it past the technical solution, and everyone else is at greater risk of being auto-denied because of credit alone.
A lot of people a few points below the threshold are being lumped in with people who have much lower credit scores and would present a much higher risk.
These problems are costing your company a lot of revenue.
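The status quo rule, and two of the problems listed above, can be shown in a few lines. This is a sketch under stated assumptions: the cutoff of 650 is a hypothetical number (the scenario does not give one), and representing "no credit" as `None` is also an assumption.

```python
THRESHOLD = 650  # hypothetical cutoff chosen only for illustration


def route_application(credit_score, threshold=THRESHOLD):
    """Status quo rule: auto-deny below the threshold, otherwise send to an analyst.

    A missing score (None) stands in for an applicant with no credit history.
    """
    if credit_score is None or credit_score < threshold:
        return "auto_deny"
    return "analyst_review"


just_below = route_application(649)   # one point under the cutoff
far_below = route_application(300)    # a much riskier profile
no_credit = route_application(None)   # no credit history at all
at_cutoff = route_application(650)
```

The applicant one point under the cutoff, the much riskier applicant, and the applicant with no credit history all receive the same automatic denial, which is exactly the lumping together described above.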

Your team gets to work creating a model that can address these concerns. You start by making sure your team finds what you consider to be representative Data. Once that is done, you have them create multiple models (one a heuristic, one a logistic regression model, and the last a random forest model) to see which one is most performant. You then hand off the model to a different team to integrate into the larger codebase. Your company operates in silos, so you have no visibility into how the model is deployed, nor do you care to know. After 6 months, the model appears to be doing well. That is, until an op-ed is released with the following headline: "Credit Company’s Algorithm Discriminates Against Non-Binary and Non College Degree Holders". Your boss calls you and tells you that the model will be taken offline immediately. When your model was pushed forward, the simple technical solution was deprecated, so it will take some time for it to go live again. In the meantime, all applications will go to the credit risk analysts.

This happens far more often than necessary in the Data industry. Integrated teams in which model builders and engineers work together, or a brand new team of model builders who integrate models into the codebase themselves, could use the power of Model Versioning to avoid a lot of this headache.

Here’s how:

During the Data collection process, versions of the dataset could be saved and tagged for transparency, and so that model builders can revisit them in the future if needed.
During model building, all versions of every model can be saved so that model builders can reconsider different hyperparameter choices if the deployed model begins making biased decisions.
Model Versioning during deployment would allow the simple technical solution to stay up and running as the new model is deployed to the public. In the event something goes wrong, the model can be rolled back and the simple technical solution can be pushed forward while the Machine Learning team works to fix the problem.
While fixing the problem with the model, the model builders would have greater visibility into which step of the process may have led to this outcome. This can reduce the time it takes to fix the problem and save the company time, resources, and money in the long run.
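For the first step, saving and tagging dataset versions, one lightweight approach is to store a content fingerprint next to each tag so it is always clear exactly which data a model was trained on. The sketch below assumes the dataset fits in memory as a list of dictionaries; the function names, tags, and rows are hypothetical, and a dedicated data versioning tool would be the more likely choice at scale.

```python
import hashlib
import json


def dataset_fingerprint(rows):
    """Return a deterministic SHA-256 hash of a dataset given as a list of dicts."""
    # sort_keys makes the hash independent of the key order inside each row.
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def tag_dataset(tags, tag, rows, note=""):
    """Record a named dataset version alongside its fingerprint."""
    tags[tag] = {
        "sha256": dataset_fingerprint(rows),
        "n_rows": len(rows),
        "note": note,
    }
    return tags[tag]


# Made-up rows, only to show that a change in the data changes the fingerprint.
rows_v1 = [{"credit_score": 700, "income": 50000}]
rows_v1_reordered = [{"income": 50000, "credit_score": 700}]
rows_v2 = rows_v1 + [{"credit_score": 640, "income": 42000}]

tags = {}
tag_dataset(tags, "v1", rows_v1, note="initial collection")
tag_dataset(tags, "v2", rows_v2, note="added applicants")
```

When a deployed model starts making biased decisions, the team can compare the fingerprint recorded at training time against the tagged versions to see which dataset fed that model.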

Model Versioning allows for Model Rollbacks that can save your company money long term and, more importantly, help reduce bias if and when it arises. However, this technique works best when you have a team of people who understand not only how to build a model, but also how to optimize the model in production. To get there, you have to expand your model building team’s visibility into the model integration process, either through collaboration between model builders and engineers or by creating a hybrid team of MLOps, AI, or Analytical Engineers.

Any thoughts? Share them in the comments below!

All images created by the author.

URL: https://towardsdatascience.com/model-rollbacks-through-versioning-7cdca954e1cc/