Why Every Data Scientist Should Learn Mathematical Optimization
Data science courses focus on data visualization, feature engineering, data processing, (un)supervised learning, clustering, programming, deep learning and sometimes data engineering. Optimization usually isn’t part of these courses; most of the time it isn’t even mentioned! And that’s a shame, because it can add value to many business processes. It is flexible, relatively easy to get started with, and fast. It can also do things machine learning can’t: with optimization you make decisions, while with machine learning you make predictions. Curious? Continue reading!
The Four Types of Data Analysis
If you are familiar with the four types of data analysis you can skip this section.
In the image below you can find the four types of analysis.
- Descriptive analysis: This type focuses on summarizing data from the past. It is widely used to track KPIs. Data is visualized in dashboards or reports and updated continuously, daily, weekly or monthly. This is the easiest type of analysis: you extract data from a database and you can start visualizing.
- Diagnostic analysis: To dig deeper and find out why things happened, diagnostic analysis comes in. This type of analysis takes the insights found in descriptive analysis and drills down to find the causes of those outcomes. An example is a root cause analysis.
- Predictive analysis: Predictive analysis tells you something about the future by predicting what will likely happen. This is done using forecasting or machine learning techniques.
- Prescriptive analysis: Recommendations about ‘the best next thing to do’ fall under prescriptive analysis. Determining the course of action to take in the current situation can be hard, which is exactly why prescriptive analysis has the potential to add the most value to a business. It’s possible to use AI or mathematical optimization here (besides other techniques).
Optimization
Mathematical optimization falls under prescriptive analysis, and this makes it a really valuable technique. It is widely used in areas like energy, healthcare, logistics, sports scheduling, finance, manufacturing and retail. You can optimize the routing of packages, choose the most cost-effective way to deliver electricity, create a working schedule or divide tasks in a fair way.
But what exactly is mathematical optimization? And how does it work? It all starts with a business problem. Imagine you are part of a delivery company and you discover that packages arrive too late at customers. You receive complaints and you start analyzing. Something must be wrong with your delivery process. You find out that every deliverer just grabs a random number of packages and delivers those. After delivering one package, the deliverer uses Google Maps to find out how to get to the next address. Wow, so many optimization possibilities! You start thinking: what if the delivery vans are filled completely, with packages whose addresses are near each other, and the deliverers follow the shortest route possible? This would have a huge effect on the delivery process! Delivery times would improve, which would result in fewer complaints and more happy customers! The deliverers could deliver more packages in less time and the vans would use less fuel. Only wins here! 🎉
Finding the optimal routes for the deliverers, choosing packages near each other and filling the vans are all examples of problems that can be solved using optimization. To solve these kinds of problems, you should take the following steps:
You should start with understanding the problem. This involves defining the problem, setting boundaries, talking to stakeholders and finding out what value you want to minimize or maximize.
The next (and often hardest) step is modeling the problem. You translate everything you discovered during the first step into math. This means defining the variables, constraints and objective. You can think of the variables as the values you can influence. For example, if you want to group packages that are near each other, every package is assigned a group number, and those group numbers can be the variables. The constraints are the limits your solution has to respect. Let’s say a van can hold a maximum of 600 kilos of packages: the total weight of the packages selected for a van should then not exceed 600 kilos. Last but not least, the objective is the formula you want to maximize or minimize. In a routing problem, for example, you might want to minimize the total number of miles traveled. After you have modeled your problem using math, you can continue to the next step.
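To make the three ingredients concrete, here is a minimal sketch of a model for a simplified, hypothetical van-loading problem: load as many packages as possible into one van without exceeding the 600 kilo limit. The symbols are assumptions introduced for illustration: n is the number of packages, w_i is the weight of package i in kilos, and x_i is a binary variable that equals 1 if package i goes into the van and 0 otherwise.

```latex
\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{n} x_i
  && \text{(objective: number of packages loaded)} \\
\text{subject to} \quad & \sum_{i=1}^{n} w_i \, x_i \le 600
  && \text{(constraint: van capacity in kilos)} \\
& x_i \in \{0, 1\}, \quad i = 1, \dots, n
  && \text{(variables: load package } i \text{ or not)}
\end{aligned}
```

A real delivery model would need more than this (several vans, locations, routes), but the structure of variables, constraints and objective stays the same.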
Solving the problem isn’t hard if you did the previous step correctly and know how to code. For the solving step you need a modeling framework and a solver. Some example frameworks are Pyomo, OR-Tools, PuLP and SciPy. Examples of free solvers are CBC, GLPK and Ipopt. There are also commercial solvers available. They are typically a lot faster, so if you want to solve problems with many variables and constraints, a commercial solver is often the better choice. You code your problem, in Python for example, call the solver and wait for the results. Now you can continue to the last step.
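As a rough illustration of the solving step, the sketch below uses SciPy’s milp function (available in SciPy 1.9 and later, backed by the open-source HiGHS solver) on a simplified, hypothetical van-loading problem: pick as many packages as possible for one van without exceeding 600 kilos. The package weights are made up for the example, and the variable names are assumptions, not part of any real delivery system.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Hypothetical package weights in kilos (made-up illustration data).
weights_kg = np.array([120.0, 250.0, 90.0, 310.0, 180.0, 75.0, 220.0])
capacity_kg = 600.0  # van capacity used as the example constraint
n_packages = len(weights_kg)

# Variables: x[i] = 1 if package i is loaded, 0 otherwise.
# Objective: maximize the number of loaded packages.
# milp always minimizes, so we minimize the negative count.
cost = -np.ones(n_packages)

# Constraint: total weight of the loaded packages <= capacity.
capacity = LinearConstraint(weights_kg.reshape(1, -1), lb=0.0, ub=capacity_kg)

result = milp(
    c=cost,
    constraints=capacity,
    integrality=np.ones(n_packages),  # every variable must be an integer
    bounds=Bounds(0, 1),              # together with integrality: binary
)

# Round to remove tiny floating-point deviations from 0 and 1.
chosen = np.round(result.x).astype(int)
loaded_count = int(chosen.sum())
loaded_weight = int(weights_kg @ chosen)

print("Solver succeeded:", result.success)
print("Packages loaded:", loaded_count)
print("Total weight (kg):", loaded_weight)
```

Several different selections can reach the same number of packages, so the solver may return any one of them. The same model could be written in Pyomo, OR-Tools or PuLP and handed to a different solver; only the syntax changes.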
You can analyze the results the solver came up with to see how much performance improves. Compare these results to the current process and decide whether it’s worth putting the model in production to optimize your process every once in a while.
If you are curious to see these steps in easy real life examples, you can find them here. This article discusses more advanced concepts.
Conclusion
Optimization is so powerful that every data scientist should be able to implement it, or at least be familiar with what it can do for complex business problems. It complements machine learning, because you make decisions instead of predictions. You don’t need labeled data. It’s also not necessary to retrain your model when data distributions change.
Thanks for reading and enjoy optimizing! ❤
If you want to start learning the basics of optimization beyond the articles mentioned above, I can recommend this Udemy course and the OR-Tools examples.