Optimize Production with R - Part I
An introduction to linear programming with R
Optimizing production with R
Last year, I was approached by a friend who works in a small, family owned steel and metal business. He wanted to know if it was possible to create something that would help him solve the problem of minimizing waste when cutting steel beams. Sounds like a problem for linear programming!
When I started out, there was not a huge amount of beginners articles on how to use linear programming in R that made sense for somebody not that versed in math. Linear programming with R was also an area where ChatGPT did not shine in early 2023, so I found myself wishing for a guide.
This series is my attempt at making such a guide. Hopefully it will be of use to someone.
Part I of the series will introduce the concept of linear programming in R, with a basic example. In part II, I will teach you how to create a more advanced model. If there is enough demand, I might expand it to Part III, where I go into details on how to create a Shiny app actually using linear programming to optimize work.
Linear programming
Linear programming deals with finding the optimal solution of a linear function. Common examples are the knapsack problem (given a set of items with a value and a weight, figure out which items to put in the knapsack, so that you do not exceed the weight limit while maximing the value of the items you pick) and the travelling salesman problem (given a set of cities and distances between them, find the shortest possible route that visits each city once and returns to the origin city).
The main components of a linear programming problem are:
- The Objective function: The problem you want to solve, expressed as a mathematical equation. For example, maximize the value of the items put in the knapsack.
- The Constraints: The rules or restrictions your decision variables must follow. In the above example of the travelling salesman: visit all cities, but only once each).
- The Decision variables: Variables that impact how you want to achieve the optimal solution. For example, if you wanted to figure out the optimal production of two products, given different prices and available material, the two products would be modelled as two decision variables.
Example: The Knapsack problem
Dirk Schumacher has created an excellent package called OMPR, that makes like easier when developing a mixed integer programming model (or MIP model for short). With OMPR, you build a model by creating a model object, then adding variables to it. Let’s look at the Knapsack problem to illustrate how it works, and then get started on the steel beams.
The first step, is to define our problem (Medium does not support mathematical symbols well, so thus the images):
With this in mind, we can define our decision variables:
and the objective function:
We also need a constraint, i.e. that the total weight should not exceed the capacity of the knapsack:
Converting the problem into code
# Load libraries
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)
# Define the problem data
values <- c(10, 20, 30, 40) # values of the items
weights <- c(1, 3, 4, 5) # weights of the items
capacity <- 7 # weight capacity of the knapsack
n <- length(values) # number of items
# Build the model
model <- MIPModel() |>
add_variable(x[i], i = 1:n, type = "binary") |>
set_objective(sum_over(values[i] * x[i], i = 1:n), "max") |>
add_constraint(sum_over(weights[i] * x[i], i = 1:n) <= capacity)
# Solve the model
result <- solve_model(model, with_ROI(solver = "glpk"))
# Extract the solution
solution <- get_solution(result, x[i])
# Display the results
print(solution)
variable i value
1 x 1 1
2 x 2 0
3 x 3 0
4 x 4 1
The resulting optimal solution is that given a value of 10, 20, 30 and 40, with weights 1, 3, 4 and 5, we maximize the value by selecting item 1 (worth 10) and 4 (worth 40), for a total value of 50. It’s worth noting that items 2 (worth 20) and 3 (worth 30) also provide the same value, but for more weight.
Step 1: Add variables
We start by defining the model object, and then pipe it onwards:
model <- MIPModel() |>
The add_variable() function is used to add decision variables to the model, and in the same function you define the variable.
add_variable(x[i], i = 1:n, type = "binary") |>
In this example, we only have one variable, and we denote it x. In examples, you will come across using both x and y (and more), such as the ones used in the Warehouse Location Problem.
However, I would advise against the practice of using x and y, and x[i, j] to denote the different indices. The reason is that your code will be a lot more readable, and the process of setting up and making sense of the model is a lot easier if the variable names and indices are understandable.
Compare these two chunks of code:
ompr::add_variable(cutSteel[work, inventory],
work = 1:numberOfWorkItems,
inventory = 1:numberOfInventoryItems,
type = "binary"
)
ompr::add_variable(x[i, j],
i = 1:numberOfWorkItems,
j = 1:numberOfInventoryItems,
type = "binary"
)
They are identical in function, and denote a decision variable that defines if steel should be cut from a beam or not. The first is a lot easier to read.
The syntax of the add_variable() function is quite straight forward once you get the hang of it. First you name it, and then you define index sets.
The index sets (or set), are the ranges over which the variable is defined. These indices help define the matrix or vector of decision variables that are used in the model (a matrix if it has two indices, or a one-dimensional array if it has one).
For the example of the Knapsack Problem, we can re-write the code to follow this principle of naming things for humans:
# Load libraries
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)
# Define the problem data
values <- c(10, 20, 30, 40) # values of the items
weights <- c(1, 3, 4, 5) # weights of the items
capacity <- 7 # weight capacity of the knapsack
numberOfItems <- length(values) # number of items
# Build the model
model <- MIPModel() |>
add_variable(
itemsInSack[item],
item = 1:numberOfItems,
type = "binary"
) |>
set_objective(sum_over(values[item] * itemsInSack[item], item = 1:numberOfItems), "max") |>
add_constraint(sum_over(weights[item] * itemsInSack[item], item = 1:numberOfItems) <= capacity)
So, in the first row of the add_variable() function, we define the variable itself, naming it itemsInSack. Then, we define the vector by saying that itemsInSack has different item, defined as 1:numberOfItems, which in this case would be 1:4, creating four (one for each item) binary decision variables. Next, we define the type, which can either be continuous, integer or binary. If you need, you can also use the lb and ub arguments to define lower and upper bounds for the variables. These are set it -Inf and +Inf by default.
Step 2: Add the objective function
Next up, is the objective function, which is written like a mathematical expression. The set_objective() function has two main arguments:
- The expression we want to optimize
- The direction of optimization ("max" for maximization, "min" for minimization)
The sum_over() function is used to help create summations over the indexes. In our example, it will sum the product of values[item] and itemsInSack[item] for each item in the specified range (i.e. 1:4). This give us the value of the items that are in the sack. If an item is in the sack, the value will be the value of that item times 1 (1 means the item is in the sack), or the value of that itmes times 0 (0 means that the item is not in the sack).
The "max" argument let’s the model know that we want to maximize the value of the objective function.
Step 3: Add the constraint
Finally, we define the constraint, or boundary that the model needs to operate within. This is done using the add_constraint() function, which looks very similar to the objective function.
We sum over the weights of each item (since the weight limit is what constrains us), following the same principle as for the objective function. The total sum of all the items that are in the sack, must be equal to or less than the capacity (<= capacity ).
Once you have defined the model, you can inspect it by printing it:
Mixed integer linear optimization problem
Variables:
Continuous: 0
Integer: 0
Binary: 4
Model sense: maximize
Constraints: 1
We have a model with 4 binary variables, 1 constraint and the model is set to maximize the obejective function.
Step 4: Solve the model
Solving this model is quite easy, and does not take long. However, if you have a more complex model or a larger set of data, this can take a while. To solve it and save the solution, we use the following code:
result <- solve_model(model, with_ROI(solver = "glpk"))
There are different solvers available, in this example we use a freely available one ("glpk). There are commercially available solvers as well, that can improve the speed of your calculations.
Step 5: Get the results
We also want to get the result, so that we can see which items to put in our sack!
We do this by using the get_solution() function:
solution <- get_solution(result, itemsInSack[item])
variable item value
1 itemsInSack 1 1
2 itemsInSack 2 0
3 itemsInSack 3 0
4 itemsInSack 4 1
This can be bit hard to read, so we can make it somewhat easier to read by filtering the items that are not in the sack, and adding the actual value of the item:
solution |>
dplyr::filter(value > 0) |>
dplyr::select(-value) |>
dplyr::mutate(
value = values[item]
)
variable item value
1 itemsInSack 1 10
2 itemsInSack 4 40
This is the first article in the series; next up is how to apply what we have learned so far, and create a more advanced model. If you have any questions, feel free to ask them in the comment section.
Share This Article
Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.
Write for TDS