## Setting a starting solution in MIP Models: a scheduling application

In computer science, heuristics are techniques used to find a feasible solution to a given problem, typically faster than exact methods but without a guarantee of optimality. On the other hand, exact methods are much more expensive computationally, but the optimal solution is guaranteed.

Modeling a problem as a Mixed Integer Program (*MIP*) and solving it using a solver may give you the optimal solution. Usually, those solvers use a branch-and-bound algorithm behind the scenes. Branch and Bound (*B&B*) is considered an exact method to solve optimization problems.

As the name suggests, it enumerates the solutions as a tree (branching). The main difference between *B&B* and exhaustive search, often called brute force, is the bounding phase. During the process, every node (solution) is compared against the solution’s lower and upper bound. If the branch is proven not to give better results than the best already found, it is pruned and the algorithm goes to another branch.

The target of this article is not to give too many details on the Branch and Bound algorithm; a much deeper explanation is found here. But we have to define a couple of things:

*Incumbent*: best feasible solution found at each step of the algorithm. If it is a minimization problem, it is an upper bound.*Best Bound*: valid lower bound of the solution.*Gap (%)*: difference between Best Bound and Incumbent. If incumbent is equal to the best bound, the gap is 0, and optimal solution was found.

State-of-the-art solvers have different methods to calculate an incumbent and best bound for any *MIP*. But for the solver, our *MIP *is just a set of equations and objective function. Since we know the exact structure of our problem, wouldn’t it be helpful to develop an Adhoc algorithm and provide the result as an incumbent itself? Yes, and that’s when heuristics and warm-start comes into the picture

The permutation flow shop scheduling problem is one of the most classic optimization problems in the literature. It can be briefly described as follows: given a set of machines *m* and a set of jobs *n*, how to process those jobs such that every job has to go through all the machines in the same order. Objective is to minimize the completion time of last job.

Let’s suppose there are *3* machines (*M1*, *M2 *and *M3*) and *3* jobs (*J1*, *J2 *and *J3*). Jobs have to follow that order given. A possible feasible solution is shown below in form of Gantt Chart:

The completion time, or Makespan, of this schedule is *34 *and the solution/order is *J1*, *J2*, *J3*.

This problem can be formulated as *MIP *model. Let’s call *m* the set of machines and *n* the set of jobs. The jobs have to go through the same order of machines *{0,1,…,m}*. The processing time of job *j* in machine *i* is given by p_ij. Let’s create those parameters in python. We’ll start with a set of *15* jobs and *5* machines. Processing time is generated randomly between *1* and *100*. *M* is an upper bound of the solution. A good upper bound is the sum of all processing times of all jobs in all machines.

`import random`

import numpy as nprandom.seed(10)

# Number of jobs

n = 15

# Number of machines

m = 5

# Max time for random input generator

max_time = 100

# Generate processing times randomly

times = np.zeros((m, n))

tot_time_job = []

for i in range(m):

for j in range(n):

times[i][j] = random.randint(1, max_time)

# Total processing time per job - sum across machines

tot_processing_job = np.sum(times, axis=0)

# Upper bound of the solution - sum of transit matrix

M = sum(times[j][i] for i in range(n) for j in range(m))

There are two sets of variables. Variable *x_ij*, a continuous variable, which is the starting time of job *j* in machine *i*. And variable *y_jk*, a binary variable that is equal to *1* if job *j* is executed before *k*. *0* otherwise. *Cmax *is the makespan of the schedule. Now let’s define that using *gurobipy*

`opt_model = grb.Model(name="Flow shop scheduling")`# Start time of job j at in machine i

x = {(j, i): opt_model.addVar(vtype=grb.GRB.CONTINUOUS, lb=0, name="x_{0}_{1}".format(j, i))

for j in range(n) for i in range(m)}

# 1 if job j is executed before job k. 0 otherwise

y = {(j, k): opt_model.addVar(vtype=grb.GRB.BINARY, name="y_{0}_{1}".format(j, k))

for j in range(n) for k in range(n) if j != k}

# Makespan - Completion time of last job in last machine

c = opt_model.addVar(vtype=grb.GRB.CONTINUOUS, lb=0, name="c")

Constraints are shown below:

** (1)** ensures that processing of job

*j*in machine

*i*can start only when it is finished in machine

*i-1*.

**and**

*(2)***are the disjunction constraints — if job**

*(3)**j*is executed after job

*k*, it should start after its completion, in the given machine.

**defines the makespan, which is the completion time of last job in machine**

*(4)**m*(last machine).

`# Job j in machine i can start only when it is finished in machine i-1`

c1 = {(j, i): opt_model.addConstr(x[j, i] - x[j, i - 1] >= times[i - 1][j],

name="c1_{0}_{1}".format(j, i))

for j in range(n) for i in range(1, m)}# Disjunctive constraints - if job j is after k, it should start after its completion

c2 = {(j, k, i): opt_model.addConstr(x[j, i] - x[k, i] + M * y[j, k] >= times[i][k],

name="c2_{0}_{1}_{2}".format(j, k, i))

for j in range(n) for k in range(n) for i in range(m) if k != j}

c3 = {(j, k, i): opt_model.addConstr(-x[j, i] + x[k, i] - M * y[j, k] >= times[i][j] - M,

name="c3_{0}_{1}_{2}".format(j, k, i))

for j in range(n) for k in range(n) for i in range(m) if k != j}

# Makespan is the completion time of last job in last machine

c4 = {j: opt_model.addConstr(c >= x[j, m - 1] + times[m - 1][j],

name="c4_{0}".format(j))

for j in range(n)}

Objective function is minimize makespan:

`opt_model.ModelSense = grb.GRB.MINIMIZE`

opt_model.setObjective(c)

opt_model.setParam('MIPGap', 0.018)

opt_model.optimize()

If we set the *MIP *Gap to be *1.8%*, the best objective is *832. *The convergence of the Branch-and-Bound algorithm is shown in the chart below:

Incumbent starts at around *1,200 *(the first solution gurobi was able to find) and converges until *832*. What if we could find a better initial solution for the problem and use it as the incumbent?

The *NEH *heuristic is one of the most known heuristics for the flow shop scheduling problem. It is named *NEH *due to its authors (Nawaz, Enscore, and Ham). It is a constructive heuristic, therefore, it starts from an empty solution and constructs the schedule iteratively until all the jobs are assigned.

A solution representation of the permutation flow shop problem is a list of *n* elements. The list is ordered by start time — the first job in the list is the first to be executed and the last job is the latest. The first step is to create a function that receives the ordered list (a solution) as input and returns the makespan associated to it.

`def get_makespan(solution):`

''' Calculate the makespan of a sequence of integer (jobs).

- A job can start only after the previous operation of the same job in machine j-1 ends and

machine is not processing any other job

- Finish time of a job in a given machine is its start time plus processing time in current machine

'''

end_time = np.zeros((m, len(solution) + 1))

for j in range(1, len(solution) + 1):

end_time[0][j] = end_time[0][j - 1] + times[0][solution[j - 1]]

for i in range(1, m):

for j in range(1, len(solution) + 1):

end_time[i][j] = max(end_time[i - 1][j], end_time[i][j - 1]) + times[i][solution[j - 1]]

return end_time

*NEH *starts by sorting the jobs in decreasing order of processing times (sum of all machines). The first iteration adds the job with the highest processing time at the beginning of the schedule. For the remaining jobs, it tries to insert it in all possible positions in the current solution, compares the (partial) makespan of each, and stores the best solution found. This process is repeated until all the jobs are assigned. The function is shown below:

`def neh():`

''' Heuristic NEH (Nawaz, Enscore & Ham) for flow shop scheduling

1 - Start from an empty schedule

2 - Add first the job with highest sum of processing time

3 - Go through the list of the unassigned jobs, test all in all possible positions in the current solutions

4 - Assign the best job at the best position (with lowest makespan) at the final solution

5 - Repeat (3) and (4) until there are no job unassigned

'''

initial_solution = np.argsort(-tot_processing_job)

current_solution = [initial_solution[0]]

for i in range(1, n):

best_cmax = 99999999

for j in range(0, i + 1):

temp_solution = current_solution[:]

temp_solution.insert(j, initial_solution[i])

temp_cmax = get_makespan(temp_solution)[m - 1][len(temp_solution)]

if best_cmax > temp_cmax:

best_seq = temp_solution

best_cmax = temp_cmax

current_solution = best_seq

return current_solution, get_makespan(current_solution)[m - 1][n]

The result found by *NEH *has makespan of *891*, much better than the *1,200 *found by Gurobi as first incumbent.

If we wanted to start with our incumbent, *B&B* from Gurobi would be able to skip a lot of branches with solution higher than *891*. As in the chart below:

Using a warm start in Gurobi is relatively easy. The only challenge is to transform the data structure returned by the heuristic into values of the *MIP *variables. For our problem, we must transform an ordered list and a makespan value into variable values of *y*, *x*, and *c*.

*cmax_heuristic *and *c* variable have thesame meaning, so the only task is assigning the variable start value.

`c.Start = cmax_heuristic`

Transforming the ordered list (*sequence_heuristic*) into variable *y* is also straightforward. One needs to iterate through the list, and if job *j* comes before job *k* in the list, *y_jk = 1*. *0 *otherwise.

`for j in range(n):`

for k in range(n):

if j != k:

y[j, k].Start = 0

for i in range(0, len(sequence_heuristic) - 1):

for j in range(i + 1, len(sequence_heuristic)):

j1 = sequence_heuristic[i]

j2 = sequence_heuristic[j]

y[j1, j2].Start = 1

We don’t need to assign variable *x* values. Gurobi can infer their values from the constraints. Once the model is executed, the user will be able to see the following message:

It means the solution is feasible, its objective is *891*, and it will be used as the *MIP *start. What if we try to insert an unfeasible *MIP* start? For example, if we try to add as a start *y_ij = 0*, for all *i* and all *j*. Of course, this is not feasible since all jobs must be on the schedule. In that case, the user would see the following message:

Having a heuristic that finds a good starting point for *MIP *is not the only use of the *MIP *start functionality. There are a couple of others:

- In a real-world problem, the user could feed the model with a scenario that happened in reality. In this case, the baseline would be the model’s starting point. For example, if a company wants to create a model for their routing process and if there’s historical data available, the routes taken by one day of operation could be a
*MIP*start if it respects the constraints given in the mathematical model. This is also a good exercise to check if the designed constraints are respected in reality. - As already explained, for solvers, our model is just a bunch of equations. If the user doesn’t want to spend time developing heuristics, using a very simple/naive method could also be helpful. For example, an ordered list of jobs by their index is a feasible solution (
*[1,2,3,..,n]*) and could be useful if solver is struggling to find an initial point. - If a very good known solution is available and one wants to prove it is the optimal, or how far it is from the optimal.

Last but not least, it is not guaranteed that a *MIP *start, even if it is better than the solver’s initial solution, will improve convergence and/or run time. This is because the solver takes an entirely different path than the original. The case we showed before is a good example. The convergence curve of *MIP *using a *MIP *start is shown below:

It indeed starts from a better solution, but it gets stuck in values close to *891*.

Finally, *MIP *start is a very useful resource and almost all commercial solvers have this functionality. But its value will depend greatly on the problem, dataset, and time to develop a good heuristic.

All images unless otherwise noted are by the author. Full code is available in GitHub.