In the beginning of October I started as an intern at the Los Alamos National Laboratory (LANL) and just finished. I worked on a mixed integer non-linear problem solver using Branch and Bound which I explained really short in the last article.

This is a more advanced article about Branch and Bound and about non-linear problems and if you're interested in my project you might wanna check out the GitHub Repo Juniper.

Actually it's more about the "What to do with it?" than a "How does it work?" but if you're interested I can write more about that part in another post.

What is Branch and Bound?

Okay before we are working on that question we should ask at first: Why do we need branch and bound?

Let's assume we have a so called mixed integer non-linear problem (MINLP). That's similar to a linear problem as explained in the series about the Simplex algorithm just with a non-linear objective and or non-linear constraints.

Instead of just $ax+by+cz \leq 5$ we can now have stuff like $ax^3+\frac{y}{b}+\sin(z) \leq 5$. That looks more complicated and it is, but not the topic of this article. Some of those variables might be set to be discrete. For example $a$ can only be $1,2,3$ but not $0.13413$.

This is a harder problem even if it looks like it might be easier as there are fewer possibilities.

Branch and Bound is a technique to solve those kind of problems by constructing kind of a tree structure. First we solve the so called relaxation of our problem to get a continuous solution.

This is basically the part I explained last time. This post is to dive a bit deeper into branch and bound. It's not too much about programming it (the code is way too long) but more about different ideas.

Everything is a bit easier if we have a real problem we can solve in my opinion. Sitting in the office for three months and bascially coding the algorithm and visualizing benchmark results. Maybe I should have thought a bit more what kind of stuff I can solve with it instead of just doing it :D

This post tries to get a sense of why is it useful to have non linear constraints and I wanna just use my algorithm for my own formulation of a problem.

Enough! Let's start:

Okay last time we talked about the Travelling Salesman Problem (TSP). That problem is linear but can be extended to non-linear. Wait... Why? Linear is easy! What advantage do we have to make it non-linear?

We assume that we have a drone and several interesting spots we have to control. Let's assume there was an earthquake somewhere and we have one drone and several areas we need to patrol to get a feeling of how much is destroyed. Now we want to fly with our drone over those areas in the most efficient way. We have to visit each area and fly back to our base. This sounds similar to TSP but we have areas instead of points. The reason is that we are in the air and we can see the whole area from different points in the sky.

These are the areas we have to patrol. A short look at our tsp program sounds reasonable.

```
m = Model(solver=GLPKSolverMIP())
@variable(m, x[f=1:N,t=1:N], Bin)
@objective(m, Min, sum(x[f=i,t=j]*euclidean(c_pos[i],c_pos[j]) for i=1:N,j=1:N))
@constraint(m, notself[i=1:N], x[f=i,t=i] == 0)
@constraint(m, oneout[i=1:N], sum(x[f=i,t=1:N]) == 1)
@constraint(m, onein[j=1:N], sum(x[f=1:N,t=j]) == 1)
for f=1:N, t=1:N
@constraint(m, x[f,t]+x[t,f] <= 1)
end
status = solve(m)
```

What do we have: binary variables `x`

where `x[f,t]`

is set to `1`

if we move from `f`

to `t`

and `0`

otherwise. Then we just want to minimize the distance.
We don't allow to move from a point to itself which I think we can actually ignore as we have the `oneout`

and `onein`

constraints.

So `oneout`

means that we have to move away from a point to exactly one other point and `onein`

we have to move in from exactly one predecessor.

Then the for loop says it's not allowed to move from a to b and back again which would result in a line.

```
for f=1:N, t=1:N
@constraint(m, x[f,t]+x[t,f] <= 1)
end
```

Afterwards we added constraints based on the previous solution to disallow small cycles to find the big cycle.

I randomly choose those areas above. Maybe we should get the solution of the tsp (by using the center of each area) first.

Btw the objective is $\approx 1115.19$.

What do we have to change to solve the tspn? The idea is bascially to just touch the circle without really getting to the center which is unnecessary in most cases. It isn't that easy to just touch the circle as it sometimes is the best to get through the center or somewhere in between.

We want to solve a MINLP therefore we need a solver which supports non-linear objectives and or constraints. Juniper can do that ;) Okay this is important: Non-linear problems can be convex or non-convex. Juniper solves non-convex problems only local optimal not global optimal. We tested Juniper on hundreds of problems and most of the time it turns out that the solution is global optimal or close but nothing is proven. I think for this problem we can check whether the solution is actually good using visualization. We should just keep this in our mind.

Let's change the solver parameter:

`m = Model(solver=JuniperSolver(IpoptSolver(print_level=0)))`

and we have to include `Ipopt`

and `Juniper`

.

Therefore our first line looks like this:

`using JuMP, Juniper, Ipopt, Distances`

Ipopt solves the non-linear relaxation and Juniper cares about the discrete part. The `print_level=0`

just takes care about the logs (we don't want the logs of the relaxation).

First of all we need more variables. We can still use the `x`

to use the constraints that we get into each area once and get out of it once. What we need is a specific point inside each area.
We call this `p[i]`

which is actually `px[i]`

and `py[i]`

. So `p[i] = (px[i],py[i])`

now `p[i]`

is the point we want to visit in area `i`

.

```
@variable(m, px[1:N])
@variable(m, py[1:N])
```

Those variables are continuous (real numbers instead of binary or integer). Then we have to change the objective function to:

`@NLobjective(m, Min, sum(x[f=i,t=j]*((px[i]-px[j])^2+(py[i]-py[j])^2) for i=1:N,j=1:N))`

Here NL stands for non-linear. We can't use the euclidean function here because `px`

and `py`

are variables where we don't know the value yet. I use `((px[i]-px[j])^2+(py[i]-py[j])^2)`

instead of
`sqrt((px[i]-px[j])^2+(py[i]-py[j])^2)`

because that one didn't work but it shouldn't matter. We just have to compute the real objective value later.

Now we need constraints to define that `p[i]`

is actually inside the circular area.
Let's define `c[i] = (cx[i],cy[i])`

as the center of the area.

If we want to have a point $(x,y)$ inside the unit cycle we can use a constraint like this:

$$ x^2+y^2 \leq 1$$

More general for a radius $r$:

$$ \sqrt{x^2+y^2} \leq r$$

Now we only have to move the center from `(0,0)`

to our center of the area. In mathematical form with $c[i]=(a,b)$ this would be:

$$ \sqrt{(x-a)^2+(y-b)^2} \leq r$$

This is basically everthing we need.

`@NLconstraint(m, inarea[i=1:N], sqrt((px[i]-cx[i])^2+(py[i]-cy[i])^2) <= cr[i])`

We need `@NLconstraint`

instead of `@constraint`

as it is non-linear and I use `cr[i]`

for the radius of the area.

As you can see most of the time it turned out that only touching the area was a good idea but not always. We aren't finished yet because we have two cycles instead of one. We have to disallow that as we did in the previous article.

The new objective is $\approx 1034.94$. This is at least a bit better than $1115.19$. Okay that was that part. Well do you actually wanna know what's going on in the backend?

Sure that's why you are here right?

Okay first the relaxation is solved. Let's have a look at that part. I explained the simplex method for a while and this is totally different. It's a bit more like getting the minimum of a weird function. Maybe you used Newton's Method in school or university. This can be solved in a similar fashion but is neither part of this post nor part of Juniper.

Maybe we can just have a look at the solution of the relaxation:

```
[0.5 0.0 0.0 0.5 0.0 0.0 0.0 5.99723e-8 0.0 0.0;
0.0 0.5 1.29989e-7 0.0 0.0 0.0 0.0 0.5 0.0 0.0;
0.0 1.29988e-7 0.5 0.0 5.53581e-8 0.0 0.5 0.0 0.0 0.0;
0.5 0.0 0.0 0.5 0.0 0.0 0.0 6.00222e-8 0.0 0.0;
0.0 0.0 6.45498e-8 0.0 0.5 0.0 0.0 0.0 0.0 0.5;
0.0 0.0 0.0 0.0 0.0 0.5 5.53832e-8 0.0 0.5 0.0;
0.0 0.0 0.5 0.0 0.0 6.45763e-8 0.5 0.0 0.0 0.0;
6.00246e-8 0.5 0.0 5.99694e-8 0.0 0.0 0.0 0.5 0.0 0.0;
0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5 6.45951e-8;
0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 5.53996e-8 0.5]
```

These are the values for `x`

so the first row gives the idea of where to go from circle no 1.

Circle no 1 is red and no 4 is blue. The relaxation has a value of 0.5 for the connection from 1 to 4.
The other connection is from 1 to 1. This is shit. Do you remember my
"We don't allow to move from a point to itself which I think we can actually ignore as we have the `oneout`

and `onein`

constraints." Yeah that wasn't my brightest moment. Well we can ignore it if we think discrete but it is actually valuable if we think in terms of relaxations.

Just to let you know: It took around 35 seconds to solve this small example. Let's add this constraint and have a look. Now it took around 7 seconds so it was good that we had an extra look.

Let's check the new relaxation:

```
[9.22739e-11 4.9915e-8 0.0 0.5 0.0 0.0 0.0 0.5 0.0 0.0;
4.99158e-8 2.82652e-12 0.500001 4.99141e-8 0.0 0.0 0.499999 4.99138e-8 0.0 0.0;
0.0 0.499999 1.03438e-10 0.0 0.500001 0.0 4.98851e-8 0.0 0.0 0.0;
0.5 4.99134e-8 0.0 9.2929e-11 0.0 0.0 0.0 0.5 0.0 0.0;
0.0 0.0 0.499999 0.0 1.14548e-10 0.0 4.98943e-8 0.0 0.0 0.500001;
0.0 0.0 0.0 0.0 0.0 4.11253e-13 0.500001 0.0 0.499999 4.83151e-8;
0.0 0.500001 4.98851e-8 0.0 4.98533e-8 0.499999 5.07361e-11 0.0 4.99468e-8 0.0;
0.5 4.99153e-8 0.0 0.5 0.0 0.0 0.0 8.85845e-11 0.0 0.0;
0.0 0.0 0.0 0.0 0.0 0.500001 4.99441e-8 0.0 4.67071e-11 0.499999;
0.0 0.0 0.0 0.0 0.499999 5.1676e-8 0.0 0.0 0.500001 3.60475e-12]
```

Okay let's actually round that to one decimal.

```
0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.5 0.0 0.0
0.0 0.0 0.5 0.0 0.0 0.0 0.5 0.0 0.0 0.0
0.0 0.5 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0
0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.5
0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.5 0.0
0.0 0.5 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.5
0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.5 0.0
```

It looks like it is always a good choice to split the way into two choices. I hope this doesn't get too boring :D We will have a look at different/bigger problems later...

Maybe there is a reason why there are always two choices, right?
I mentioned it shortly in the last blog post. We are using a symmetric distance function.
So actually `x[1,2] = 1`

and `x[2,1] = 0`

is basically the same as `x[1,2] = 0`

and `x[2,1] = 1`

.
It just has to be consistent. We can use half as many variables by using only the upper diagonal.

```
@variable(m, x[f=1:N,t=f+1:N], Bin)
@variable(m, px[1:N])
@variable(m, py[1:N])
@NLobjective(m, Min, sum(x[f=i,t=j]*((px[i]-px[j])^2+(py[i]-py[j])^2) for i=1:N,j=i+1:N))
for i=1:N
@constraint(m, sum(x[f=j,t=i] for j=1:i-1)+sum(x[f=i,t=j] for j=i+1:N) == 2)
end
@NLconstraint(m, inarea[i=1:N], sqrt((px[i]-cx[i])^2+(py[i]-cy[i])^2) <= cr[i])
```

The one in one out constraint is:

$$ \sum_{j=1}^{i-1} x_{ji} + \sum_{j=i+1}^{N} x_{ij} = 2 \quad \forall i \in N$$

Now we need to have a second look at our datastructure. This is the result for `x`

after converting it into a dense array.

```
0 0 0 1 0 0 0 1 0 0
0 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 1
0 0 0 0 0 0 0 1 1 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0
```

Only entries in the upper diagonal can be set to 1. Btw yes this is the relaxtion. And as you can see it is discrete and as it turns out actually without subtours. That is a huge improvement.

But yeah let's have a look as it is a bit more complicated than the version before.

My idea was to generate the result vector where we see where to move from one point to another first and use the same structure for finding and eliminating subtours.

We can start in the first row:

`0 0 0 1 0 0 0 1 0 0 `

pick the first index where it's one and save this in our result vector.
This would be `[4, -, -, -, -, -, -, -, -, -]`

atm. Now we remove the 1 from the position `[1,4]`

as we used this information already. Then we look at the fourth row because this is our next area.

`0 0 0 0 0 0 0 0 0 0 `

there is no 1 in this row. The other option is to look at the fourth column now.

`1, 1, 0, 0, 0, 0, 0, 0, 0, 0`

It was like this but we removed the first 1 therefore it is now:

`0, 1, 0, 0, 0, 0, 0, 0, 0, 0`

The first 1 is at position 2 so our vector is the following:

`[4, -, -, 2, -, -, -, -, -, -]`

In the end we get

`4, 3, 7, 2, 10, 8, 5, 1, 6, 9`

Because we could jump from one to the next and we removed the `1`

each time and found another 1 we can be sure that it is the big cycle. If we get stuck somewhere without filling the whole array we know that we found a subtour.

I have build another TSPN with 15 areas and there I get this result vector:

`[5, 14, -, 6, 15, 9, 11, 2, 13, -, 1, -, 8, 7, 4]`

That means we found a subtour which is the following.

`1 -> 5 -> 15 -> 4 -> 6 -> 9 -> 13 -> 8 -> 2 -> 14 -> 7 -> 11 -> 1`

Now there is no `x[15,4]`

so we use the `x[4,15]`

to eliminate the subtour.
That looks like this: where `cycle_idx`

is the vector shown above.

```
last = 1
for i=2:length(cycle_idx)
if last < cycle_idx[i]
sumx += x[last,cycle_idx[i]]
else
sumx += x[cycle_idx[i],last]
end
last = cycle_idx[i]
end
if length(cycle_idx)-1 < N
@constraint(m, sumx <= length(cycle_idx)-2)
end
```

Let's have a look at the bigger TSPN:

Now we know that the relaxation isn't perfect so let's have a look at the relaxation values:

```
0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.5
0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.5 0.0 1.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.5 0.0 0.0 0.5 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
```

Again a lot of 0.5 and some 1.0 which looks like a pretty good relaxation. To make it a bit easier I'll only show the upper diagonal:

```
- 0 0 0 / 0 0 0 0 0 1 0 0 0 /
- - 0 0 0 0 0 1 0 0 0 0 0 1 0
- - - 0 0 0 0 / 0 / 0 1 0 0 0
- - - - 1 0 0 0 0 0 0 0 0 0 1
- - - - - 0 0 0 0 0 0 0 0 0 /
- - - - - - 0 0 1 / 0 0 / 0 0
- - - - - - - 0 0 0 1 0 0 1 0
- - - - - - - - 0 0 0 0 / 0 0
- - - - - - - - - 0 0 0 1 0 0
- - - - - - - - - - 0 1 0 0 0
- - - - - - - - - - - 0 0 0 0
- - - - - - - - - - - - 0 0 0
- - - - - - - - - - - - - 0 0
- - - - - - - - - - - - - - 0
- - - - - - - - - - - - - - -
```

Okay and I changed `0.5`

to `/`

because that is the only fraction we had ;)

Now finally to Branch and Bound... Sry for the long introduction. The basic idea of branch and bound is to divide this problem into several problems where we fix some of the values and solve the relaxation again and so on.

This means we are actually building kind of a tree structure.

This is the first step of the tree. Okay let me explain. We have the relaxtion at the root node. And the objective value for the relaxation is $1.05*10^5$. Why is it so huge? Because we don't have the sqrt part in it. It doesn't really matter. We still want to minimze that value. The 14 before that objective value is the choice of Juniper as the branching variable.

This is basically the choice of `x[1,15]`

which was 0.5. Now the left part branch means that we set
`x[1,15]`

to 0 and the right part represents the choice of setting it to 1.

The red circle on the right means that the relaxation believes that setting it to `1`

is infeasible. This seems to be unlikely but well it is a local solver so stuff like this happens.

The left part indicates that the objective of the relaxtion for `x[1,15] = 0`

is still $1.05*10^5$.
There Juniper wants to branch on variable no 60. Which is: `x[5,15]`

.

I use a blue outer circle to indicate a discrete solution. The objective value here is $1.07*10^5$ which is less optimal than the relaxtion if we set `x[5,15] = 1`

. This means that we found a solution but it might be not even local optimal. You might think: Wait why is that already integral... There are still values which are set to 0.5 right? Well it turned out that fixing those two values results in a discrete relaxation.
We don't have to fix every value to get this kind of luck.

Just for reference: This is the complete data structure at the end (after adding all subtours).

We can see that there are several tours we can make and the best found has an objective of $1.06*10^5$ which represents the actual objective of $\approx 1073.09$.

It took around 6 seconds to compute this and 5 cuts needed to be generated.

Now how does Juniper choose which variable to choose next?

There are different options: The easiest one is choose the variable where the value is closest to 0.5.
That technique is called `MostInfeasible`

. For this special problem it turned out that it is actually faster.
It took 3 seconds. Well normally it's not the best choice ;)

Another option is called `PseudoCost`

this uses `MostInfeasible`

for the first step and then computes a so called gain. The gain determines the improvement of the relaxation by choosing this variable.
Let's assume we want to minimize some function and our relaxation says: It can't be smaller than 10.

Now we branch on a variable and the result is that on one branch the relaxation is now 12 and on the other one 11.
This means that we improved our relaxation by a bit. We can say it's just the average of the difference to the parent relaxation. In this case it would be: `1.5`

. Well this is one way to compute it. The one implemented is a bit more complex but that should be good enough for now.
Let's assume we branch on a different variable and the relaxation on both branches is 20 now. Then we know it can't be better than 20. This is way better than before and we would have a gain value of `10`

.
It was a better choice to branch on this value as we gained more information. PseudoCost branching keeps track of those values and branches on the variable where it thinks that the best gain can be achieved.

This is just a guess and for a small tree like ours it is probably not reasonable. I just tried it and it took 25 seconds and 10 cuts were needed.

Okay so what is the default setting of Juniper?

It's called strong pseudo cost branching. Strong branching computes the gains for several variables and then finally choose the best one. Juniper uses strong branching on all variables (full strong branching) if fast and otherwise on some of them at the first level. Afterwards it uses the information and uses Pseudo Cost branching.

Let's try something big at the end:

Wow that took a while (more than 6 minutes). It is also much faster (the other one didn't wanna terminate :D) if you add all found cycles in one go instead of just the one where the first area is part of it.

I also used the `MostInfeasible`

branching strategy here and it is still running after 30 minutes. At least in this case `StrongPseudoCost`

turned out to be reasonable ;)

I'm sure there are ways to speed things up but for now I'm quite satisfied.

Hope you liked the post and if you have some interesting MINLP: Tell me more about it ;)

Here is a version for ellipses:

You can download the Julia code and my JS code which I used for the visualization on GitHub.

If you want to keep up to date you should watch the Juniper Repo!

Further reading:

Affiliate Amazon.com