I'm a Kaggler as some of you may know. That means I'm enjoying to tackle some of these problems: Kaggle Competions

Well I'm not good at ML and just try to get my hands "dirty" on these problems as I find them interesting. Every December there is a fun optimization challenge which I thought might be interesting.

Santa gift matching was the challenge of last december. During that competition I read about the Hungarian Method and think it might be of interest to you. This post is about explaining the algorithm and implementing it in Julia. I think some of you might enjoy an easier topic once in a while. If you have a different opinion: Don't let the comment section remain empty ;)

Besides the gift matching example there are several applications for the Hungarian Method. Actually I still use the gift matching one as it is easily scaleable and that is the only reason why I reprogrammed the wheel again :/

I used an MIP algorithm which was not the fastest and thought: Let's try something new. Read about the Hungarian method... One google search and I saw this: Hungarian Method with Julia Unfortunately the GitHub repo has also some benchmark results

**Edit:**
The contributor of that project read this article and improved his algorithm. It now has a huge speed-up. Thanks for this @Gnimuc! This article is still reasonable to explain the Hungarian method and show one version to implement this. If you want to use the Hungarian method for your project it would be the best to use that julia package and not the basic implementation of me ;)

Well in the kaggle challenge there are 1 million children who have a wishlist of 100 gifts and there are 1000 different gifts each 1000 times (a total of 1 million gifts).

Now the challenge was to arrange the gifts in a fashion that maximizes the average child happiness. It was a bit more complicated with some more constraints but that was the part I wanted to tackle with the Hungarian method. It was obvious that I can't solve the version with 1 million in one step. My idea was to grab a few thousand (the more the better) optimize that batch. Then repeat ;)

Well I started using a MIP solution, which worked for around 300,000 variables. I filled it with 10,000 children but used only 30 gifts instead of the full 1,000 gifts. One step took about ~150s. In the challenge there were also constraints that some children must get the same gift (twins and triplets). This was easy with adding a constraint to MIP but I wasn't able to build it into the Hungarian Method. The improvement is that the Hungarian Method should consider all 1,000 gifts and hopefully even more children.

You can see the huge step from a 1000x1000 matrix into a 2000x2000 matrix in the benchmark results I mentioned earlier. I was like: This must be improved. That's why I reprogrammed it in Julia. I was somewhat confident that it must be possible and if not I'll learn quite some stuff.

Let's start with: What is the Hungarian Method?

It is an algorithm which finds the minimal sum inside a square matrix given the constraint that each row and each column has exactly one choosen value. If the matrix is `n x n`

this means that we need to find the minimal sum of n elements with that constraint.

Why?

Good question! Well if we have a list of 5 children and a list of 5 gifts and each child has a ranked wishlist, we can optimize the average happiness.

Let's assume you have the following wishlist:

- TV (1), Laptop (2), Smartphone (3), Keyboard (4), T-Shirt (5)

I don't know it's a weird wishlist I guess :D

Anyway... You want to have this TV.

Now there is only one of these items and you "compete" against 5 other people about these.

No one is selfish (of course) therefore you try to find the best overall solution.

As in so many cases: Let's make a matrix...

Okay so in this matrix you're A and your friends B-E. The gifts are 1-5 as described above.

In the A column you can see that the TV is rank 1 for you. The B column shows that it is only rank 2 for B and so on.

A good thing is that you're only competing with E about the TV.

This shows that it isn't possible to just give everyone what he/she wants.

But what is the best configuration? To figure that out we try to find the best choice by checking in each row if there is only a single 1 in it. If this is the case it makes sense to give the corresponding person that gift. Therefore C is happy with the T-Shirt and nobody else really wants it. Unfortunately that is all we have so far.

Now in the general case this matrix is changed in a way that we try to find zeros instead of ones therefore we subtract something in the following fashion:

First subtract the column minimum in each column. This just changes 1-5 to 0-4.

The next step does the same for the row. This is a little bit more interesting. We basically look what is the best choice for the gift. For some gifts it is a bit unfortunate as nobody really wants that gift. I added a column which holds the minima of each row.

Let's subtract that from each row:

What do we have? A challenge. If we find a combination of zeros, more specifically five zeros, in a way that each row has exactly one choosen zero and the same for the cols we have our best matching.

Why? By reducing the rows and cols by their minima we have the guarentee that if we find 0s now that by adding them up (in our base matrix) we have the smallest summed rank and therefore the best matching.

Can we find the five zeros in that manner? Let's try.

I did my best but only found 4 zeros. The problem is that row 4 and 5 have only a single zero and it is in column C.

What can we do now? I'm not sure whether I can explain this part. Actually it was good enough for me that it works at that moment cause I wanted to have a good rank in that Kaggle challenge :D

The steps are:

- Mark the rows which don't have a zero (the gift we have to match somehow)
- Mark all columns (which aren't marked yet) which have a zero in the new marked rows
- Mark all rows having a choosen zero in the new marked columns
- Repeat

Let's do this step by step:

We kind of found the problem why we weren't able to find 5 zeros. Just see what happens next.

The next step is: Draw lines through the marked columns and **unmarked** rows.

Well let's see what we get ;)

This is actually the minimum number of lines you need to cover all zeros. The next step shows us why this is important.

Find the minimum value of all remaining values (the ones that aren't covered by a line) and subtract it from these elements. This wouldn't make sense if there would be a zero left, right? We are doing this because we want to repeat the finding zeros step (hopefully finding 5 this time) and don't increase the child unhappiness by too much. Another step is to add the minimum value we found to the entries which are covered by two lines. This makes these values even less attractive.

We have a new zero (at E4). Is this enough?

It actually is. And I'm happy that you'll get your TV ;)

If this wouldn't be enough we would do the same steps as before.

Let's check how good our solution is and then finally start to code... I'm sorry it always takes a little time to get through that introduction.

The summed rank is 8 whereas the best we could have hoped for was 5.

Okay. Deep into code. Defining our matrix first:

```
function main()
mat = [1 2 3 3 1;
2 3 5 2 2;
3 1 4 1 5;
4 5 2 4 3;
5 4 1 5 4]
end
main()
```

The first step was to find the minima in the colmuns subtract them and then the same for the rows.

```
function run_hungarian(mat_original)
mat = copy(mat_original)
# subtract col minimum
for c=1:size(mat)[2]
mat[:,c] -= minimum(mat[:,c])
end
# subtract row minimum
for r=1:size(mat)[1]
mat[r,:] -= minimum(mat[r,:])
end
@show mat
end
```

and inside the main just call `run_hungarian(mat)`

. We copy the matrix first to not change the original. It depends in the end whether we need it. If we don't then we could save RAM and improve the speed by don't copying it.

The second step is: Finding as many zeros as possible. This sounds complicated but maybe you read my blog about Sudoku and remember the maximum matchings. We can do exactly the same here.

Just make a bipartite graph with edges where we have a zero. This means we have a row of children and a row of gifts and connect one child with a gift if there is a 0 in our matrix. For example there would be an edge from you to the TV. Now finding a maxium matching gives us the choosen zeros. Here I found a good library which seems to be pretty fast: Matrix Networks

Add `using MatrixNetworks`

as the first line and this function:

```
function get_matching(mat)
# create bi partite matching graph
zero_vals = find(x->x==0,mat)
ei = Vector{Int64}()
ej = Vector{Int64}()
for i in zero_vals
# zero based
zi = i-1
zr = zi % size(mat)[1]
push!(ej,zr+1) # row
push!(ei,div((zi-zr),size(mat)[1])+1) # col
end
matching = bipartite_matching(ones(Int64,length(ei)), ei, ej)
return matching
end
```

`zero_vals`

is a vector of (1D) where all the zeros are. In our case it is:

`zero_vals = [1, 2, 8, 14, 15, 17, 18, 21, 22]`

The mat matrix is flatten to 1D in a way that 1-5 is the first column, 6-10 the second and so on.
Therefore we create a column and a row vector out of it so that the MatrixNetworks library can work with it. The library can actually work with weighted edges. We don't need that therefore we use `ones(Int64,length(ei))`

as our weights.

This is our matching vector result:
`matching = MatrixNetworks.Matching_output(5, 5, 4.0, 4, [1, 3, 4, 2, 10])`

The `4`

is the number of edges found (we would like to have a 5 here) and the vector describes which zero was choosen in each column. Let's have a look at our matrix again:

```
5×5 Array{Int64,2}:
0 1 2 2 0
0 1 3 0 0
2 0 3 0 4
2 3 0 2 1
4 3 0 4 3
```

The `[1, 3, 4, 2, 10]`

means: We have marked the first entry in colum 1, third in colum 2 and so on. The 10 is kind of a placeholder => We don't have an edge there.

We are doing these steps over and over again therefore we use a while loop. Then we check whether the cardinality (the number of choosen zeros) is the the same as the number of cols (or rows doesn't matter). If that is the case we are done. So this replaces our test matching function call:

```
nrows = size(mat)[1]
ncols = size(mat)[2]
while true
matching = get_matching(mat)
if matching.cardinality == ncols
return matching.match
else
println("Still stuff to do...")
break
end
end
```

If we found that maximum matching => return the matching and we are done. Otherwise we need to do the marking/drawing part. Let's do the marking part: Mark the rows first.

This small line in front of the while loop:

```
rows_marked = zeros(Bool,ncols)
cols_marked = zeros(Bool,ncols)
```

And this in our `else`

block:

```
else
# Get rid of the values which aren't actually matchings (here removing the 10)
sorted_matching = sort(matching.match)[1:matching.cardinality]
rows_marked[:] = false
cols_marked[:] = false
# mark all rows which aren't matched
marked_col_ind = Vector{Int64}()
marked_row_ind = Vector{Int64}()
new_marked_row_ind = Vector{Int64}()
last = 0
for r=1:matching.cardinality
if sorted_matching[r]-last != 1
for ri=last+1:sorted_matching[r]-1
rows_marked[ri] = true
push!(marked_row_ind,ri)
push!(new_marked_row_ind,ri)
end
last = sorted_matching[r]
else
last = sorted_matching[r]
end
end
for r=last+1:nrows
rows_marked[r] = true
push!(marked_row_ind,r)
push!(new_marked_row_ind,r)
end
end
@show marked_row_ind
break
```

So from `[1, 3, 4, 2, 10]`

we are removing the 10 in the first line and have it sorted.
`[1,2,3,4]`

now we want to get the 5. We do this by checking if there is a gap to the last value and fill it and do the same at the end. So if the vector is `[1,3,5]`

we would have a gap of one between 1-3 and would fill in a 2 the same between 3 and 5. Inside `marked_row_ind`

we have the marked rows. In our case just `[5]`

.

Now it is also quite reasonable to only save the new marked rows in a boolean vector which looks like this: `[false,false,false,false,true]`

in our example.

Now we have to mark the cols and then the rows and this in a repetitive way:

```
changed = true
while changed
changed = false
new_marked_col_ind = Vector{Int64}()
# mark cols
idx_not_marked_cols = find(x->!x,cols_marked)
for c in idx_not_marked_cols
for r in new_marked_row_ind
if mat[r,c] == 0
cols_marked[c] = true
push!(marked_col_ind,c)
push!(new_marked_col_ind,c)
break
end
end
end
new_marked_row_ind = Vector{Int64}()
idx_not_marked_rows = find(x->!x,rows_marked)
# mark new rows
for r in idx_not_marked_rows
for c in new_marked_col_ind
if matching.match[c] == r
rows_marked[r] = true
push!(marked_row_ind,r)
push!(new_marked_row_ind,r)
changed = true
break
end
end
end
end
```

We created a vector which stores the new marked cols and our current step is:

- Mark all unmarked columns which have a zero in the new marked rows

`idx_not_marked_cols = find(x->!x,cols_marked)`

gives us the cols which aren't marked. Therefore we iterate over them and check each entry in that column whether it has a zero. If that is the case then mark it and break for the current column and continue with the next. We basically do the same for the rows. Instead of checking whether an entry is zero we have to check whether the entry is a choosen entry. This is only the case if it's zero therefore we don't have to check that.

This gives us

```
rows_marked = Bool[false, false, false, true, true]
cols_marked = Bool[false, false, true, false, false]
```

Now let's find the minimum of the marked rows and unmarked cols. I thought: Okay let's get a simple stupid version first.

```
min_val = Inf
idx_not_marked_cols = find(x->!x,cols_marked)
for r in marked_row_ind
for c in idx_not_marked_cols
if mat[r,c] < min_val
min_val = mat[r,c]
end
end
end
for r in marked_row_ind
for c in idx_not_marked_cols
mat[r,c] -= min_val
end
end
```

Iterate over it once to find the minimum and then subtract it from every element. The next step is to find the entries were two lines cross the entry so the interesection of rows unmarked and cols marked.

```
# add to intersection of rows unmarked and marked cols
idx_not_marked_rows = find(x->!x,rows_marked)
for r in idx_not_marked_rows
for c in marked_col_ind
mat[r,c] += min_val
end
end
```

That's all. Testing:

Okay let's install the Hungarian package first and make our own simple benchmark results:

I've changed the benchmark file in a way that it only uses integers (UInt16). This speeds everything up and is all I need for my use case.

On my machine it takes about 0.9s for 1000x1000 and 3.8s for 2000x2000.

Let's test our algorithm...

Okay a big disappointment: 19s for 1000x1000 and 58 seconds for 2000x2000. Wow this must be improved.

Hope you're still here ;)

Which part is the one that takes all the time? Well I mentioned in the last section that we keep it simply stupid. Which sometimes is a nice principle but I bet that it's slow.

The finding minimum and then subtracting and adding it took around 18 seconds for 1000x1000 and 54 seconds for 2000x2000.

Let's fix that:

We had this information:

```
rows_marked = Bool[false, false, false, true, true]
cols_marked = Bool[false, false, true, false, false]
```

I don't want to have for loops anymore as they tend to be slow. Therefore let's try out masks.

Trying the most simple one first. One mask for marked rows, one for marked columns, one for unmarked rows and one for unmarked columns.

```
mask_sub = zeros(Bool, nrows, ncols)
mask_sub_2 = ones(Bool, nrows, ncols)
mask_add = ones(Bool, nrows, ncols)
mask_add_2 = zeros(Bool, nrows, ncols)
mask_add[marked_row_ind,:] = false
mask_add_2[:,marked_col_ind] = true
mask_sub[marked_row_ind,:] = true
mask_sub_2[:,marked_col_ind] = false
# Combining mask and mask_2
mask_sub .&= mask_sub_2
min_val = minimum(mat[mask_sub])
mask_add .&= mask_add_2
# subtract/add min_val
mat .-= min_val*mask_sub
mat .+= min_val*mask_add
```

Now that part takes two seconds for 1000 and 5.5 for 2000. I think this can be further improved:

The problem is we are creating new arrays in every loop also the `&`

in calculation is pretty slow.
Let's think about it again. Which entries do we want to add value to?

The ones where the row is not marked and the col is. Let's get the ones where the rows are not marked first. Before the while loop starts we create:

`mask_add = ones(Bool, nrows, ncols)`

Then inside the while loop we have to set it to true every time.

`mask_add[:,:] = true`

Then add marked rows must be set to false:

`mask_add[marked_row_ind,:] = false`

Then before the while loop:

`col_unmask = ones(Bool, ncols)`

and inside:

```
col_unmask[:] = true
col_unmask[marked_col_ind] = false
```

I call it unmask as it is a mask which is true if the col is unmarked.
Now we can use `mask_dbl[:,col_mask] = false`

to set the cols to false which aren't marked.

This is exactly the mask we need. We achieved that we don't need a `&`

anymore and we create it once and we only have a `n x n`

mask and a `n x 1`

mask instead of two `n x n`

masks.

This the mask for finding the minimum add subtract the minimum:

```
mask_sub[:,:] = true
mask_sub[:,marked_col_ind] = false
row_mask[:] = true
row_mask[marked_row_ind] = false
mask_sub[row_mask,:] = false
```

Let's have a look at benchmarking:

It actually takes the same amount of time but it saves some memory allocations.

Our algorithm produces this line for n=2000 9.624005 seconds (1.97 M allocations: 5.343 GiB, 2.34% gc time)

and the library this: 3.805974 seconds (24.35 k allocations: 44.512 MiB, 0.05% gc time)

What is going on with 5GiB of allocations??? If someone has an idea: Please comment!

We can add some `@inbounds`

in front of our remaining for loops which makes it a little faster by not checking any out of bounds errors.

Now I think it is interesting that the library has a jump from 0.9s to 3.8s and this code has a jump from 3.2s to 8.9s so a factor of 3 instead of a factor of 4. Seems small but let's run it on size 4000 and 8000.

```
n = 4000
Lib: 22s
Own: 24s
n = 8000
Lib: 147s
Own: 61s
```

This looks awesome!

**Edit:**
As described above after the update of the Hungarian package the library is now much faster.
It takes them around 5s instead of 147s for n=8000.
Check out that Package ;)

Obviously I still think that there is at least some improvement possible. If you have any ideas -> Fork the repo and make a PR and make a comment here.

A bit more about the Kaggle challenge:

I switched between the MIP and the Hungarian Method to be able to optimize the twins and triplets on the one hand and on the other hand consider all gifts per step and more children.

In the end I was <0.1% worse than the best competitor. Unfortunately this resulted in rank 138 :D

Thanks hope you enjoyed it and if you did: Consider a small donation ;)