Sudoku puzzles, Constraint programming and Graph Theory

Creation date: 2017-02-08

Tags: puzzle, sudoku, python, constraint-programming, optimization, graph-theory

First of all, sorry that I didn't write for a long time. I had to finish my bachelor's thesis. Done! Then I flew to Australia to travel around. I'm currently in Perth in the library because my laptop broke down...

Well whatever. Let's start with something interesting. I wrote about machine learning and memory palaces and the last post was about a 3D game. Today I want to go back in a lower dimension. Welcome 2D!

I started an online course about discrete optimization at coursera.

There you listen to some lectures between 13.02 and 16.04. and have some programming challenges. At the end you'll get a certificate for around 65$. Some knowledge in mathematics, computer science and Python is helpful.

You can start the course now (which I did). I finished the first two weeks and got stuck at the moment the lecturer mentioned Sudoku. It is a dream for a long time to solve this relatively simple game in Python. Actually I started several times using backtracking but I never finished. I don't know why...

News: (05.09.2019) I've started to write a constraint programming solver in Julia and the first part is about backtracking. Check it out!

This week I started to solve it in a different fashion. I had some optimization courses and seminars in university where we used Gurobi to solve some bigger projects. I was curious how this solvers work so I started the coursera course and started this little project.

For everyone who doesn't know what Sudoku is. It's a really popular puzzle. There is a 9x9 grid which consists of nine blocks which have a size of 3x3. sudoku

Rules: Fill the grid so that:

The backtracking approach would be to fill the first field (top left corner) with an 1 and check if the rules are unsatisfiable afterwards. In this case it would be unsatisfiable because the first row has already the digit 1. So we could check the next digits until we find one which looks promising. In this case the 3. Afterwards we go to the next field and so on. If there are no other options we would backtrack until we find the complete solution.

That is kind of a dump system. So how do humans solve a Sudoku? We can look for a row, column or block which has already 8 digits then it's easy to add the ninth. Or a bit more complicated: We combine the rules. Then we are able to add this 6.


We can fill it because the digits 1,2,5,7,8,9 aren't possible (row rule) and the digits 1,3 and 4 aren't possible (block rule). 6 is therefore the only possible value.

Now the question is if it is possible to write a program which does exactly that stuff? First we define our grid:

grid = [[0]*9 for i in range(9)]
grid[0] = [0,2,1,0,7,9,0,8,5]
grid[1] = [0,4,5,3,1,0,0,0,9]
grid[2] = [0,7,0,0,4,0,0,1,0]
grid[3] = [0,0,0,1,0,8,0,3,6]
grid[4] = [0,6,0,0,0,0,2,0,8]
grid[5] = [0,0,0,0,0,3,0,0,4]
grid[6] = [6,0,8,0,0,0,0,0,0]
grid[7] = [0,9,4,0,0,7,8,0,0]
grid[8] = [2,0,0,5,0,0,0,4,0]

grid = np.array(grid)

This is our representation of the Sudoku grid where 0 represents the unknown value. Then it is common to have something like a search space. There we define every value which looks promising for the solution. In the beginning every value looks promising (we have no idea yet :D). Therefore we say that the values 1-9 are possible for every place which isn't set yet (all 0 values).

model = Model()

We will define our Model class later. The first parameter is our grid the second our range of values which can be part of the solution and 0 is the representation of every unassigned value in our grid.

Then we describe our rules as constraints:

# per row
for r in range(len(grid)):
    idx = np.full(grid.shape, False, dtype=bool)
    idx[r,:] = True

That basically means for every row in our grid:

The rule was that we have to assign the digits 1-9 in each row. We only have nine possible values (1-9) and nine positions. Therefore we can say that each of the values in that row should be different from each other. And we want to check the constraint every time a value in that row changed or just if we were able to reduce the search space for that row.

Okay stop for a moment. We visualize it on a small part. We have a small game which has three positions and the three digits 1-3

Simple Sudoku

First the search space is:

Search space

We will solve the puzzle by saying that every value changed. Then the check_constraint function will be called to check the alldifferent constraint.

This will reduce the search space to:

Search space

If some other constraints sets the second value to 2 the check_constraint function is called again and we can set the third value to 3.

That is basic concept. Let's add the other rules in the same way:

# per col
for c in range(len(grid[0])):
    idx = np.full(grid.shape, False, dtype=bool)
    idx[:,c] = True

# per block
for r in range(3):
    for c in range(3):
        bxl,bxr,byt,byb = r*3,(r+1)*3,c*3,(c+1)*3
        idx = np.full(grid.shape, False, dtype=bool)
        idx[bxl:bxr,byt:byb] = True

At the end we want to solve the model and print the solution:

solution = model.get_solution()

The overall structure of the project should look like


The code mentioned above is part of

For our model we define the

import networkx as nx
import numpy as np
from .Error import InfeasibleError

class Model:
    def __init__(self):
        self.subscribe_list_on = []
        self.subscribe_list_func = []
        self.nof_calls = 0

In subscribe_list_on we store a list of indexes which tell us when the subscribe function should be called. subscribe_list_func stores the function calls and nof_calls is the number of calls. Using this we can see if we call something too often and how hard it was to solve the model.

def subscribe(self,on,func,*args):

I decided to use the parameter on as a dictionary to have some more freedom for other projects. Here we simply add the indexes to the subscribe_list_on list and the function with the arguments to subscribe_list_func.

For solving the model we say that every value has changed which should basically call all functions.

def solve(self):
    try:, True, dtype=bool))
    except InfeasibleError as e:

If the model is infeasible which simply means that there is no solution we want to exit and out print the error.

The fire function is relatively simple as well.

def fire(self,idx):
    i = 0
    self.changed = np.full(self.changed.shape, False, dtype=bool)
    for lidx in self.subscribe_list_on:
        if np.any(np.logical_and(lidx,idx)):
            func = self.subscribe_list_func[i][0]
            args = self.subscribe_list_func[i][1]
            except InfeasibleError as e:
                raise e
            self.nof_calls += 1
            i += 1

    if np.any(self.changed):
            except InfeasibleError as e:
            raise e

Here we say that nothing had changed and the function check_constraint will update the knowledge and change the self.changed matrix again until the solution is found. Then for every subscribe call we check if the indexes (idx) with which the fire function was called affect the constraint. idx is a bool matrix in our example a 9x9 matrix and all the lidx matrices are the same. If the logical and of these two matrices has at least one true value we want to call the check_constraint function. That is done in

if np.any(np.logical_and(lidx,idx)):
    func = self.subscribe_list_func[i][0]
    args = self.subscribe_list_func[i][1]

And if something changed at the end we just call the fire function with an updated changed matrix. If nothing changed we solved the problem.

At first we want to build the simple search space.

def build_search_space(self,grid,values,no_val=0):
    self.search_space = np.empty(grid.shape,dtype=dict)
    self.changed = np.full(grid.shape, False, dtype=bool)
    no_val_idx = np.where(grid == no_val)
    no_val_idx_invert = np.where(grid != no_val)
    self.search_space[no_val_idx] = {'values':values[:]}
    for idx in np.transpose(no_val_idx_invert):
        t_idx = tuple(idx)
        self.search_space[t_idx] = {'value':grid[t_idx]}

Therefore, the search space needs to have the same shape as our grid. And we initialize the changed grid here as well. Then we assign the values (in our case 1-9) to every entry in the search space which has the no_val. And we set the value of the fixed entries in the search space as well.

Now the difficult part:

def check_constraint(self,opts,operator):
    if operator == "alldifferent":

How do we update the search space if something has changed? The first step will be easy. We can delete all digits from the search space which already exist. But is that enough?

Let's try it out:

def check_constraint(self,opts,operator):
    if operator == "alldifferent":
        ss_idx = opts['idx']
        values = self.search_space[ss_idx]

First we want to get all search space indexes (ss_idx) which are affected and get the values from the search space.

Then we save all digits that are already fixed:

already_know = []
new_possible = [False]*len(values)

and we want to replace the related entries in our search space by the new possible values. Afterwards we simply add all known values to the already_know array.

i = 0
for v in values:
    if 'value' in v:
        new_possible[i] = {'value': v['value']}
    i += 1

And of course these fixed values will not change so we add them to the new_possible array.

We build a subscribe system so we have to see which values have been changed. Therefore we use an array:

new_knowledge = [False]*len(values)

First we didn't change anything yet so it is initialized with the value False.

Now let's reduce the search space:

i = 0
for v in values:
    if 'value' not in v:
        new = [x for x in v['values'] if x not in already_know]

Here we use every value in our current values array and remove all the values which can't be in this particular row, column or block.

if len(new) < len(v['values']):
    if len(new) == 1:
        new_possible[i] = {'value': new[0]}
        new_possible[i] = {'values': new}
    new_knowledge[i] = True
    new_possible[i] = {'values': v['values']}

Then we check if that changed anything by comparing the length of the two arrays. If something has changed we update the new_possible array and if only one value is possible we set that value. In the end we say: Yes we changed a value new_knowledge[i] = True and if we didn't change we just set the old values into our new_possible array.

Okay now we have everything we need. Let's update the search space and the changed array.

old_changed = self.changed.copy()
self.changed[ss_idx] = new_knowledge
self.changed = np.logical_or(self.changed,old_changed)

self.search_space[ss_idx] = new_possible

For changing the changed matrix we have to check if we changed something now or if it already was changed before. Therefore we use a logical or. The search space can be simply updated using our indexes and the new_possible array.

Let's solve the model...

Oh wait we need some more code. We had the following part at the end of our

solution = model.get_solution()

Let's define the small functions get_solution and print_sudoku.

def get_solution(self):
    grid = [[0]*9 for i in range(9)]
    for r in range(len(self.search_space)):
        for c in range(len(self.search_space[r])):
            if 'value' in self.search_space[r][c]:
                grid[r][c] = self.search_space[r][c]['value']
    return grid

There a grid is created and all the fixed value parts of our search space are filled in. For printing the actual solution we use this:

def print_sudoku(grid):
    for r in range(len(grid)):
        row = ""
        for c in range(len(grid[r])):
            if c%3 == 0:
                row += "["
            row += " "+str(grid[r][c])
            if c%3 == 2:
                row += " ]"
        if r % 3 == 2:

That out prints:

[ 3 2 1 ][ 6 7 9 ][ 4 8 5 ]
[ 8 4 5 ][ 3 1 2 ][ 6 7 9 ]
[ 9 7 6 ][ 8 4 5 ][ 3 1 2 ]
[ 4 5 9 ][ 1 2 8 ][ 7 3 6 ]
[ 1 6 3 ][ 7 5 4 ][ 2 9 8 ]
[ 7 8 2 ][ 9 6 3 ][ 1 5 4 ]
[ 6 3 8 ][ 4 9 1 ][ 5 2 7 ]
[ 5 9 4 ][ 2 3 7 ][ 8 6 1 ]
[ 2 1 7 ][ 5 8 6 ][ 9 4 3 ]

There is no zero anymore so we did it! We solved it yeah... ... and it was pretty fast. This was generated in 0.03s.

Now look on the scrollbar on the right side of your screen. That shows you that we are not there yet.

But why???

We need a look at a new much harder Sudoku to show that this isn't enough.

Extrem Start

Let's try to solve it:

[ 0 0 0 ][ 5 4 6 ][ 0 0 9 ]
[ 0 2 0 ][ 3 8 1 ][ 0 0 7 ]
[ 0 0 3 ][ 9 0 0 ][ 0 0 4 ]
[ 9 0 5 ][ 0 0 0 ][ 0 7 0 ]
[ 7 0 0 ][ 0 0 0 ][ 0 2 0 ]
[ 0 0 0 ][ 0 9 3 ][ 0 0 0 ]
[ 0 5 6 ][ 0 0 8 ][ 0 0 0 ]
[ 0 1 0 ][ 0 3 9 ][ 0 0 0 ]
[ 0 0 0 ][ 0 0 0 ][ 8 0 6 ]

You might look at it and think: Wait I said SOLVE!!!

At least we got three new values:

Basic solved

Unfortunately that's all we can get with our simple model. Before we start further we have a look at our search space:

Search space after basic

Okay what can we see? There are some entries in the search space which are constraint quite a lot and some which are less constraint. We found already three values in one block. Let's have a look on the two values which aren't assigned yet. They both can have the digits 2 and 7. Because they are in the same row this actually gives us new information because both the 2 and the 7 needs to be placed in the block and only once in the row. Therefore there can't be 7 at the second position of the third row and no 2 in the third block in the bottom left corner.

The next image shows a different representation of a search space.

Search space one row

We will use this representation where we have our nine entries which are visualized in the first row a-i and the values 1-9 at the bottom. Let's draw some arrows.

Fixed search space

These arrows show the fixed values. Including the search space it looks like:

Full search space

That actually looks quite messy but okay. We built a graph. Let's use some graph theory. I learned it university and never used it... Until now!

In more general constraint programming challenges we first would like to know if the model is feasible. So we want to know if there is a configuration where we can fulfill the alldifferent constraint. To solve this problem we use our knowledge about maximum matching. For all of you who don't know what a maximum matching is: I first explain what a matching is.

A matching is a list of edges which don't have common vertices. That basically means a node is not allowed to have more than one edge in a matching.

To get a maximum matching we just want the highest amount of edges we can find to fulfill the matching criteria.

If you're interested of how to find maximum matching you can have a short look on Wikipedia.

I want to show you one:

Maximum matching

Here it is feasible because we are able to find a matching which has size 9 (9 edges). That means for every position we can assign a value so that all positions have a different value.

The general idea is now to more or less find all of those maximum matchings and to check if there are edges which don't appear in any of them. It isn't that easy/fast to find all maximum matching but there is a nice lemma: Berge's lemma

It says: An edge belongs to some but not all maximum matchings if and only if, given a maximum matching M, it belongs to either

Okay wait... What do we wanna do again? We want to know if an edge is part of any maximum matching. Now we have an lemma which says there is something which holds for every edge which is part of at least one maximum matching. Well we said that we don't have a free vertex here because we need to assign all the values 1-9 to our positions. Therefore an edge belongs to a maximum matching if it belongs to an even alternating cycle. Alternating in this case means that we start with an edge which is in the matching and have to use an edge from there which isn't part of the matching. Then we have to use an edge from the matching and so on... Here we can use strongly connected components. A strongly connected component is part of a graph which is itself a strongly connected graph. A strongly connected graph is a graph in which every vertex is reachable by every other vertex. Which basically means that if we have a directed graph there needs to be a cycle.

But we have undirected graph, right? And wait a second then we have to check the alternating stuff.

We can transform the undirected graph in a fashion that it is a directed graph where we don't have to check the alternating constraint.

Directed graph

Here every edge which is in our maximum matching is directed downwards and the other edges are directed upwards. Now we just have to find strongly connected components which have an even cycle. Actually we can forget the stuff with the "even" part because we have neither connections between the upper parts nor connections between the lower parts. Therefore to create a cycle we always need an even number of edges.

Let's find an even alternating cycle:

one alternating cycle

The blue arrows form a cycle and another one:

second cycle

And the third one... Third cycle

Well there is another one: Cycle #4

and the last...

last cycle

Now let's remove all edges we used in the first maximum matching as well as in one of those cycles and have a look what is left:

Two edges we can remove

These are exactly the two edges we wanted to remove.

Fortunately there are graph libraries for Python which can do all this stuff for us. I used networkx.

ss_idx = opts['idx']
values = self.search_space[ss_idx]

G = nx.MultiDiGraph()
already_know = {}
for i in range(len(values)):
    if 'values' in values[i]:
        for j in values[i]['values']:
        already_know[i] = 1

We do the same initialization as before and build the search space graph and we save the values which we already know in already_know.

Full search space

This line gives us a maximum matching:

matching = nx.bipartite.maximum_matching(G)

Then we build the second graph where we have the directed version.

n_matching = []
GM = nx.DiGraph()
possible = np.empty((len(values)),dtype=dict)
for k in matching:
    if str(k)[:2] == 'x_':
        possible[int(k[2:])] = {'values':set([matching[k]])}

In that part we only add the edges which are already in the matching and define an empty numpy array which holds the new possible values (possible). We add the value of the current matching to the array. Then we check if we can really reach all nine values:

if len(n_matching) < len(values):
    raise InfeasibleError("Infeasible","The model is infeasible")

Yeah I can work on the error message... Then we add all the other edegs:

for e in G.edges():
    if not GM.has_edge(e[0],e[1]):

That means if we don't have the edge in the one direction we add it in the other direction which gives us this:

Directed graph

Then we find all strongly connected components:

scc = nx.strongly_connected_component_subgraphs(GM)
for scci in scc:
    for e in scci.edges():
        if str(e[0])[:2] != 'x_':
            e = (int(e[1][2:]),e[0])
            e = (int(e[0][2:]),e[1])
        if 'values' not in possible[e[0]]:
            possible[e[0]] = {'values': set()}

and if the edge is part of a strongly connected component we add the connection in our possible array.

We are coming to the end...

new_possible = []
new_knowledge = [False]*len(values)
i = 0
for p in possible:
    l = list(p['values'])
    if len(l) == 1:
        if i not in already_know:
            new_knowledge[i] = True
        if len(l) < len(values[i]['values']):
            new_knowledge[i] = True

    i += 1

We just want to know which part we changed therefore we use new_knowledge = [False]*len(values) and if we construct the new_possible array which uses the possible array but converts it into the search_space dictionary form.

At the end we use the same lines as before for our subscribe system:

old_changed = self.changed.copy()
self.changed[ss_idx] = new_knowledge
self.changed = np.logical_or(self.changed,old_changed)

self.search_space[ss_idx] = new_possible

That's it we can solve our model:

[ 1 7 8 ][ 5 4 6 ][ 2 3 9 ]
[ 4 2 9 ][ 3 8 1 ][ 5 6 7 ]
[ 5 6 3 ][ 9 2 7 ][ 1 8 4 ]
[ 9 3 5 ][ 2 1 4 ][ 6 7 8 ]
[ 7 4 1 ][ 8 6 5 ][ 9 2 3 ]
[ 6 8 2 ][ 7 9 3 ][ 4 1 5 ]
[ 2 5 6 ][ 4 7 8 ][ 3 9 1 ]
[ 8 1 4 ][ 6 3 9 ][ 7 5 2 ]
[ 3 9 7 ][ 1 5 2 ][ 8 4 6 ]

and that was possible in 0.3s which isn't too bad.

Well now there might be Sudoku where two possible solutions exist and we have to use some backtracking to solve it but I think we are done for the moment.

Thanks for reading! I hope you enjoyed the post. I would like to enhance my model and solve some other challenges. If you have a game or another challenge... Make a comment! I'll try to solve it and add some other constraint stuff to my model and maybe backtracking.

You can download all the code on my OpenSourcES GitHub repo.

If you're interested in solving your own problem using constraint programming and don't wanna wait until my solver can do it :D Python-Constraint is an existing library which can be used. It solved the harder Sudoku in 0.0007s.

As mentioned at the beginning now in 2019 I'm actually trying to build a whole constraint programming solver. You can see the start of the series here: Constraint solver in Julia.

You enjoyed this post and would like to support me? Spend some money ;)

If you enjoy the blog in general please consider a donation via Patreon. You can read my posts earlier than everyone else and keep this blog running.

Want to be updated? Consider subscribing and receiving a mail whenever a new post comes out.

Powered by Buttondown.

Subscribe to RSS