AlphaZero for Nash

Creation date: 2018-03-25

Tags: ai, python, neural-nets, machine-learning, tensorflow

It's a long dream of mine to build something like a chess engine because I love to play chess. Well there are chess engines all over the place but it's something different when you build it by your own, right? Okay chess is a quite complicated game. First it isn't totally trivial to teach the computer the rules of the game and using AI to teach it was recently done by Deepmind using AlphaZero which was implemented after the company teached a computer to play Go which is an even more complex game than chess. They have a blog article about AlphaGo Zero on their website

They "taught" the computer later... well actually the computer was able to teach itself how to play Chess and Shogi. Deepmind (Google) was able to run the algorithm for only a couple of hours to beat the until then strongest chess engine Stockfish which is completly open.

Well it's Google so mentioning the time of "couple of hours" doesn't say much as this isn't comparable to the hardware I have or probably you have.

Which games are easier and still interesting? Well on the easy side is Tic-tac-toe for example. Which I and probably most of you as well consider as boring. The boring thing about it is that if you play perfectly (which isn't hard) you end in a draw. There are draws in chess as well, but well it's not easy to reach that state.

If you're interested in John Nash the man behind the story of the famous movie: A beautiful mind. I recommend this book as it is a more accurate story about him.

Anyway, he reinvented a game called Nash or John at his university. I say reinvented as he wasn't the first who came up with the idea but he wasn't aware that the game already existed.

I came across the game called [https://en.wikipedia.org/wiki/Hex(boardgame)] otherwise by reading about John Nash so I prefer the name Nash. The moment I read about the game in the book I knew that I have to try to build an AI playing it.

There are several points I like about this game:

It's a strategic game as Chess, Shogi or well maybe Tic-tac-toe
It's resizeable (a chessboard is 8x8) but here you can choose the size as you wish
There are no draws

Nash

A short explanation of the game on a small grid (6x6) which is far smaller than the size you normally play. Nash considered boards of the size 14x14 as perfect.

Well I have to say it somewhere... I changed the design a bit :D It's normally tilted to the left instead of the right but well it was kind of a bug in my code and then I thought well no problem this doesn't really matter. Probably I should have fixed it but now it's quite late :D

Okay the rules: There are two players: Red and blue. Red starts the game by placing a stone into one of the here 36 hexagonal cells. Then Blue has to make a move and so on. You always have to make a move (pass is not allowed) and you can only occupy a cell which isn't yet occupied.

It is quite similar to tic-tac-toe in that manner. One difference is that you have hexagonal cells and the other more interesting difference is the goal of each player.

In most games (well actually all I think...) I can think of the goal of each player is the same. Here the red player tries to form a connected path from the top to the bottom and the goal for the blue player is to build a connected path from left to right instead.

It is proven by Nash that the game has no draws. There is also a winning strategy for the first player (red) but it isn't easy to find and there is no strategy found for boards which are bigger than 9x9.

This game is perfect for my endeavor because it is resizeable. If it takes too long to learn the game for the AI I can simply make it smaller. I agree that it is pretty uninteresting for small sizes, but anyway it is kind of dream of me... Okay? :P

It seemed pretty straightforward to use a AlphaZero approach as it isn't programmed to be specific in any way. I wrote some articles on this blog about tensorflow and recognizing hand written digits but after that I didn't do that much with AI. My goal normally is to implement things more or less from scratch to fully understand them but well everybody says: Don't reinvent the wheel and I know they are right :D

I came across this GitHub repository alpha-zero-general and it is a framework to build your own game engine. That seemed pretty neat. They have trained engines for tic-tac-toe, Othello and Gobang.

Still my goal was to at least grab some knowledge about how it's done. I've read some articles like the one about this framework on their blog.

It actually a bit too complicated for me to fully understand it and I thought well what do I have to know to use it and what do you need to know to use it and enjoy reading this blog article.

What was the approach of game engines before things like AlphaZero? Well what you can do is try to evaluate a current position and then build a tree like structure for the moves you're able to make evaluate the next position and check whether it's favorable to make that move. Of course it is reasonable to make more moves like move for the current player then move for the opponent and so on until you get to a point where you know that it's a good decision.

The problem of course is that you can't brute-force every combination of moves and it isn't that simple to know whether position is favorable for Red or Blue in our case or for games like Chess whether white is better or black. You can sum up the material as an indicator but this isn't enough.

Before AlphaZero the main approach was also to use human experts and thousands of human expert games to teach it the machine how to deal with it. Well the main idea of AlphaZero as the name suggests is that we need ZERO human training. All we need is the rules of the game.

The neural network gets the position of the board as the input and has the probability of the moves, which moves are likely to result in a better position for the current player, as well as a prediction of how good the current position is for the current player. This evaluation of the position has a continuos range of -1 definitely loosing to 1 definitely winning.

The rough idea is to play some random games at the beginning and train the network and then be better next time. Behind the neural network which is the main component is a Monte Carlo tree search. Okay what is that? Again I only use a very rough explanation here and you can check out more if you want or please make a comment if you want that I write something about it.

Monte Carlo tree search is basically a tree of positions in our game which holds a record of wins and losses beginning from that state. Which moves to choose? There is the problem that we aren't able to try everything so which moves are reasonable? There needs to be a balance between exploration and exploitation.

Should we choose a move where we are almost certain that we win and dive deeper into that variation (exploitation) or should we try something new or some move which we at least didn't try that often but doesn't seem to be extremely bad (exploration).

This should be enough about the idea behind it for this article. Hope that's fine otherwise don't hesitate to comment.

A fellow student and me started to build a website where we can play first as we didn't have much time and then started to build on the main AI approach.

The website and everything isn't finished yet but I wanted to write about it anyway and maybe can make some more articles about it and maybe you have some suggestions as well.

Let's implement:

All you need to do is implement some basic functions and provide the neural network, well we don't know much about AI so we used a neural network from a different game.

The functions you need to implement are providing the legal moves a player can make and a function which returns whether the game has ended and who won the game if it has ended.

Easy right?

Well, kind of... As mentioned earlier Nash has a key difference compared to other games. The players don't have the same strategy. As this is unusual the framework works with an approach which makes it easier to learn and therefore faster. In tictactoe you can simply use the same strategy as you have the same goal. The framework therefore uses a function which should return a so called canonical form of the board. This is a form where the player is now player 1 and has to make a move. In the framework we have player 1 and player -1. So if it is the turn of player -1 we can simply multiply the board with -1 and can make a move as it's player 1 turn.

We thought okay we can rotate the board instead but then the legal moves aren't legal anymore and it wasn't that easy to actually know who's turn it is and as I mentioned we didn't reinvent the wheel and the downside is that we didn't totally understand what is going on :D

The idea came after a couple of days on a long walk. We can simply save on our end whether the board is rotated and the framework always gets two parts of the board. The position and whether it is rotated. We didn't want to actually change the framework but it was possible to return the actual board matrix without the rotation variable at the right positions and the neural network could be changed without changing the framework to feed the network only with the position.

Another aspect of choosing a resizable game is that it's easy to test on small boards to check whether it works.

Now in tic-tac-toe we also know that we don't actually have 9 possible moves as the first player. Well we have but they aren't really different. We can move in the center in a corner or in the middle of a border row or column which gives us actually only three different choices.

This is game symmetry and it speeds up learning of the neural network as well if we give provide the symmetries beforehand. Every time we have a winning position we can rotate the board and say this is winning as well. In Nash we have a symmetry of a 180 degree rotation.

I encourage you to test the framework for games you want to have an engine for and if you want check out our implementation which this time is on GitLab instead of my opensourc.es GitHub account. GitHub doesn't allow files which are bigger than 100MB and the training checkpoints of our game are slightly larger.

We are currently working on it and nothing really is pushed to master yet but this shouldn't be a problem.

Here is the link to the staging branch.

We were able to train the engine for 4x4 board and are working on 5x5 (hopefully will be able to increase this).

It would be really really nice if someone of you has a nice GPU or a TPU :D to speed things up ;)

If you want to participate please comment and you hopefully find everything you need in this readme

Have a good one!

This is one game I played against the engine (engine is playing blue). I think it's nice to show a game where I should have a winning strategy as player red but I am actually loosing here...

If you're interested in learn more about AI or learning in general... I suggest to read as many books as possible ;)

If you've enjoyed this and some other posts on my blog I would appreciate a small donation either via PayPal or Patreon whereas the latter is a monthly donation you can choose PayPal for a one time thing. For the monthly donation you'll be able to read a new post earlier than everyone else and it's easier to stay in contact.

Want to be updated? Consider subscribing and receiving a mail whenever a new post comes out.

Subscribe to RSS

OpenSourc.ES

AlphaZero for Nash

Nash