In 1997, it shocked the public that a supercomputer could beat the chess world champion. In 2018, it's even more shocking that Kasparov held his own, achieving a win and several draws, against a supercomputer capable of evaluating more than 200,000,000 positions per second. This is absolutely astonishing to me, and difficult to even comprehend.
Inspired by this reading, I wrote an AI to play a much simpler game, tic-tac-toe. Unlike with chess, the game tree of tic-tac-toe can be explored fully in a reasonable amount of time because the game's state space is many orders of magnitude smaller. Heuristic rules for perfect play are straightforward to grasp, and perfect play always results in a draw. My interest in tic-tac-toe has less to do with the game itself, and more in the game as a simple testbed for general 2-player perfect-information games like chess, the more complex of which are unlikely to ever be solved. Interestingly, this leaves room for technical and creative innovation in AI design, as evidenced by the recent dramatic overtaking of StockFish by AlphaZero in computer chess, which is radically different in design than its contemporaries.
My tic-tac-toe program uses random playouts to evaluate possible moves. A value matrix is incremented if the random playout results in victory, decremented if a loss, and unchanged if a draw. On my 2011 Dell E6420 laptop, the program executes about 700 playouts per second, which is very modest. Nonetheless, this is sufficient for the program to loosely deduce many higher-level game heuristics, such as:
- It's best to start play at the center, or secondarily the corners
- A winning move out-ranks all other moves
- If the opponent is one move away from winning, they must be blocked
- Playing on the non-corner edges is weak unless blocking or winning
Most games result in a draw:
Here, the computer capitalizes on my second-move blunder to create a fork, achieving a victory:
Here, the computer fails to block, allowing me to win.
I probed this error on move 6 further by giving the computer a full 30 seconds to think, instead of the 2.5 seconds shown above. The value matrix came back as:
NaN NaN NaN
-8907 NaN -4317
NaN -4442 -8824
indicating that it considered the correct move, the block, a toss-up with its blunder move. Replaying the scenario repeatedly for debugging purposes, I noted that the computer chose to block about 50% of the time. Working through the game tree, it became clear why the computer blunders: the blocking move results in zero paths for O to win and one path for X to win, while the blunder move results in one path for O to win and two paths for X to win. The net score between the moves is equal at -1, so the computer essentially flips a coin in deciding between them.
So this is not a bug, but rather a limitation of the random playout valuation method in this particular game. It's easy enough to hard-code the computer to block, but it's much more interesting to see what the computer can and cannot work out on its own with a primitive valuation function. The short answer seems to be: a lot, but not everything.
Full source code can be found on my GitHub:
https://github.com/kindofdoon/tic_tac_toe
More games can be seen in this video:
No comments:
Post a Comment