Sunday, February 25, 2018

Tic-Tac-Toe, and Musings About Kasparov/Deep Blue

I'm currently reading Deep Thinking by Garry Kasparov, best known for his historic series of chess matches against IBM's Deep Blue supercomputer in 1996 and 1997. These matches are widely considered a watershed moment in the history of man/machine interaction, demonstrating that computers had finally exceeded human skill at chess. In his book, Kasparov laments that Deep Blue's victory was not the result of a scientific breakthrough about intelligence, but rather an inevitability given the predictable relationship between computation speed and chess Elo rating, combined with Moore's Law. Kasparov's loss is fascinating both for its historical importance and for its contemporary relevance, as computers continue reaching into increasingly "human" domains, such as driving cars, writing stories, diagnosing illnesses, and composing music.

In 1997, it shocked the public that a supercomputer could beat the chess world champion. In 2018, it's almost more shocking that Kasparov held his own, achieving a win and several draws, against a supercomputer capable of evaluating more than 200,000,000 positions per second. This is astonishing to me, and difficult even to comprehend.

Inspired by this reading, I wrote an AI to play a much simpler game: tic-tac-toe. Unlike chess, tic-tac-toe's game tree can be explored fully in a reasonable amount of time because its state space is many orders of magnitude smaller. Heuristic rules for perfect play are straightforward to grasp, and perfect play always results in a draw. My interest in tic-tac-toe has less to do with the game itself and more with the game as a simple testbed for general two-player perfect-information games like chess, the more complex of which are unlikely ever to be solved. Interestingly, this leaves room for technical and creative innovation in AI design, as evidenced by AlphaZero's recent dramatic overtaking of Stockfish in computer chess; AlphaZero is radically different in design from its contemporaries.

My tic-tac-toe program uses random playouts to evaluate possible moves. A value matrix is incremented if the random playout results in victory, decremented if a loss, and unchanged if a draw. On my 2011 Dell E6420 laptop, the program executes about 700 playouts per second, which is very modest. Nonetheless, this is sufficient for the program to loosely deduce many higher-level game heuristics, such as:

  1. It's best to start play at the center, or secondarily the corners
  2. A winning move outranks all other moves
  3. If the opponent is one move away from winning, they must be blocked
  4. Playing on the non-corner edges is weak unless blocking or winning
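
The post doesn't include the program's source inline, but the random-playout idea can be sketched in a few lines of Python. This is my own minimal illustration, not the author's code; the board representation (a list of nine cells holding 'X', 'O', or ''), the function names, and the scoring constants are all assumptions:

```python
import random

# The eight winning lines of a 3x3 board, by cell index 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_playout(board, player):
    """Play uniformly random moves to the end; return the winner or None."""
    board = board[:]
    while True:
        w = winner(board)
        if w or all(board):
            return w
        move = random.choice([i for i, cell in enumerate(board) if not cell])
        board[move] = player
        player = 'O' if player == 'X' else 'X'

def evaluate_moves(board, player, playouts=100):
    """Score each empty cell: +1 per playout won, -1 per loss, 0 per draw."""
    scores = {}
    for move in (i for i, cell in enumerate(board) if not cell):
        trial = board[:]
        trial[move] = player
        opponent = 'O' if player == 'X' else 'X'
        total = 0
        for _ in range(playouts):
            w = random_playout(trial, opponent)
            total += 1 if w == player else (-1 if w else 0)
        scores[move] = total
    return scores
```

With enough playouts per move, the heuristics listed above tend to emerge on their own: an immediately winning move scores the maximum, and blocking squares score well most of the time.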

To visualize the computer's "thoughts", I displayed the computer's value matrix alongside the game board - red for the worst move, green for the best, and gray for occupied spaces. It's fascinating to watch the computer's "thoughts" develop in a way that seems to mirror my own, though I play in a much more casual fashion.
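
The post doesn't show how values map to colors; one plausible way, sketched here as a hypothetical helper (the function name and linear scaling are my own, not taken from the original program), is to interpolate between red and green:

```python
def value_to_color(value, best, worst):
    """Linearly interpolate an RGB tuple from green (best score) to red (worst)."""
    t = 0.5 if best == worst else (value - worst) / (best - worst)  # 1.0 = best
    return (int(255 * (1 - t)), int(255 * t), 0)

GRAY = (128, 128, 128)  # occupied squares get a fixed gray instead
```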

Most games result in a draw:

Here, the computer capitalizes on my second-move blunder to create a fork, achieving a victory:

Here, the computer fails to block, allowing me to win:

I probed this move-6 error further by giving the computer a full 30 seconds to think, instead of the 2.5 seconds shown above. The value matrix came back as:

  NaN   NaN   NaN
-8907   NaN -4317
  NaN -4442 -8824

indicating that it considered the correct move, the block, to be a toss-up with its blunder move (the NaN entries correspond to occupied squares). Replaying the scenario repeatedly for debugging purposes, I noted that the computer chose to block about 50% of the time. Working through the game tree, it became clear why the computer blunders: the blocking move leaves zero paths for O to win and one path for X to win, while the blunder move leaves one path for O to win and two paths for X to win. The net score is -1 for each move, so the computer essentially flips a coin in deciding between them.
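
That path-counting argument can be checked mechanically. Below is a brute-force sketch (my own illustrative code, not from the program in the post) that enumerates every continuation of a position and tallies who wins, assuming the same list-of-nine-cells board representation:

```python
# The eight winning lines of a 3x3 board, by cell index 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def count_outcomes(board, player):
    """Return (X wins, O wins, draws) over every way the game can finish."""
    w = winner(board)
    if w:
        return (1, 0, 0) if w == 'X' else (0, 1, 0)
    if all(board):
        return (0, 0, 1)
    x = o = d = 0
    for i, cell in enumerate(board):
        if not cell:
            board[i] = player
            xs, os, ds = count_outcomes(board, 'O' if player == 'X' else 'X')
            board[i] = ''
            x, o, d = x + xs, o + os, d + ds
    return (x, o, d)
```

For a late-game position like the one above, these counts directly give the net score that random playouts converge to (wins minus losses); from the empty board, the same function enumerates all 255,168 possible games.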

So this is not a bug, but rather a limitation of the random playout valuation method in this particular game. It's easy enough to hard-code the computer to block, but it's much more interesting to see what the computer can and cannot work out on its own with a primitive valuation function. The short answer seems to be: a lot, but not everything.

Full source code can be found on my GitHub:

More games can be seen in this video:
