About Me

I like solving puzzles.

I was tempted to leave this page with just that, but I suppose I should write some more. I'm passionate about math, and I love programming for how it lets me play around with abstractions and algorithms.

Currently, my main focus is Machine Learning. My role is Data Scientist, but I still do a fair amount of coding. Right now, the research on deep learning on graphs (e.g. GCNs, GATs, R-GCNs) interests me most, though I also try to keep up with advances in NLP (e.g. BERT, XLNet) and Reinforcement Learning.

I did my Bachelor's at Queen's University in the Computing and Communications Option of the Mathematics & Engineering program. Basically, it was a combination of Computer Engineering and Applied Mathematics. In fourth year, I took some really cool courses, such as Stochastic Control, Information Theory, and Convex Optimization, and my capstone project was about Multi-Agent Reinforcement Learning! It's maybe a bit of a niche program, but a great one, full of awesome profs and wickedly smart students. I use the skills I learnt there on a daily basis.

For fun I like to:
  • Backcountry Ski
  • Rock Climb
  • Play Hockey
  • Kayak
  • And generally explore outdoors!

Riley

Work

Wawanesa

Aug 2018 - Present

Right now I'm working as a Data Scientist for Wawanesa, a property and casualty insurance company. Here I get to work on many cool problems, such as predicting customer retention, lifetime value, and fraud. At the same time, I get to work with many smart and awesome people!


Fantisserie

Jan 2017 - Oct 2018

Before Wawanesa, I worked for Fantisserie as a Software Dev. Fantisserie is a startup making a mobile game centered around fantasy football. As such, it had a multiplayer aspect, which made the programming interesting. I worked on both the backend and the app itself: the backend was mostly written in Java, with a MongoDB database behind it, and the frontend was built with the Unity game engine.

I'll fill out this section soon; I wrote some points to hold myself to that promise!

Adventures


Alaska

KDD..

Denali..

Portage..

Harding..

Seward..


Bow Valley Parkway

Bow Summit..

Cirque Peak..


Grassy Hut

Intro..

Morning..

Powerline..

Trees + sunset..

Finding the hut..

The next day..

Connect 4

Minimax & MCTS in Rust

Ready..



Just MCTS vs. MCTS for now; I'm currently working on an interface to play against and tweak the AI...


Development Objectives

  • To make something with Rust and WebAssembly that benefits from that choice
  • To play around with minimax and MCTS

Why Connect Four?

  • It's a zero-sum, turn-based, perfect-information game
  • Easy to code the game logic and UI, which lets me focus on the AI
  • Easy to see when the AI is not performing as well as it should
  • I wanted a version with a tweakable AI to play against

Application Design

The game runs entirely in the browser. While ordinary JavaScript is responsible for the input and rendering, the AI and the business logic are executed by a WebAssembly library written in Rust. This is a loose implementation of the MVC pattern, with the model being handled by Rust and the view and controller being handled by JS.

The reason Rust is beneficial here is that running the AI is computationally expensive (relative to typical web apps). Ideally, threads for WebAssembly will eventually be supported by all browsers, which would allow another significant speedup.

Another advantage of this design is that the Rust library can be compiled to run on the desktop. I've done this to make a TUI version (with a barebones ncurses interface) that I use for some fun in my terminal. It's also handy for running experiments to see how the AI performs against itself as its parameters are varied.
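To give a feel for that boundary, here's a rough sketch of the kind of interface the Rust side could expose to JavaScript through wasm-bindgen. The names (Game, play, ai_move) are hypothetical stand-ins, not the project's actual API:

    use wasm_bindgen::prelude::*;

    const ROWS: usize = 6;
    const COLS: usize = 7;

    // Hypothetical handle exposed to the JS controller: JS owns input and
    // rendering, Rust owns the game state and the AI.
    #[wasm_bindgen]
    pub struct Game {
        board: [u8; ROWS * COLS], // 0 = empty, 1/2 = player pieces
        to_move: u8,
    }

    #[wasm_bindgen]
    impl Game {
        #[wasm_bindgen(constructor)]
        pub fn new() -> Game {
            Game { board: [0; ROWS * COLS], to_move: 1 }
        }

        /// Drop a piece for the side to move; returns false if the column is full.
        pub fn play(&mut self, col: usize) -> bool {
            for row in (0..ROWS).rev() {
                let idx = row * COLS + col;
                if self.board[idx] == 0 {
                    self.board[idx] = self.to_move;
                    self.to_move = 3 - self.to_move;
                    return true;
                }
            }
            false
        }

        /// Ask the AI for a column. The real search (minimax or MCTS) would live
        /// behind this call; here it's stubbed with "first non-full column".
        pub fn ai_move(&self) -> usize {
            (0..COLS).find(|&c| self.board[c] == 0).unwrap_or(0)
        }
    }

From the JS side, the controller just constructs a Game, calls play on clicks, and calls ai_move when it's the computer's turn.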

Hard at work?
Coding MCTS while playing against minimax

Minimax

The first algorithm used for the AI is minimax, depth-limited and with alpha-beta pruning.

Minimax is a tree search algorithm that can be applied to zero-sum, turn-based, perfect-information games. The basic premise behind minimax is to choose your moves so as to minimize the maximum score your opponent can achieve. Minimax uses the following formula to define the "value" of a particular game state:

\[v_1(s) = \underset{a_2}{\operatorname{min}}(\underset{a_1}{\operatorname{max}}(v_1(f(s, a_2, a_1))))\] Where:
  • \(v_{1}\) is the value function of player 1
  • \(s\) is the game state that is being evaluated
  • \(a_i\) is the action of player \(i\)
  • \(f\) is a function that returns the resulting game state after applying the action \(a_2\) followed by \(a_1\) to \(s\)

Player 1 will try to pick the action that leads to the state that maximizes this value. However, to calculate this they would need to know all the possible future scores: \(v_1(f(s, a_2, a_1))\). These can be calculated recursively, using minimax! Just expand the search tree until the end states are reached, covering all possible game sequences, and propagate who wins back up the tree...1

Evidently, minimax with no modifications is a brute-force algorithm that becomes computationally infeasible for games that go on for more than a few turns. One workaround is to stop the search from looking forward beyond a predetermined number of moves. The game states that reach that cutoff are then scored using some heuristic strategy. The first one I used was to play the game out randomly from that point a few times and take the average of the results. I talk in more detail about possible heuristics in the Rollouts section. Using minimax with a heuristic will give results at least as good as (and usually better than) using that heuristic directly,2 though the effectiveness of the algorithm is highly dependent on the quality of the heuristic.

Another workaround is to use what's called alpha-beta pruning. It's a method of labelling some game states as irrelevant to optimal play while the search is still going. These states (and any states that follow from them) can then be skipped.
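To make the combination concrete, here's a rough sketch of depth-limited minimax with alpha-beta pruning. It's not the project's actual code; Board and its helper methods (legal_moves, play, is_terminal, heuristic) are hypothetical stand-ins:

    // Sketch only: Board and its helpers are hypothetical stand-ins.
    struct Board();

    impl Board {
        fn legal_moves(&self) -> Vec<usize> { unimplemented!() }
        fn play(&self, _col: usize) -> Board { unimplemented!() }
        fn is_terminal(&self) -> bool { unimplemented!() }
        /// Score from player 1's point of view, e.g. averaged random playouts.
        fn heuristic(&self) -> f64 { unimplemented!() }
    }

    /// Value of `board` for player 1, searching `depth` plies ahead.
    /// `maximizing` is true when it is player 1's turn to move.
    fn minimax(board: &Board, depth: u32, mut alpha: f64, mut beta: f64, maximizing: bool) -> f64 {
        if depth == 0 || board.is_terminal() {
            return board.heuristic();
        }
        if maximizing {
            let mut best = f64::NEG_INFINITY;
            for col in board.legal_moves() {
                best = best.max(minimax(&board.play(col), depth - 1, alpha, beta, false));
                alpha = alpha.max(best);
                if beta <= alpha {
                    break; // prune: the minimizing player already has a better option elsewhere
                }
            }
            best
        } else {
            let mut best = f64::INFINITY;
            for col in board.legal_moves() {
                best = best.min(minimax(&board.play(col), depth - 1, alpha, beta, true));
                beta = beta.min(best);
                if beta <= alpha {
                    break; // prune: the maximizing player won't let the game reach this state
                }
            }
            best
        }
    }

Picking a move at the root is then just taking the argmax of this value over the legal moves, starting with alpha and beta at negative and positive infinity.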

Notes

  1. For Connect Four there are 4,531,985,219,092 possible board states! Though with some clever techniques it actually is possible to brute force; this was done in 1995, see John's Connect Four Playground.
  2. This is probably not mathematically true, though I think you'd have to go out of your way to design an exception.

MCTS

Another option for the AI is the Monte Carlo Tree Search (MCTS) algorithm. MCTS has been used in some pretty modern AI systems; for example, it is a key part of AlphaGo.1 Despite this, it is not super difficult to understand.

The two key ideas of MCTS are:
  • We can estimate the value of a game state by using randomness (hence the Monte Carlo in the name)
  • We'd like to focus our search efforts deeply down paths which we think are optimal (or close to optimal), while still exploring the other paths enough to be confident they aren't optimal

There are four main stages to the algorithm: selection, expansion, rollout, and backpropagation. Starting from the current game state (the root), we build a tree of possible future states (nodes) with value estimates, then iteratively expand the tree and improve the value estimates while looping over the four stages. We can repeat this loop for as long as we want; the longer we do, the better our AI will be. This tree has a similar assumption built in as the minimax tree: we predict our opponent's future best moves at the same time as ours, assuming they will play what we think is optimal.
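Before going through each stage, here's a skeleton of the whole search to show how the four stages fit together. The types and names are hypothetical (the real library differs): nodes live in a flat Vec and refer to each other by index, and each stage function is sketched in its own section below.

    use std::time::{Duration, Instant};

    #[derive(Clone)]
    struct GameState; // placeholder: the real type tracks the board and whose turn it is

    impl GameState {
        fn legal_moves(&self) -> Vec<usize> { unimplemented!() }
        fn play(&self, _col: usize) -> GameState { unimplemented!() }
        fn is_terminal(&self) -> bool { unimplemented!() }
        fn last_player(&self) -> u8 { unimplemented!() }              // who just moved into this state
        fn result_for(&self, _player: u8) -> f64 { unimplemented!() } // 1.0 win / 0.5 draw / 0.0 loss
    }

    struct Node {
        state: GameState,
        parent: Option<usize>,
        children: Vec<usize>,
        visits: u64, // n in the UCT formula below
        score: f64,  // running sum of rollout results, from the perspective of
                     // the player whose move led to this node
    }

    // The four stages, each sketched in its own section below.
    fn select(tree: &[Node]) -> usize { unimplemented!() }
    fn expand(tree: &mut Vec<Node>, leaf: usize) -> usize { unimplemented!() }
    fn rollout(state: &GameState) -> f64 { unimplemented!() }
    fn backpropagate(tree: &mut [Node], from: usize, result: f64) { unimplemented!() }

    /// Run MCTS from `root` for `budget`, then return the index of the
    /// most-visited child of the root (which maps back to a column to play).
    fn mcts(root: GameState, budget: Duration) -> usize {
        let mut tree = vec![Node {
            state: root,
            parent: None,
            children: vec![],
            visits: 0,
            score: 0.0,
        }];
        let start = Instant::now();
        while start.elapsed() < budget {
            let leaf = select(&tree);                // 1. selection (UCT)
            let node = expand(&mut tree, leaf);      // 2. expansion
            let result = rollout(&tree[node].state); // 3. rollout
            backpropagate(&mut tree, node, result);  // 4. backpropagation
        }
        tree[0].children
            .iter()
            .copied()
            .max_by_key(|&c| tree[c].visits)
            .unwrap_or(0)
    }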

Selection

In the selection phase, we find the node which is most interesting for us to explore. This is the stage where we have to contend with the exploration vs. exploitation tradeoff (a famous concept from Reinforcement Learning). In short, focusing too much on the nodes we think are good leaves us open to missing better move sequences (or labelling a move sequence as bad when it's actually good). On the other hand, exploring too much wastes resources we could be using to look further ahead down the more promising move sequences. There are a few ways to handle this, but I used the UCT algorithm. When doing selection, we start at the root, choose the child node which maximizes the following expression, and repeat the process until we reach a node that we haven't searched beyond yet:

\[v_{i, child} + c\sqrt{\dfrac{\ln(n_{parent})}{n_{child}}}\] Where:
  • Player \(i\) is the player that would make the move that transitions the game from the parent node to the child node
  • \(v_{i, child}\) is the average score for player \(i\) when visiting the child node (in other terms, estimated value function of player \(i\))
  • \(c\) is a constant that controls the amount of exploration, usually set to \(\sqrt{2}\)
  • \(n_{parent}\) is the number of times the parent node (the one we are selecting the next move from) has been visited during the search
  • \(n_{child}\) is the number of times the child node has been visited during the search

Intuitively, this expression is the observed value of the child node with a modifier. The modifier is large if \(n_{child}\) is low. It also increases the more times we have the option to choose that child node, but do something else instead.
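In code (continuing the hypothetical arena sketch above), the expression and the selection walk look roughly like this:

    /// UCT score of `child`, from the point of view of the player making the
    /// move from `parent` into `child`.
    fn uct(tree: &[Node], parent: usize, child: usize, c: f64) -> f64 {
        let n_child = tree[child].visits;
        if n_child == 0 {
            return f64::INFINITY; // always try unvisited children first
        }
        let value = tree[child].score / n_child as f64; // v_{i, child}
        let explore = c * ((tree[parent].visits as f64).ln() / n_child as f64).sqrt();
        value + explore
    }

    /// Walk down from the root, always taking the UCT-maximizing child, until
    /// we reach a node that we haven't searched beyond yet.
    fn select(tree: &[Node]) -> usize {
        let c = 2f64.sqrt(); // the usual exploration constant
        let mut current = 0; // root index
        while !tree[current].children.is_empty() {
            let parent = current;
            current = *tree[parent]
                .children
                .iter()
                .max_by(|&&a, &&b| {
                    uct(tree, parent, a, c)
                        .partial_cmp(&uct(tree, parent, b, c))
                        .unwrap()
                })
                .unwrap();
        }
        current
    }

Since an unvisited child scores infinity, every move gets tried at least once before any of its siblings are revisited.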

Expansion

Here we just make a new child node (or several, and pick one) from the node found in the selection phase, by playing a random legal move. If either the selected node or the newly created child node is an end state of the game, we skip to the backprop stage and send the result back up the tree.
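Continuing the same hypothetical sketch (and assuming the rand crate), expansion might look like:

    use rand::seq::SliceRandom;

    /// Add one child of `leaf` by playing a random legal move from it. If `leaf`
    /// is already an end state, return it unchanged so the loop skips straight
    /// to backpropagation with the real game result.
    fn expand(tree: &mut Vec<Node>, leaf: usize) -> usize {
        if tree[leaf].state.is_terminal() {
            return leaf;
        }
        let moves = tree[leaf].state.legal_moves();
        let &col = moves
            .choose(&mut rand::thread_rng())
            .expect("non-terminal states have at least one legal move");
        let child = Node {
            state: tree[leaf].state.play(col),
            parent: Some(leaf),
            children: vec![],
            visits: 0,
            score: 0.0,
        };
        tree.push(child);
        let idx = tree.len() - 1;
        tree[leaf].children.push(idx);
        idx
    }

A more careful version would track which moves have already been expanded so the same child isn't created twice; that's left out here to keep the sketch short.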

Rollout

Randomly play out the game (or do something more sophisticated) and get the result/score. I'll go over some of the things I tried in the Rollouts section.
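A plain random rollout, again using the hypothetical GameState helpers from the loop sketch (and the rand crate), is only a few lines:

    use rand::seq::SliceRandom;

    /// Play random legal moves until the game ends, then score the result
    /// (1.0 win / 0.5 draw / 0.0 loss) for the player whose move led to the
    /// node being evaluated.
    fn rollout(state: &GameState) -> f64 {
        let mut rng = rand::thread_rng();
        let player = state.last_player(); // the perspective the score is reported in
        let mut current = state.clone();
        while !current.is_terminal() {
            let moves = current.legal_moves();
            let &col = moves
                .choose(&mut rng)
                .expect("non-terminal states have at least one legal move");
            current = current.play(col);
        }
        current.result_for(player)
    }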

Backpropagation2

Apply the result to the score values of all the nodes visited along the way to the rollout node. For my use case, 'apply' meant add, as I then used the number of visits to calculate the expected value of the state. Also increment the number of times they've been visited (for calculating UCT). One tricky thing here3 is that the value of the state should be in the perspective of the player that would make the move to reach that state. In a more general scenario it may be necessary to simultaneously store & apply all the players' values for every state visited.
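In the arena sketch from above, that's just a walk up the parent links (hypothetical names again, with scores in the 0 to 1 range):

    /// Walk back up to the root, bumping the visit count of every node on the
    /// path and adding the rollout result to its score ('apply' = add; the
    /// value estimate is score / visits). The result is flipped at each step,
    /// since the players alternate, so each node's score stays in the
    /// perspective of the player whose move led to it.
    fn backpropagate(tree: &mut [Node], from: usize, mut result: f64) {
        let mut current = Some(from);
        while let Some(i) = current {
            tree[i].visits += 1;
            tree[i].score += result;
            result = 1.0 - result; // a win for one player is a loss for the other
            current = tree[i].parent;
        }
    }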

Notes

  1. Though the main breakthrough was to use ConvNets trained with Deep RL for the rollout.
  2. No, not the same backprop as the one that trains neural nets, though maybe conceptually similar minus the chain-rule.
  3. This had me confused for an embarrassingly long time, trying to figure out why my AI seemed so stupid.

Rollout Functions

Coming soon...


Evaluating the AI

Coming soon...


🦀 🕸 🦀 🕸 🦀 🕸 🦀