I was tempted to leave this page with just that, but I suppose I should write some more. I'm passionate about math, and I love programming for how it lets me play around with abstractions and algorithms.
Currently, my main focus is Machine Learning. My role is Data Scientist, but I still do a fair amount of coding. Right now, the research on deep learning on graphs (e.g., GCNs, GATs, R-GCNs) is what interests me most. However, I still try to keep up with advances in NLP (e.g., BERT, XLNet) and Reinforcement Learning.
I did my Bachelor's at Queen's University in the Computing and Communications Option of the Mathematics & Engineering program. Basically, it was a combination of Computer Engineering and Applied Mathematics. In 4th year I took some really cool courses, such as Stochastic Control, Information Theory, and Convex Optimization, and my capstone project was about Multi-Agent Reinforcement Learning! It's maybe a bit of a niche program, but a great one, full of awesome profs and wickedly smart students. I use the skills I learnt there on a daily basis.
For fun I like to:
Backcountry Ski
Rock Climb
Play Hockey
Kayak
And generally explore outdoors!
Riley
Work
Wawanesa
Aug 2018 - Present
Right now I'm working as a Data Scientist for Wawanesa, a property and casualty insurance company. Here I get to work on many cool problems, such as predicting customer retention, lifetime value, and fraud. At the same time, I get to work with many smart and awesome people!
Fantisserie
Jan 2017 - Oct 2018
Before Wawanesa, I worked for Fantisserie as a Software Dev. Fantisserie is a startup making a mobile game centered around fantasy football. As such, it had a multiplayer aspect, which made the programming interesting. I worked on both the backend and the app itself. The backend was mostly written in Java with a MongoDB database behind it, and the frontend was built with the Unity game engine.
Will fill out this section soon, wrote some points to hold myself to that promise!
Just MCTS v MCTS for now, currently working on an interface to play against and tweak the AI...
Development Objectives
To make something with Rust and WebAssembly that benefits from that choice
To play around with minimax and MCTS
Why Connect Four?
It's a zero-sum, turn-based, perfect information game
Easy to code the game logic and UI, which lets me focus on the AI
Easy to see when the AI is not performing as well as it should
I wanted a version with a tweakable AI to play against
Application Design
The game runs entirely in the browser. While ordinary JavaScript is responsible
for the input and rendering, the AI and the business
logic are executed by a WebAssembly library written
in Rust. This is a loose implementation of the MVC
pattern, with the model being handled by Rust and the
view and controller being handled by JS.
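To make that split concrete, here is a minimal sketch of what the Rust side of such a setup could look like with wasm-bindgen. The Game type and its methods are hypothetical stand-ins rather than this project's actual interface:

```rust
use wasm_bindgen::prelude::*;

// Hypothetical handle to the model, owned and mutated only on the Rust side.
#[wasm_bindgen]
pub struct Game {
    board: Vec<u8>, // 6x7 grid: 0 = empty, 1 = player one, 2 = player two
}

#[wasm_bindgen]
impl Game {
    #[wasm_bindgen(constructor)]
    pub fn new() -> Game {
        Game { board: vec![0; 6 * 7] }
    }

    /// Called by the JS controller when the human clicks a column.
    /// Returns false if the move was illegal.
    pub fn play_human_move(&mut self, column: usize) -> bool {
        // ...validate the move and update the board...
        column < 7
    }

    /// Called by the JS controller to ask the AI for its move.
    /// All of the expensive search happens inside Wasm.
    pub fn ai_move(&mut self) -> usize {
        // ...run minimax or MCTS over the current board here...
        3
    }

    /// The JS view reads the board out as flat bytes and draws it.
    pub fn cells(&self) -> Vec<u8> {
        self.board.clone()
    }
}
```

The JS side only ever sees numbers and byte arrays, so all of the game rules and search logic stay in one place.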
The reason Rust is beneficial here is that running the AI is computationally expensive (relative to typical web apps). Ideally, WebAssembly threads will eventually be supported by all browsers, which would allow another significant speedup.
Another advantage of this design is that the Rust library can be compiled to run on the desktop. I've done this to make a TUI version (with a barebones ncurses interface) that I use for some fun in my terminal. It can also be used for running experiments to see how the AI performs against itself as its parameters are varied.
Minimax
The first algorithm used for the AI is the minimax algorithm, depth-limited and with alpha-beta pruning. Minimax is a tree search algorithm that can be applied to zero-sum, turn-based, perfect information games. The basic premise behind minimax is to make your moves so as to minimize the maximum score possible for your opponent.
Minimax uses the following formula to define the "value" of a particular game state (written here from player 1's perspective):

\[ v_1(s) = \min_{a_2} \max_{a_1} v_1(f(s, a_2, a_1)) \]

where \(f\) is a function that returns the resulting game state after applying the action \(a_2\) followed by \(a_1\) to \(s\).
Player 1 will try to pick the action that leads to the state that maximizes this value. However, to calculate this they would need to know all the future possible scores: \(v_1(f(s, a_2, a_1))\). These can be calculated recursively, using minimax! Just expand the search tree until the end states are reached, covering all possible game sequences, and propagate who wins back up the tree...[1]
Evidently, minimax with no modifications is a brute-force algorithm that becomes computationally infeasible for games that go on for more than a few turns. One workaround is to stop the search from looking forward beyond a predetermined number of moves. The game states that reach that cut-off are then scored using some heuristic strategy. The first one I used was to play the game out randomly from that point a few times and take the average of the results. I talk in more detail about possible heuristics in the Rollouts section. Using minimax with a heuristic will give results at least as good as (and usually better than) using that heuristic directly,[2] though the effectiveness of the algorithm is highly dependent on the quality of the heuristic.
Another workaround is to use what's called alpha-beta pruning. It's a method of labelling some game states as irrelevant to optimal play while the search is still going. These states (and any following states that come from them) can then be skipped.
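Putting the depth limit, the heuristic, and the pruning together, a minimal sketch of the search could look like the following. The GameState trait and its heuristic are hypothetical stand-ins for the real game code:

```rust
/// Hypothetical minimal interface a two-player, zero-sum game would need to expose.
trait GameState: Clone {
    fn legal_moves(&self) -> Vec<usize>;
    fn apply(&self, mv: usize) -> Self;
    fn is_terminal(&self) -> bool;
    /// Score of the state from the maximizing player's point of view
    /// (e.g. random playouts averaged, or a large +/- value at terminal states).
    fn heuristic(&self) -> f64;
}

/// Depth-limited minimax with alpha-beta pruning.
/// Returns the value of `state` from the maximizing player's perspective.
fn minimax<S: GameState>(
    state: &S,
    depth: u32,
    mut alpha: f64,
    mut beta: f64,
    maximizing: bool,
) -> f64 {
    if depth == 0 || state.is_terminal() {
        return state.heuristic();
    }
    if maximizing {
        let mut best = f64::NEG_INFINITY;
        for mv in state.legal_moves() {
            best = best.max(minimax(&state.apply(mv), depth - 1, alpha, beta, false));
            alpha = alpha.max(best);
            if beta <= alpha {
                break; // prune: the minimizing player will never allow this line
            }
        }
        best
    } else {
        let mut best = f64::INFINITY;
        for mv in state.legal_moves() {
            best = best.min(minimax(&state.apply(mv), depth - 1, alpha, beta, true));
            beta = beta.min(best);
            if beta <= alpha {
                break; // prune: the maximizing player already has something better
            }
        }
        best
    }
}
```

Choosing the actual move is then just a matter of calling this on each child of the current position and keeping the best one.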
Notes
1. For Connect Four there are 4,531,985,219,092 possible board states! Though with some clever techniques it actually is possible to brute force, which was done in 1995; see: John's Connect Four Playground.
2. This is probably not mathematically true, though I think you'd have to go out of your way to design an exception.
MCTS
Another option for the AI is to use the Monte Carlo Tree Search (MCTS) algorithm. MCTS is an algorithm that has been used in some pretty modern AI systems; for example, it is a key part of AlphaGo.[1] Despite this, it is not super difficult to understand. The two key ideas of MCTS are:
We can estimate the value of a game state by using randomness (hence the Monte Carlo in the name)
We'd like to focus our search efforts deeply down paths which we think are optimal or close to optimal, while still exploring the other paths enough to have confidence they aren't optimal
There are four main stages to the algorithm: selection, expansion, rollout, and backpropagation. Starting from the current game state (the root) we build a tree of possible future states (nodes) with value estimates, then iteratively expand the tree and improve the value estimates while looping over the four stages. We can repeat this loop for as long as we want; the longer we run it, the better our AI will be. This tree has a similar assumption built in as the minimax tree: we predict our opponent's future best moves at the same time, assuming they will play in the way we think is optimal.
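Before going through the stages one by one, here is a rough sketch of the data the loop runs over and the loop itself. Everything here (the GameState trait, the arena of Node structs, the helper functions) is a hypothetical illustration rather than this project's actual code; select, expand, rollout, and backpropagate are sketched in the sections below:

```rust
/// Hypothetical interface the MCTS sketches below assume the game exposes.
trait GameState: Clone {
    fn legal_moves(&self) -> Vec<usize>;
    fn apply(&self, mv: usize) -> Self;
    fn is_terminal(&self) -> bool;
    fn player_to_move(&self) -> u8;  // 0 or 1
    fn winner(&self) -> Option<u8>;  // None while unfinished, or on a draw
}

/// One node of the search tree: a reachable state plus running statistics.
struct Node<S> {
    state: S,
    parent: Option<usize>,     // index of the parent in the arena
    children: Vec<usize>,      // indices of the children in the arena
    untried_moves: Vec<usize>, // legal moves not yet expanded into children
    visits: f64,
    // Sum of rollout scores, kept from the perspective of the player
    // who made the move that *reached* this node.
    total_score: f64,
}

/// One pass over the four stages. `nodes[0]` is the root (the current position).
fn mcts_iteration<S: GameState>(nodes: &mut Vec<Node<S>>) {
    let leaf = select(nodes);                // selection: walk down via UCT
    let node = if nodes[leaf].state.is_terminal() {
        leaf                                 // terminal: nothing to expand
    } else {
        expand(nodes, leaf)                  // expansion: add one new child
    };
    let score = rollout(&nodes[node].state); // rollout: play the game out
    backpropagate(nodes, node, score);       // backprop: update stats up to the root
}
```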
Selection
In the selection phase, we find the node which is most interesting for us to explore. This is the stage where we have to contend with the exploration vs. exploitation tradeoff (a famous concept from Reinforcement Learning). In short, focusing too much on the nodes we think are good leaves us open to missing better move sequences (or labelling a move sequence as bad when it's actually good). On the other hand, exploring too much wastes resources we could be using to look further ahead down the more promising move sequences.
There are a few ways to do this, but I used the UCT algorithm. When doing selection we start at the root, choose the child node which maximizes the following expression, and repeat the process until we reach a node that we haven't searched beyond yet:

\[ v_{i, child} + c \sqrt{\frac{\ln n_{parent}}{n_{child}}} \]

where:
Player \(i\) is the player that would make the move that transitions the game from the parent node to the child node
\(v_{i, child}\) is the average score for player \(i\) when visiting the child node (in other terms, estimated value function of player \(i\))
\(c\) is a constant that controls the amount of exploration, usually set to \(\sqrt{2}\)
\(n_{parent}\) is the number of times the parent node (the one we are selecting the next move from) has been visited during the search
\(n_{child}\) is the number of times the child node has been visited during the search
Intuitively, this expression is the observed value of the child node with
a modifier. The modifier is large if \(n_{child}\)
is low. It also increases the more times we have the
option to choose that child node, but do something
else instead.
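In code, the expression and the selection walk might look like the following, continuing the hypothetical arena sketch from above and using \(c = \sqrt{2}\):

```rust
/// UCT score of a child, from the point of view of the player making the choice.
/// `child_value` is the child's average score for that player (total_score / visits).
fn uct_score(child_value: f64, child_visits: f64, parent_visits: f64, c: f64) -> f64 {
    if child_visits == 0.0 {
        return f64::INFINITY; // unvisited children are always tried first
    }
    child_value + c * (parent_visits.ln() / child_visits).sqrt()
}

/// Selection: from the root, keep moving to the child with the highest UCT score
/// until we reach a node that still has untried moves (or no children at all).
fn select<S>(nodes: &Vec<Node<S>>) -> usize {
    let c = std::f64::consts::SQRT_2;
    let mut index = 0; // start at the root
    while nodes[index].untried_moves.is_empty() && !nodes[index].children.is_empty() {
        let parent_visits = nodes[index].visits;
        index = *nodes[index]
            .children
            .iter()
            .max_by(|&&a, &&b| {
                let score = |i: usize| {
                    let n = &nodes[i];
                    uct_score(n.total_score / n.visits.max(1.0), n.visits, parent_visits, c)
                };
                score(a).partial_cmp(&score(b)).unwrap()
            })
            .unwrap();
    }
    index
}
```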
Expansion
Here we just make a new child node (or multiple, and pick one) from the one found in the selection phase, picking a random legal move. If either the selected node or the newly created child node is an end state of the game, we skip to the backprop stage and send the result back up the tree.
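A sketch of that step over the same hypothetical arena, assuming the rand crate for the random choice:

```rust
use rand::Rng;

/// Expansion: pop one untried move at random from the selected node, apply it,
/// and add the resulting child to the arena. Returns the index of the new child.
/// Assumes the caller only expands non-terminal nodes with untried moves left.
fn expand<S: GameState>(nodes: &mut Vec<Node<S>>, selected: usize) -> usize {
    let mut rng = rand::thread_rng();
    let i = rng.gen_range(0..nodes[selected].untried_moves.len());
    let mv = nodes[selected].untried_moves.swap_remove(i);

    let state = nodes[selected].state.apply(mv);
    let untried_moves = state.legal_moves();
    nodes.push(Node {
        state,
        parent: Some(selected),
        children: Vec::new(),
        untried_moves,
        visits: 0.0,
        total_score: 0.0,
    });
    let child = nodes.len() - 1;
    nodes[selected].children.push(child);
    child
}
```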
Rollout
Randomly play out the game (or do something more sophisticated)
and get the result/score. I'll go over some of the
things I tried in the
Rollouts section.
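For the plain random version, the rollout over the same hypothetical GameState might look like this (again assuming the rand crate):

```rust
use rand::seq::SliceRandom;

/// Rollout: play uniformly random legal moves until the game ends, then score
/// the result for the player who made the move *into* `state`
/// (1.0 for a win, 0.5 for a draw, 0.0 for a loss).
fn rollout<S: GameState>(state: &S) -> f64 {
    let mut rng = rand::thread_rng();
    // The player who moved into `state` is the opponent of the player to move.
    let rollout_player = 1 - state.player_to_move();
    let mut current = state.clone();
    while !current.is_terminal() {
        let moves = current.legal_moves();
        let mv = *moves.choose(&mut rng).expect("non-terminal states have moves");
        current = current.apply(mv);
    }
    match current.winner() {
        Some(p) if p == rollout_player => 1.0,
        Some(_) => 0.0,
        None => 0.5, // draw
    }
}
```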
Backpropagation
Apply the result to the score values of all the nodes visited along the way to the rollout node. For my use case 'apply' meant add, as I then used the number of visits to calculate the expected value of the state. Also increment the number of times they've been visited (for calculating UCT). One tricky thing here[3] is that the value of the state should be in the perspective of the player that would make the move to reach that state. In a more general scenario it may be necessary to simultaneously store and apply all the players' values of every state visited.
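Continuing the same sketch, the perspective flip is the one subtle line in the update: each step up the tree the player who moved into the node alternates, so the score gets mirrored on the way up (assuming scores in the 0 to 1 range, as in the rollout sketch above):

```rust
/// Backpropagation: walk parent links from the new node up to the root,
/// adding the rollout result to each node's statistics.
fn backpropagate<S>(nodes: &mut Vec<Node<S>>, mut index: usize, mut score: f64) {
    loop {
        nodes[index].visits += 1.0;
        nodes[index].total_score += score;
        match nodes[index].parent {
            Some(parent) => {
                score = 1.0 - score; // flip to the other player's perspective
                index = parent;
            }
            None => break, // reached the root
        }
    }
}
```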
Notes
1. Though the main breakthrough was to use ConvNets trained with Deep RL for the rollout.
2. No, not the same backprop as the one that trains neural nets, though maybe conceptually similar minus the chain rule.
3. This had me confused for an embarrassingly long time, trying to figure out why my AI seemed so stupid.