Saturday, July 7, 2018

Sketch: Yellowstone River bridge

A sketch I made during a recent trip to Yellowstone National Park in about 1.5 hours, near near 44.920112, -110.406099. The air was thin and the sun was hot. At the time, I was recovering from tendinitis incurred at the Bozeman Triathlon a few days prior and was using a walking stick to cope. I stuck close to the road and kept an eye out for bears and buffalo. As far as I know, this bridge is unnamed but spans the Yellowstone River.

Sunday, June 10, 2018

Grayscale Image Encryption


A few months ago, I listened to a discussion between philosopher Sam Harris and technologist Tristan Harris that I found extremely interesting and persuasive. Tristan Harris has emerged as an outspoken critic of the "attention economy" that underpins many digital services which are offered for free with ad support, most notably Facebook. In essence, he argues that the relationship between human attention and profit that underpins free digital services creates a unhealthy incentive structure that rewards high-impact, shocking, emotional content and attenuates slow-burning, moderate, thoughtful content. My own sense of this disturbing trend led me to leave Facebook shortly after the 2016 presidential election, during which I became frustrated by the prevalence of low-quality, sensationalist content. Tristan Harris is also critical of how modern apps, in an effort to command and maintain attention, use bright, high-contrast, saturated colors that draw upon biases in human perception to "hijack" our attention. One counter-measure he proposes is setting one's phone to grayscale.

I've had my phone in grayscale for a few months now and strongly support the idea. It has helped me spend less time on my phone, and more time on hobbies I find more valuable. The grayscale makes my phone less visually interesting, and it also reminds me subconsciously that if I'm looking at my phone out of boredom, there are better things I can be doing with my time.


On a particularly unpleasant bus ride recently, I was thinking about digital grayscale image conversion. Any given grayscale value, 0-255, maps to a large number of colors. For example, it's possible to find a particular shade of red, green, blue, yellow, orange, etc. that maps to the same shade of gray in a grayscale conversion process.

This got me thinking - what would a color image look like if it was constrained such that after grayscale conversion, it flattened to a single shade of gray? In other words, suppose you wanted to "encrypt" an image in color such that an agent viewing the image in grayscale would see no image at all, but rather a single shade of gray?

Grayscale Conversion

There is no single way of performing digital grayscale conversion, but rather a variety of techniques. If each pixel's color is represented as a 1-by-3 array, with the elements corresponding to its red, green, and blue components (R, G, B), then a few common grayscale conversion methods are:

     • Lightness: average the most and least prominent colors
          ◦ value = (max(R,G,B) + min(R,G,B)) / 2
     • Average: average the colors
          ◦ value = sum(R,G,B) / 3, or equivalently...
          ◦ value = 0.33*R + 0.33*G + 0.33*B
     • Luminosity: weighted average of the colors
          ◦ value = 0.21*R + 0.72*G + 0.07*B

The average and luminosity methods are essentially identical, with the only difference being the weights given to the color channels. I decided to focus on these two methods for being better-suited for the optimization approach discussed below.

Geometric Approach

For an image to satisfy my encryption criteria and flatten to a single color in grayscale, each pixel in the image must satisfy the constraint equation:

K = 0.21*R + 0.72*G + 0.07*B

where K, somewhat arbitrarily, is the average value of all color channels, and [0.21 0.72 0.07] represents the weightings for each color channel. After some thought, I realized that this problem could be approached geometrically by considering the color channel variables R, G, B as displacements in 3D space, i.e. x, y, z. Viewed in this way, the constraint equation actually describes a tilted plane in 3D space. The plane's specifications depend on the K-value from the input image and the weights from the grayscale conversion method. Here are some representative visualizations:

With the color channels equally weighted:

Favoring the red color channel:

Favoring the green color channel:

Transforming a pixel can be thought of as projecting it onto the constraint plane. This is equivalent to picking the closest point on the constraint plane, but much faster. The projection is done by calculating the input pixel's distance to the constraint plane, then subtracting this distance along the constraint plane's unit normal vector. It's worth noting that after projection, some pixels may lie outside the bounding box of [0 1] in x, y, z. I haven't thought of a good way of efficiently snapping these points to their best location on the constraint plane. However, for most ordinary photographic images I've analyzed, the K-value is around 0.5 and all pixels can be projected without error.


Shown below are some results with varying color channel weights. I tested several standard weightings and some non-standard weightings as well. None of these weightings exactly flattened the images in grayscale on my Android phone, but some came pretty close - [0.21 0.72 0.07] seemed to provide the best result.

Standard weightings:

Non-standard weightings:


This project yielded some useful knowledge, primarily that visualizing color intensity as displacement in 3D space provides geometric intuition and optimization improvements. However, the idea doesn't have any apparent uses other than as an interesting toy problem for image processing, nor does it seems feasible since "encrypting" the image would require prior knowledge of the grayscale conversion algorithm being used, of which there are many.

Source Code

Available on my GitHub.

Sunday, June 3, 2018

Swim the Charles!

Yesterday I raced in the Charles River One-Mile Swim, setting a new personal record with a time of 29:48. The water was 68-69 °F, just right for my 2-5 mm wetsuit. The water was surprisingly opaque - looking down in the water, I couldn't see my feet! Visually, it was like swimming in dark beer. In a way, it was pretty when the sun shined through splashed water, illuminating its deep golden-brown color. It's a bit like how urban haze can make sunsets more spectacular. The race organizers closely monitored water quality leading up to the event to ensure that levels of pollutants and bacteria were sufficiently low. The main issue is that rainfall causes runoff, temporarily degrading water quality. With no significant rain preceding the event, we were good to go.

crowd starting to trickle in

post-race merriment

coat hanger-like course shape

Here's how the results stacked up:

The race was staggered into two waves based on estimated finish time. I was placed into the first wave. Finishing with a very average time, I was slower than most in the first wave and faster than most in the second wave.

Where did people come from? Primarily Cambridge, Boston, Somerville, Brookline, and Belmont, in that order.

The biggest surprise I found in the data was how closely a person's bib number correlated to their race time. The trend is very clear in the plot above. In fact, the data indicates that a person's bib number is a significantly better predictor of their performance than any other factor, including their age and how far they traveled to the race. How interesting!

Looking to dig deeper, I performed a sexually bi-modal multi-linear regression for finish time based on age, bib number, and distance traveled to the race from each athlete's hometown using Google's Geocoding API. The error between predicted and actual times had a mean of about 0 minutes and standard deviation of about 3.9 minutes, which is not bad considering the sparseness of the data. I then ranked the results by "surprising-ness", taking the normalized regression error as a metric for how surprising a performance was:

Top 10 Surprising Performances
RankNameAge, yearsStateGenderTime, min.Reg. error, norm., %
1Kirkham Wood63CAM24.7126.4
2Rafael Irizarry46MAM26.3821.8
3Don Haut52MAM23.6220.1
4Donald Kaiser67MAM27.6420.0
5Darryl Starr49RIM24.0219.1
6Ursula Hester47COF29.7718.4
7Len Van Greuning50MAM22.1818.1
8Alex Meyer29MAM18.8117.9
9Louis Harwood28MAM29.9916.9
10Haruka Uchida22MAF23.7416.2

This is neat, but it's easily gamed, as a person could increase their "surprising-ness" significantly by registering at the last minute, making them appear statistically as more of an underdog, for example. To this end, I also characterized pure athletic performance by excluding bib number and hometown data, looking only at gender and age, which increased the error standard deviation to 5.6 minutes. In effect, this is similar to awarding prizes by division, but in a continuous sense rather than a discrete one with arbitrarily-defined brackets such as "Men 30-39".

Top 10 Athletic Performances
RankNameAge, yearsStateGenderTime, min.Reg. err., norm., %
1Alex Meyer29MAM18.8131.70
2Eric Nilsson30MAM18.9331.55
3Anton McKee24MAM19.7726.62
4Jessica Stokes41MAF22.1826.61
5Jen Olsen47MAF22.6326.39
6Len Van Greuning50MAM22.1826.14
7Kathleen Tetreault56MAF23.5225.39
8Gail Fricano44MAF22.7725.31
9Christophe Graefe44MAM22.0924.65
10Ed Baker39MAM21.9123.70

A major aspect of this swim is changing public opinion on the feasibility of the Charles as a public recreation area for swimming. A great deal of work has been to study, improve, and monitor its water quality. A 2016 report said that the prospect of a permanent swimming facility on the Charles "could potentially be feasible" and included this tantalizing photo below. It's a very intriguing possibility. Getting the river to a point where urban swimming is once again feasible would have significant benefits aside from enabling the creation of a forward-looking, yet retro swimming site.

artist's rendering of an urban swimming site on the Charles

Thursday, March 15, 2018

Predicting March Madness with Elo ratings and Monte Carlo simulations

I was recently invited to join a March Madness bracket pool at my office. I quickly realized that this was a great opportunity to apply some of the concepts that I learned about in reading Kasparov's recent book Deep Thinking, discussed in my previous post. My original idea was to apply Elo ratings, originally developed for chess, to inform my bracket. The scope of this effort quickly grew as I realized that I could wrap additional layers of abstraction around an Elo calculator, incorporating Monte Carlo simulations and sensitivity analysis, to make a fully-automated bracket generator. In this post, I'll discuss my approach, method, and results.

About Elo ratings

Elo ratings were originally created for the purpose of quantifying a player's skill level in chess over time relative to their peers. The Elo rating system has many nice properties, such as the ability to predict game outcomes based on prior skill levels, and its drop-in/drop-out asynchronous nature. For its simplicity and power, the rating scale has found use in many sports beyond chess. You can read about the math behind it on Wikipedia.

Graph showing relationship between Elo difference and win probability; source

Elo vs. NCAA RPI

In ranking its teams, the NCAA uses a ranking system called RPI, or rating percentage index, which is an oddly arbitrary system. Historically, it seems to have good predictive power, but this may be somewhat deceiving. The act of receiving a high seed seems to be inherently advantageous regardless of team skill due to the bracket structure, 
as Jon Bois humorously shows:

The primary free parameter in the Elo rating system is its so-called K factor, which controls the speed at which ratings respond to apparent increases in player skill. In comparison, the RPI system has three weighting parameters, which are more or less arbitrary, and whose meaning is less obvious in a higher-level sense. As Wikipedia puts it, RPI "lacks theoretical justification from a statistical standpoint" and the "heavy emphasis upon strength of schedule [may give] an unfair advantage to teams from major conferences". My analysis supports this criticism of the RPI system - more on this below. More critically for this effort, it also has limited predictive power, so in my model, I use Elo exclusively and neglect NCAA ranking except as it influences the initial bracket structure.

Generating Elo ratings

Elo ratings are generated by examining win/loss records. I decided to limit my scope to only the 2017-2018 season, neglecting any prior data, to maintain simplicity and avoid the additional modeling assumptions that would be required.

I scraped a complete game log for all of the active teams and parsed them to remove duplicates and non-D1 games. In total, the scraper extracted 5,401 unique games between 351 D1 teams spanning 2017-11-10 to 2018-03-10. I also manually added the play-in "First Four" games, which are effectively regular-season games. For those interested, this dataset is available on my GitHub.

With a log of all unique games, calculating Elo ratings is straightforward. Teams are initialized with an average score, and their score is adjusted as they win and lose games according to Elo's equations. In essence, the equations predict the probability of each team winning based on their rating difference, then compare the actual result to the prediction, then update the teams' ratings accordingly. Its simple and self-correcting nature reminds me of Bayes' Theorem, another useful statistical tool for making predictions based on observations.

Elo over 2017-2018 season plus First Four, selected teams highlighted

Comparing Elo and NCAA rankings

Elo provides a check against the NCAA ranking system. Which teams are most overrated? Most underrated? Here is what the data says about the NCAA's top 25 teams:

Green: underrated; red: overrated, per Elo assessment
For Elo, K = 25 and mean = 1500; NCAA source: Google

Interestingly, the comparison produces a jarring collection of strong agreements and strong disagreements. Most overrated is Florida, ranked at #23 with a 20-12 record. Most underrated are Houston and Saint Mary's (CA), ranked #21 and #25, with records of 26-7 and and 28-5, respectively. The systems agree that Virginia and Villanova are the strongest teams. Here are my top 25 teams sorted by regular-season Elo:

Notably, Saint Mary's (CA), ranked #16 in Elo, did not qualify for the Round of 64 - a victim of the NCAA's archaic ranking system.

Determining postseason probabilities

Elo's equations combined with teams' ratings can be used to calculate the probability of any team beating any other team. 
For example, in the first game, Virginia, rated 1761, plays UMBC, rated 1604. Per Elo, Virginia's win probability is:

[1+10^((1604-1761)/400)]^-1 = 71.2%

So Virginia is most likely to win. At this point, a call to a random number generator bounded on 0 to 1 can be used to simulate a game. If the result is < 0.712, Virginia wins, otherwise, UMBC wins. So Virginia is most likely to advance here, but we also can't rule out an upset either. How to track both outcomes? For this, I turned to Monte Carlo simulations. This allows us to re-run this game, and all downstream games, an arbitrarily large amount of times until a coherent image of overall probability emerges. The result is the probability of any team reaching any round, encompassing both team skill and bracket structure. FiveThirtyEight has done a typically excellent job of visualizing this. My own visualization of the same data is shown below, though our models and probabilities differ.

As a clerical note
, I experimented with two approaches for dealing with Elo ratings after simulated games: "static" Elo, where the ratings are unchanged after games, and "dynamic" Elo, where the ratings are updated based on the simulated outcome. The difference between the two ended up being fairly negligible - more on this later.

Incorporation of Monte Carlo simulations

Each game in March Madness can have two possible outcomes, and there are 32+16+8+4+2+1 = 63 games in total. So the total number of brackets is 2^63, or 9.22 x 10^18. This enormous state space makes exhaustive analysis impossible, and also explains why Warren Buffett has not yet had to pay out his jackpot prize for a perfect bracket, and is not likely to. However, exhaustive analysis, even if it was possible, is not necessary - Monte Carlo simulations provide a sufficiently accurate approximation.

My program operates at a speed of about 5,000 tournaments per second on a Dell Precision 5520. It monitors the maximum probability of any team becoming champion, as a figure of merit, and considers the probabilities converged when this value changes less than 0.025% between 5-second intervals. This takes around 100,000 tournaments, or 20 seconds.

Visualizing probabilities

Click to enlarge - Elo K: 25, Elo mean: 1500, dynamic Elo in postseason

Colors: green denotes above-average probability for that round; red is below-average 

The probabilities are then sorted according to each team's probability of winning the championship - more why the probabilities are sorted this way below. The low probability of the most likely outcomes - e.g. the strongest teams have only a 5-10% of winning the championship - makes clear how difficult the outcome is to predict. The ultimate odds are relatively slim for all but the top teams, and absolutely slim for all teams.

Down-selecting to a single bracket

In moving from probabilities to a final bracket, I considered how the bracket will be scored and more generally, what the ultimate goal is. In my case, my goal is to maximize my bracket's score, as scored by ESPN, so I can win the office pool. ESPN's rules indicate:
  • Round of 32: 10 points per pick
  • Round of 16: 20 points per pick
  • Round of 8: 40 points per pick
  • Round of 4: 80 points per pick
  • Round of 2: 160 points per pick
  • Round of 1: 320 points per pick
This scoring system results in the optimal strategy being a bit counter-intuitive. Suppose every game is decided at random with a coin toss. The probability of correctly selecting the champion, worth 320 points, is 1/64, or 1.6%. One round upstream, the probability of correctly selecting both teams in the Round of 2, also worth 320 points, is (1/32)^2, or 0.1%. The percentages get progressively worse further upstream. In short, the expected point yield gets worse the further upstream you target, so the optimal strategy is to target as far downstream as possible - in effect, to build your bracket in reverse chronological order.

For my bracket, I use overall probability of winning the championship as the metric to determine which teams advance. If the rounds were equally weighted, or equivalently, if I was competing for overall percentage, I would simply advance the team with the greater Elo rating in each game, and Monte Carlo simulation would not be necessary.

In summary, my bracket does not represent my best guess at the tournament's outcome, but rather a series of choices made to maximize score - an important distinction. 

Sensitivity analysis

To probe the sensitivity of my results to my modeling assumption, I wrapped an additional layer of abstraction around the bracket generator and modified three of the key model parameters, borrowing historically stable values from chess to guide parameter ranges:
  1. Elo mean - 1000 and 1500, the two standard values I've seen in literature
  2. Elo K value - 10, 25, and 40, per FIDE's standards
  3. Elo rating in playoffs - either "dynamic" or "static"
Since it wasn't obvious which of these models was best, I decided to analyze all permutations, 12 in total, and weigh them equally. In other words, each bracket gets a vote, and the majority vote wins.

In general, the models are in unanimous agreement about the best choice for most bracket positions, including 
from the Round of 4 onward, and with only a single dissenting vote in the Round of 8. Further upstream, minor disagreements creep in progressively. After extracting a consensus from the brackets, I surveyed which, if any, models produced a bracket that exactly matched the consensus bracket - in other words, which model was most "centrist". I found that my initial set of model parameters - Elo mean: 1500, Elo K: 25, dynamic postseason Elo - provided an exact match. So the sensitivity analysis confirms that these settings are reasonably stable.

My bracket

Bringing it all together, here is my bracket:

Source Code

Full source code and data logs, minus the game scraper, are available on my GitHub.


Update: April 4, 2018

March Madness is now complete. How did my bracket do?

Better than average globally, but not as well as I hoped. It scored 680 points out of a possible 1920, ranked #9 of 16 in the office pool, and was 63rd percentile per ESPN.

distribution of ESPN bracket scores from various pools

What does this mean for my bracket program? In short, it's hard to say. In the presence of March Madness' trademark unpredictability, a poor result does not necessarily indicate a poor decision-making process, nor does a good result indicate a good decision-making process.

This year, UMBC upset Virginia in the first round, the first time in tournament history that a #16 seed defeated a #1 seed in the postseason. My program predicted Virginia had a 71% chance to win; FiveThirtyEight had them at 98%; Ken Pomeroy 97%. Had Virginia instead won this match and continued to win the championship as I predicted, my bracket would have scored 1310, or 99th percentile. This is wishful thinking, of course, but it illustrates how sensitive the results are to flukes, upsets, injuries, or a team simply being "off" or "on".

More succinctly, The Harvard Sports Analysis Collective wrote in 2011:

"More generally, almost all prediction methods make the dubious assumption that NCAA Tournament games are the same as regular season games. That does not seem to hold true. NCAA Tournament games are played in bigger arenas, under brighter media spotlights, and with higher stakes than almost any regular season game."

When it comes to bracketology, I've learned that there seems to be no end to the number of metrics and methods that armchair statisticians like myself have devised in an attempt to crack this inscrutable problem. The next steps I see for this are refining the prediction method using historical regular season and tournament outcomes, and also evaluating other sports, particularly baseball, which has a longer regular- and post-season, and so is a better candidate for statistical analysis.