Friday, August 25, 2017

Optimizing Softball Fielding Positions

In April 2017, I signed up for a coed recreation softball league through HubSports, which runs several recreational sports leagues throughout the greater Boston area. When signing up online, I checked a box indicating that I would be willing to serve as team captain for whatever team the organizers put together. A few days later, the teams were arranged and I was selected as the captain of the "Danehy Dozen".

One of the first things I did was send out a survey to my teammates asking for everyone's preferences on fielding positions. There was a lot of variety: some people only liked a couple positions, some people liked playing only infield or outfield, and some had no preferences at all. I quickly realized that with 10 fielding positions (our league plays with 4 outfielders) and a roster of about 13 players, the number of possible fielding lineups was immense: 13!/3!, or more than a billion. I reasoned that I could save time and improve player satisfaction by writing a program to determine optimal fielding lineups for maximum player happiness, based on the survey results.

The first implementation of the program, written in MATLAB, worked like this:

  1. Generate a random lineup.
  2. If fewer than 3 women are on the field, discard the lineup and start again from step 1. This is a coed league requirement to avoid penalties.
  3. Score the lineup, giving 1 point if a player likes their position, 0 if they are neutral, and -1 if they dislike it.
  4. If the lineup's score is equal to or greater than the current best lineup found, output it as text.
  5. Loop back to step 1.
This worked well, and generally after 10-15 seconds of searching, the program stumbled upon an arrangement where everyone liked where they were playing. However, I noticed a problem: since the program sought only to maximize score, players who liked more positions were more likely to be placed in lineups, and conversely, players who were choosier were more likely to be benched. Since I also wanted to equalize play time, this was not an acceptable behavior.
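
For illustration, here is a minimal MATLAB sketch of that first random-search loop. The data layout and variable names (a players-by-positions preference matrix prefs and a logical vector isFemale) are assumptions for this example, not my exact code:

  % prefs: nPlayers-by-10 matrix of survey answers (1 = like, 0 = neutral, -1 = dislike)
  % isFemale: nPlayers-by-1 logical vector, used for the coed rule
  nPositions = 10;
  nPlayers   = size(prefs, 1);
  bestScore  = -Inf;
  for iter = 1:1e6
      order  = randperm(nPlayers);
      lineup = order(1:nPositions);      % player assigned to each of the 10 positions
      if sum(isFemale(lineup)) < 3
          continue                       % coed rule: need at least 3 women on the field
      end
      idx   = sub2ind(size(prefs), lineup, 1:nPositions);
      score = sum(prefs(idx));           % +1 like, 0 neutral, -1 dislike
      if score >= bestScore
          bestScore = score;
          fprintf('score %d: %s\n', score, mat2str(lineup));
      end
  end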

To fix this, I altered the scoring step, focusing on individual player satisfaction—i.e., for this player in this position, are they as happy as they can be? I represented maximum happiness as the most-positive response they gave, as some players were neutral on everything. I also shifted from a -1 to 1 scale to a 1 to 3 scale, since happiness is like length in that it can't be negative. The new scoring step was:
  1. ...
  2. ...
  3. Score the lineup by summing, over all fielding players, 1/N points if the player dislikes their position, 2/N if they are neutral, and 3/N if they like it, where N is the most-positive response that player gave on the 1-3 scale.
  4. ...
  5. ...
This new scoring method both maximized player happiness and equalized play time—a success! However, during our first game I realized there was still room for improvement when the program inadvertently placed newer players in high-pressure positions like shortstop and left field. I also began to realize that part of maximizing player happiness is maximizing the team's win probability, since everyone likes to win and people generally prefer to play positions where they are most likely to succeed.
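
Here is a sketch of the revised scoring step, continuing from the sketch above, with the survey answers re-coded onto the 1-3 scale:

  % Continuing the sketch above: the per-player normalized scoring step.
  prefs3   = prefs + 2;                 % re-code the survey: 1 = dislike, 2 = neutral, 3 = like
  maxHappy = max(prefs3, [], 2);        % N: the most-positive answer each player gave
  score = 0;
  for k = 1:nPositions
      p = lineup(k);                    % player covering position k
      score = score + prefs3(p, k) / maxHappy(p);   % each fielder contributes at most 1 point
  end

With this normalization, a choosy player who lands in their one liked position contributes just as much to the score as a flexible player who lands in any of their liked positions, so the search no longer favors the flexible players.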

To this end, I made one additional change. Prior to each game, I would open the cell array representing player preferences and update it with my own subjective assessments based on player performance in practices and games. I primarily used this step to steer the program away from putting players in positions where I thought they would not perform well. As the season progressed and I refined my ideas of who would play best where, a stability emerged in our lineups: players tended to play the same 1-2 positions, which they liked and had the skills to play well.

We ended up finishing 3rd out of 5 teams in our league during the regular season, with a win-loss record of 3-3. The real magic happened during the playoffs, where we unseated the #2 and #1 teams in a doubleheader to claim the championship! Overall I was very pleased with this optimization approach. I think it helped build team spirit by generating fair and successful lineups, leading to player satisfaction and the emergence of a team identity, a virtuous cycle that continues into our second season together.


The Danehy Dozen, 2017 champions of HubSports' Danehy Monday night spring league

Wednesday, August 23, 2017

Digital Un-Varnishing

Update 2022-02-04: This article has been superseded by "Oil Painting Color Calibration".

A post-processing algorithm for improved color accuracy in photos of paintings

Taking good pictures of oil paintings is surprisingly challenging. When I photograph paintings, my goal is to take pictures that are sufficiently accurate and lifelike that my monitor appears to be the painting itself, rather than a picture of the painting. Improving the hardware of your photography setup (lighting, camera, lenses) will yield the most significant improvements in this respect, and these hardware aspects are well documented on the internet, so I won't go into them. Instead, my focus in this post is on the digital post-processing of pictures, and specifically on a technique and algorithm that I use in my personal work once I've taken the best picture I can with my setup.

The idea came to me when I connected two observations:
  1. In painting restoration, one of the first steps a restorer will usually take is removing the discolored varnish from the surface of the painting with a solvent. By removing this brownish/yellowish translucent film, the painting looks much closer to the way it did when it left the artist's easel.
  2. Pictures of paintings, particularly with imperfect photography setups, often have a characteristic neutral-colored haze that looks much like discolored varnish, although the color of the haze is usually closer to the average color of the painting rather than the yellow-brown color of discolored varnish.

My idea was to somehow invent an algorithm that would digitally un-varnish my pictures. To get some physical intuition, shown below are examples of paintings before and after removal of discolored varnish. Removing this translucent film dramatically improves the apparent color vibrancy and value contrast of the painting to near their original levels.



Two paintings before and after restoration, source: cleanoilpainting.com

From an image processing perspective, the restorer is removing a translucent overlay from the original image, i.e. the paint itself. With an RGB color image, if the color of the overlay and its opacity are known, then the overlay can be digitally removed using simple arithmetic. First, consider the process in reverse, where a translucent overlay is intentionally added to an original image:

A neutral brown-gray, RGB [115 99 87], is overlaid at 35% opacity, source: author


The original image on the left has more contrast, color variation, and clarity than the image on the right. These are all desirable qualities for pictures of paintings.

From an image processing perspective, adding a translucent overlay is a simple operation: the color of each pixel in the new image is a weighted average of the color of that pixel in the original image and in the overlay, where the weights are derived from the opacity of the overlay. The algorithm that I developed does the same process, but in reverse. The input is a hazy image that I want to improve. The assumption is that the input image is composed of two superimposed images: the original image and a translucent overlay. I have experimented with two approaches for calculating the overlay image:

  1. Calculate the average color of the input image, and set every pixel in the overlay image to this color.
  2. Perform a Gaussian blur on the input image, with the blur radius as a parameter to be tuned.
The first method is simpler and faster, but the second method sometimes yields better results. It's mostly a matter of taste and what best suits the input image. In the limit where the Gaussian blur radius approaches infinity, method 2 approaches method 1, although the computation time would also grow without bound. Once you have an overlay image to subtract, the only remaining step is to select an opacity (15-30% seems to work best), and then the digitally un-varnished output image can be calculated.
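
Here is a minimal MATLAB sketch of the whole un-varnishing step; the file name, opacity, and blur radius below are placeholders, not values from my actual program:

  % Minimal sketch of the un-varnishing arithmetic ('painting.jpg' is a placeholder name).
  hazy  = im2double(imread('painting.jpg'));
  alpha = 0.20;                               % overlay opacity to remove (15-30% works best)

  % Method 1: flat overlay at the input image's average color
  avgColor = mean(mean(hazy, 1), 2);          % 1x1x3 average RGB
  overlay  = repmat(avgColor, size(hazy, 1), size(hazy, 2));

  % Method 2 (alternative): Gaussian-blurred copy of the input
  % overlay = imgaussfilt(hazy, 67);          % requires Image Processing Toolbox

  % Adding an overlay is hazy = (1 - alpha)*original + alpha*overlay,
  % so removing it is the inverse:
  unvarnished = (hazy - alpha .* overlay) ./ (1 - alpha);
  unvarnished = min(max(unvarnished, 0), 1);  % clip: out-of-range values saturate
  imshow(unvarnished)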

I wrote a MATLAB function that uses a GUI for adjusting the two main parameters, the Gaussian blur radius and the overlay opacity. It also has functionality to zoom, pan, compare, and export, and it shows a real-time PDF (probability density function) and CDF (cumulative distribution function) for a grayscale version of the original and modified images, to visualize changes to the color distribution. Here is a screenshot:


GUI in action, painting by Michaël Borremans

Basically, what this algorithm does to the image PDF and CDF is stretch them both horizontally so that the image covers a broader dynamic range while preserving the average brightness of the original image. If the opacity is set too high, the PDF and CDF can stretch to the extreme ends and saturation to white and/or black can occur.

Graph showing effect of increasing overlay opacity on image CDF

This graph shows the CDF for a particular painting as the overlay opacity is increased. Note that black saturation (color intensity = 0) starts happening at around 25% opacity, and that the mean of the CDF remains fixed at the mean color intensity, denoted by the dashed line, as the transform is applied.
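
For reference, the grayscale PDF and CDF readouts can be computed along these lines (a sketch, not the GUI's exact plotting code; the file name is again a placeholder):

  % Grayscale PDF and CDF of an image, over 256 intensity bins.
  gray  = rgb2gray(im2double(imread('painting.jpg')));
  edges = linspace(0, 1, 257);
  pdf   = histcounts(gray(:), edges, 'Normalization', 'probability');
  cdf   = cumsum(pdf);
  centers = edges(1:end-1) + diff(edges)/2;
  plot(centers, pdf); hold on
  plot(centers, cdf); hold off
  legend('PDF', 'CDF')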

It's worth mentioning that this process is most useful when you can actually see the original painting yourself in good lighting; otherwise, you are only guessing what the original painting actually looks like.

Here are some before/after comparisons:

Overlay opacity: 17%, blur radius: ∞ (used method 1), source: author

Overlay opacity: 24%, blur radius: 300 pixels, source: Winslow Homer

Overlay opacity: 24%, blur radius: 67 pixels, source: Michaël Borremans

The difference is fairly subtle, but, in my experience, worthwhile, especially if the paintings are your own and you want the best pictures possible. Here are a couple of videos of my MATLAB program in action:







You can download my MATLAB code here.

Note: an original draft of this post was written in March 2016, and completed/published in August 2017.

Boston Triathlon 2017

On 7/30/2017, I did my first triathlon, the Boston Triathlon, at the sprint distance: a 0.5 mile swim, 10 mile bike ride, and 5.5 km run. It was a great time! Here is a photo of me from the bike portion:


I scraped the results database and made some graphs to see what I could learn about the athletes that completed this event. Here are some highlights:

My swim ranked 166 out of 574, putting me in roughly the top 29%.

Here is a histogram with everyone's overall time. I am just about at the top of the bell curve.

Interestingly, the correlation between age and overall time was weak, which is good news for aging triathletes.

Plotting individual event times against each other in scatter plots showed much stronger correlations.

Here's the age breakdown of people that finished the race.

You can download my parsed results database here. The file extension is .mat, for MATLAB.
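
If you want to poke at the data yourself, loading it in MATLAB goes something like this; the file and variable names below are hypothetical, so inspect the actual contents after loading:

  % The file and field names here are hypothetical placeholders.
  S = load('boston_tri_2017_results.mat');
  disp(fieldnames(S))                     % see what variables the .mat actually contains
  % Assuming a vector of overall finish times in minutes, a histogram is one line:
  % histogram(S.overallTimeMinutes, 30)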

First experiment with Floyd-Steinberg image dithering


A foray into Floyd-Steinberg image dithering. I took the original picture from the Charles River Esplanade during the 8/21/2017 solar eclipse, then passed it through a dithering algorithm that I wrote in MATLAB. The only colors in this image are pure red, black, and white.
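
For anyone curious how the error diffusion works, here is a minimal black-and-white MATLAB sketch of the Floyd-Steinberg idea; the eclipse image above actually used a three-color red/black/white palette, and the file name below is a placeholder:

  % Minimal black-and-white Floyd-Steinberg sketch ('eclipse.jpg' is a placeholder name).
  img = rgb2gray(im2double(imread('eclipse.jpg')));
  [rows, cols] = size(img);
  out = img;
  for y = 1:rows
      for x = 1:cols
          oldPx = out(y, x);
          newPx = double(oldPx >= 0.5);          % quantize to black (0) or white (1)
          err   = oldPx - newPx;
          out(y, x) = newPx;
          % diffuse the quantization error onto not-yet-visited neighbors
          if x < cols,             out(y,   x+1) = out(y,   x+1) + err*7/16; end
          if y < rows && x > 1,    out(y+1, x-1) = out(y+1, x-1) + err*3/16; end
          if y < rows,             out(y+1, x  ) = out(y+1, x  ) + err*5/16; end
          if y < rows && x < cols, out(y+1, x+1) = out(y+1, x+1) + err*1/16; end
      end
  end
  imshow(out)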

Sources:
http://www.tannerhelland.com/4660/dithering-eleven-algorithms-source-code/
https://en.wikipedia.org/wiki/Floyd%E2%80%93Steinberg_dithering
http://www.visgraf.impa.br/Courses/ip00/proj/Dithering1/floyd_steinberg_dithering.html

Self-portrait sketch with a mechanical pencil

Self-portrait sketch with a mechanical pencil