A few months ago, I listened to a discussion between philosopher Sam Harris and technologist Tristan Harris that I found extremely interesting and persuasive. Tristan Harris has emerged as an outspoken critic of the "attention economy" that underpins many digital services offered for free with ad support, most notably Facebook. In essence, he argues that the relationship between human attention and profit that underpins free digital services creates an unhealthy incentive structure, one that rewards high-impact, shocking, emotional content and attenuates slow-burning, moderate, thoughtful content. My own sense of this disturbing trend led me to leave Facebook shortly after the 2016 presidential election, during which I became frustrated by the prevalence of low-quality, sensationalist content. Tristan Harris is also critical of how modern apps, in an effort to command and maintain attention, use bright, high-contrast, saturated colors that exploit biases in human perception to "hijack" our attention. One counter-measure he proposes is setting one's phone to grayscale.
I've had my phone in grayscale for a few months now and strongly support the idea. It has helped me spend less time on my phone, and more time on hobbies I find more valuable. The grayscale makes my phone less visually interesting, and it also reminds me subconsciously that if I'm looking at my phone out of boredom, there are better things I can be doing with my time.
Idea
On a particularly unpleasant bus ride recently, I was thinking about digital grayscale image conversion. Any given grayscale value, 0-255, corresponds to a large number of colors. For example, it's possible to find a particular shade of red, green, blue, yellow, orange, etc. that all map to the same shade of gray under a given grayscale conversion process.
This got me thinking - what would a color image look like if it was constrained such that after grayscale conversion, it flattened to a single shade of gray? In other words, suppose you wanted to "encrypt" an image in color such that an agent viewing the image in grayscale would see no image at all, but rather a single shade of gray?
Grayscale Conversion
There is no single way of performing digital grayscale conversion, but rather a variety of techniques. If each pixel's color is represented as a 1-by-3 array, with the elements corresponding to its red, green, and blue components (R, G, B), then a few common grayscale conversion methods are:
• Lightness: average the most and least prominent colors
◦ value = (max(R,G,B) + min(R,G,B)) / 2
• Average: average the colors
◦ value = sum(R,G,B) / 3, or equivalently...
◦ value = 0.33*R + 0.33*G + 0.33*B
• Luminosity: weighted average of the colors
◦ value = 0.21*R + 0.72*G + 0.07*B
The average and luminosity methods are essentially identical, with the only difference being the weights given to the color channels. I decided to focus on these two methods because they are better suited to the optimization approach discussed below.
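The three methods above can be sketched in a few lines of NumPy. This is my own illustrative implementation, not code from the original project; it assumes pixels are floats in [0, 1] arranged as an (N, 3) array of RGB triples.

```python
import numpy as np

def to_grayscale(pixels, method="luminosity"):
    """Convert an (N, 3) array of RGB pixels (floats in [0, 1]) to grayscale."""
    r, g, b = pixels[:, 0], pixels[:, 1], pixels[:, 2]
    if method == "lightness":
        # Average the most and least prominent channels.
        return (pixels.max(axis=1) + pixels.min(axis=1)) / 2
    elif method == "average":
        # Equal weight for all three channels.
        return (r + g + b) / 3
    elif method == "luminosity":
        # Weighted average favoring green, to which the eye is most sensitive.
        return 0.21 * r + 0.72 * g + 0.07 * b
    raise ValueError(f"unknown method: {method}")
```

For a pure red pixel, lightness and average both give about 0.5 and 0.33 respectively, while luminosity gives only 0.21, which is why the choice of method matters for the constraint below.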
Geometric Approach
For an image to satisfy my encryption criteria and flatten to a single color in grayscale, each pixel in the image must satisfy the constraint equation:
K = 0.21*R + 0.72*G + 0.07*B
where K, somewhat arbitrarily, is the average value of all color channels, and [0.21 0.72 0.07] represents the weightings for each color channel. After some thought, I realized that this problem could be approached geometrically by considering the color channel variables R, G, B as displacements in 3D space, i.e. x, y, z. Viewed in this way, the constraint equation actually describes a tilted plane in 3D space. The plane's specifications depend on the K-value from the input image and the weights from the grayscale conversion method. Here are some representative visualizations:
With the color channels equally weighted:
Favoring the red color channel:
Favoring the green color channel:
Transforming a pixel can be thought of as projecting it onto the constraint plane. Projection finds the closest point on the plane directly, with no iterative search required. The projection is done by calculating the input pixel's signed distance to the constraint plane, then subtracting this distance along the constraint plane's unit normal vector. It's worth noting that after projection, some pixels may lie outside the valid color cube [0, 1] in x, y, z. I haven't thought of a good way of efficiently snapping these points to their best location on the constraint plane. However, for most ordinary photographic images I've analyzed, the K-value is around 0.5 and all pixels can be projected without error.
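The projection step above can be written compactly: for a plane w·p = K with weight vector w, the signed distance of pixel p is (w·p − K)/‖w‖, and subtracting that distance along the unit normal w/‖w‖ lands exactly on the plane. A hedged NumPy sketch (my own, with illustrative names; it does not handle out-of-range results, per the caveat above):

```python
import numpy as np

def project_to_plane(pixels, weights, K):
    """Project (N, 3) RGB pixels onto the constraint plane weights . p = K.

    Each output pixel is the closest point on the plane, so every pixel
    converts to the same grayscale value K under the chosen weights.
    Results may fall outside [0, 1] for extreme inputs; no snapping is done.
    """
    w = np.asarray(weights, dtype=float)
    # Signed distance scaled by ||w||^2: (w . p - K) / (w . w).
    d = (pixels @ w - K) / np.dot(w, w)
    # Move each pixel along the plane's normal direction w by its distance.
    return pixels - np.outer(d, w)
```

Note that the correction is largest in the green channel for the luminosity weights, since the normal direction is the weight vector itself.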
Results
Shown below are some results with varying color channel weights. I tested several standard weightings and some non-standard weightings as well. None of these weightings exactly flattened the images in grayscale on my Android phone, but some came pretty close - [0.21 0.72 0.07] seemed to provide the best result.
Standard weightings:
Non-standard weightings:
This project yielded some useful knowledge, primarily that visualizing color intensity as displacement in 3D space provides geometric intuition and optimization improvements. However, the idea doesn't have any apparent uses beyond being an interesting toy problem in image processing, nor does it seem practical: "encrypting" an image this way requires prior knowledge of the grayscale conversion algorithm being used, of which there are many.
Source Code
Available on my GitHub.
"Transforming a pixel can be thought of as projecting it onto the constraint plane."
This isn't the only way to map a full-color input to a constant-luminosity output, and probably isn't the one that maintains the most data from the original picture. Since luminosity is very close to being the most important thing about the image, you could consider mapping luminosity to hue? Which still leaves you with another dimension to work with.
That's an interesting idea! It would probably be more readable. Now that you have me thinking of alternate mappings, you could also transform the constant-luminosity constraint from a plane in RGB to a surface in CIE LAB, then again project pixels onto it. That would give optimal color matches although the geometry calculations would be a lot slower and less trivial.