Sunday, January 26, 2020

3D Focal Point Median Blur

Summary

This article presents a method for variably de-focusing a stereoscopic image in 3D space around a desired focal point. The goal is to better simulate a single moment of human vision, which is based on a focal point rather than a focal plane as with conventional photography.


Comparison of de-focusing methods — 1: original, 2: 3D planar focus, 3: 3D point focus

Human Vision vs. Camera Vision

Photographs are actually a fairly poor representation of how a person sees the world with their natural senses. This is often overlooked, however, because photographs have become so commonplace and integral that it's hard to even imagine an alternative. Similarly, the conventional 24 frames-per-second of movies looks "right", though not because it's actually realistic, but because we're used to it. I've previously discussed how chromatic (color) realism can be augmented in some photos using histogram diffusion. In this article, I'll turn my attention to focal realism.

In a single moment, a camera focuses on a focal plane, i.e. the set of all points a certain distance from the lens. In reality, the notion of a "focal plane" is a convenient approximation and "focal sphere" would be more accurate, but the distinction is not important here. This contrasts with human vision, which focuses only on a single focal point at a time. As a simple experiment, try fixing your eyes on a convenient focal point, and "looking" at other parts of the image away from the focal point, without moving your eyes. You'll notice that the image becomes progressively more "vague" as you look farther from the focal point. This effect happens simultaneously and seamlessly in all three dimensions. The greater the distance between focal point and observed point, whether up-down, left-right, or front-back, the more vague the image becomes. In effect, we have permanent tunnel vision, and compensate with rapid eye movements, piecing a scene together from multiple glances. Similarly, you can think of a photograph as a superimposition of an infinite number of "glanced" images.

Realizing this, I became interested in how to make photos more focally realistic.

Enter Stereoscopic Photography

At a high level, my idea is simple: for each pixel in a given image, blur it in proportion to its 3D distance from a given focal point. Since conventional photography produces an insufficient 2D result, I turned to stereoscopic photography which produces pseudo-3D images.

Stereoscopic photography is based on "stereo pairs", a set of two 2D images taken at the same time with a small offset between the lenses, in position and sometimes angle. This is essentially the same process at work in human eyes. This is usually achieved by synchronizing two identical cameras, using a purpose-built stereo camera, or less commonly by using mirrors to split a single lens into two sub-lenses. In the absence of special equipment, a crude method involves moving left-right slightly between taking two photos with a single ordinary camera. For the purpose of this article, I'll sidestep the hardware aspect and demonstrate with existing stereo pairs.

Depth Mapping

Stereoscopic depth mapping is the process of matching every pixel in the left image to its displaced position in the right image, and vice versa, to determine its distance from the lens. This provides a third spatial dimension to the image, conventionally called Z. While simple in theory, the practice is complex for several reasons:


  • Occlusions, or regions seen by one lens but not the other
  • Lens axial misalignment 
  • Differences in incident light between lenses
  • Inexactness of pixel matching

This technique also has subject limitations where pixels are not readily matched 1:1 between left/right images, such as surfaces that are featureless (clear blue sky), noisy (sand), reflective, or clear.

Rather than re-invent the wheel, I decided to use existing depth mapping software: Epipolar Rectification 9b (ER9b) and Depth Map Automatic Generator 5 (DMAG5) courtesy of developer Ugo Capeto. Both are implementations of academic papers, dated 2008 and 2012. I wrote wrappers to use both within MATLAB.

ER9b performs sparse, high-confidence pixel matching to transform both images so that all matches occur along horizontal lines, thereby eliminating angular misalignment between lenses. In other words, the images are rectified to simulate idealized coplanarity of the lens axes. This is a prerequisite for effective depth mapping, as lens axial coplanarity is a key assumption in efficient depth mapping algorithms.


Animation showing sparse high-confidence pixel matches; minimal rectification is necessary

DMAG5 takes as input a rectified stereo pair. It performs dense pixel matching, finds occlusions, and attempts to produce a physically realistic depth map for each image. Whereas ER9b requires no input parameters beyond the images themselves, DMAG5 requires roughly a dozen different parameters. These are fairly esoteric and in practice most can be left at their default values without affecting the result significantly. What is critical, however, is that the minimum and maximum disparity values (an output of ER9b) match the result of the rectification step, so that the depth map is correctly scaled. In practice, I've found that iteratively doubling/halving each parameter, keeping the helpful adjustments and reverting the unhelpful ones, is a decent if inelegant approach - a sort of poor man's gradient ascent. This black-box approach is improved by using the built-in downsampling functionality to quickly evaluate parameter sets at low resolution before investing the computational time for the full resolution.
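As a rough sketch, the doubling/halving search amounts to a simple coordinate-ascent loop. The function names here are placeholders rather than the actual wrapper code: evaluateDepthMap stands in for a low-resolution DMAG5 run plus whatever quality score is used to judge it, and defaultParams for the starting parameter set.

% Poor man's coordinate ascent over DMAG5 parameters (sketch).
% 'params' is a struct of numeric parameters; evaluateDepthMap is a
% hypothetical wrapper that runs DMAG5 at low resolution and returns
% a quality score to maximize.
params = defaultParams();              % hypothetical starting point
bestScore = evaluateDepthMap(params);  % hypothetical scoring wrapper
names = fieldnames(params);
for pass = 1:3                         % a few sweeps over all parameters
    for k = 1:numel(names)
        for factor = [2 0.5]           % try doubling, then halving
            trial = params;
            trial.(names{k}) = trial.(names{k}) * factor;
            score = evaluateDepthMap(trial);
            if score > bestScore       % keep helpful changes, revert the rest
                params = trial;
                bestScore = score;
            end
        end
    end
end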

Depth map for left image

Dither Upsampling

At this stage, a shortcoming of stereoscopic depth mapping emerges: its low resolution. Typical disparity ranges are between 20 and 70 pixels, roughly analogous to 5- or 6-bit color, which compares unfavorably to standard 8-bit color. As a result, stereoscopic depth maps often show a "stairstep" effect where smooth transitions artificially show as jagged, similar in appearance to lines of constant elevation on topographical maps.

The disparity range could be increased in hardware, but only at the cost of enlarging the occluded regions, so in practice this doesn't seem to be a worthwhile trade. I found Floyd-Steinberg dither upsampling to be an effective strategy for counteracting the low-resolution artifacts, albeit at the cost of adding noise. However, this step was slow and turned out not to be necessary for my purposes, so I discarded it.
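For reference, a minimal sketch of this idea: smooth the stairstepped map to estimate continuous depth values, then Floyd-Steinberg dither back to 8-bit integers so the quantization error becomes noise rather than contours. This is only one plausible reconstruction of the discarded step, with an arbitrary smoothing radius and borders skipped for brevity.

% Sketch: dither a smoothed depth map back to integer levels,
% diffusing the rounding error to neighboring pixels (Floyd-Steinberg).
D = imgaussfilt(double(depthMap), 2);   % smooth the stairsteps (sigma arbitrary)
D = 255 * mat2gray(D);                  % rescale to the full 8-bit range
[h, w] = size(D);
for y = 1:h-1
    for x = 2:w-1                       % borders skipped for brevity
        oldVal = D(y, x);
        newVal = round(oldVal);
        D(y, x) = newVal;
        err = oldVal - newVal;
        D(y,   x+1) = D(y,   x+1) + err * 7/16;
        D(y+1, x-1) = D(y+1, x-1) + err * 3/16;
        D(y+1, x  ) = D(y+1, x  ) + err * 5/16;
        D(y+1, x+1) = D(y+1, x+1) + err * 1/16;
    end
end
ditheredDepth = uint8(D);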

Comparison of depth map before/after dither upsampling

Note that the dithered depth map appears noisier but smoother, and per the histograms more fully utilizes the available range of values in 8-bit color space.

Blurring

The last step is blurring the image. Several inputs are required here:


  • Focal point - where the eyes are focused
  • Depth-to-planar spatial ratio - how shallow or deep the image is
  • Minimum and maximum blur radius - how pronounced the blurring effect is

With these inputs defined, I measure the 3D distance from each pixel to the focal point, and map this linearly to the blur radius range. This results in a "radius map" which specifies the blur radius for each pixel. The stairsteps result from my simplifying assumption to only support whole-number blur radii, thereby obviating the need for sub-pixel calculations.
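In MATLAB terms, the radius map boils down to a distance calculation followed by a linear mapping. A minimal sketch, with variable names that are illustrative rather than taken from the actual source:

% Build a per-pixel blur radius map from a depth map and a focal point.
% depthZ: depth map already scaled by the depth-to-planar spatial ratio;
% (fx, fy): focal point pixel coordinates; rMin/rMax: blur radius range.
[h, w] = size(depthZ);
[X, Y] = meshgrid(1:w, 1:h);
dist3 = sqrt((X - fx).^2 + (Y - fy).^2 + (depthZ - depthZ(fy, fx)).^2);
dist3 = (dist3 - min(dist3(:))) / (max(dist3(:)) - min(dist3(:)));  % normalize to [0, 1]
radiusMap = round(rMin + dist3 * (rMax - rMin));                    % whole-number radii only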


Blur radius map with eyes focused on foreground facial features

I initially defaulted to a Gaussian blur, but this didn't produce good results as the stairstep effect from the depth and radius maps persisted in the resulting image.

Instead, I used a median blur. Rather than setting each pixel's value to a normally-weighted average of its neighbors, the median blur sets each pixel's value to the unweighted median of its radial neighborhood. The median blur preserves edges, which is extremely beneficial in hiding the stairstep effect and smoothly transitioning between varying blur radii. Subjectively, the image also feels more like one seen out of the corner of one's eye, whereas a Gaussian blur feels more camera-like. I used a circular neighborhood as it produced the best results, though square and diamond neighborhoods also exist.

Unlike typical fixed-size median filters, mine requires the radius to vary as a function of pixel position; instead of one kernel, there are many. For efficiency, I pre-calculate all the necessary kernels before starting the blur calculation, and recall them as needed. This is about 3x faster than generating a kernel from scratch at each pixel.
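A simplified single-channel sketch of the variable-radius median blur with precomputed circular kernels; img and radiusMap are assumed inputs, and the actual implementation on GitHub is more involved:

% Precompute circular-neighborhood offsets for every radius in the map.
radii = unique(radiusMap(:))';
kernels = cell(max(radii), 1);
for r = radii(radii > 0)
    [dx, dy] = meshgrid(-r:r, -r:r);
    mask = (dx.^2 + dy.^2) <= r^2;      % circular neighborhood
    kernels{r} = [dx(mask), dy(mask)];
end

% Apply the per-pixel median blur (single channel; pad to handle borders).
rPad = max(radii);
P = padarray(double(img), [rPad rPad], 'replicate');
out = zeros(size(img));
for y = 1:size(img, 1)
    for x = 1:size(img, 2)
        r = radiusMap(y, x);
        if r == 0
            out(y, x) = img(y, x);      % in-focus pixel, leave unchanged
        else
            offs = kernels{r};
            vals = P(sub2ind(size(P), y + rPad + offs(:,2), x + rPad + offs(:,1)));
            out(y, x) = median(vals);
        end
    end
end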

Here are some sample result images, showing how the focal point can be moved to simulate looking at different parts of the scene. Note how the blur radius transitions seamlessly both within and between focal planes!






Demo Gallery

Shown here are several additional images created using the process outlined above, to further demonstrate the effect. These represent "good" results; some images work better than others for a variety of reasons.









Depth Map Gallery

Shown here are the depth maps corresponding to the images above.







Acknowledgements


  • Ugo Capeto develops and maintains excellent free depth-mapping software
  • 3D Shoot provided most of these stereoscopic images under a non-commercial license

Source Code

Freely available on my GitHub.

Sunday, November 24, 2019

Authorship of "A Warning"

2018's Anonymous, author of the infamous op-ed titled I Am Part of the Resistance Inside the Trump Administration, is back. Their latest publication is A Warning, a full-length book which builds on similar themes at much greater length. I previously attempted to deduce the author of the anonymous op-ed with forensic linguistics - see here and here. In this post, I'll revisit this analysis using the new information provided by A Warning. I'll also delve deeper into motive, means, timeline, and other signals. My interest in the authorship question is not political, but rather motivated by the challenge of trying to crack this topical, high-profile puzzle.

Text Similarity

The codebase I developed for Resistance estimates the authorship probabilities for a set of candidates by comparing attributable writing samples with target text of unknown origin. For each author, a matrix is developed representing Markov word transition probabilities, as well as a vector distribution representing their overall word choice. These are then compared against the same elements for the target text. This code is freely available on my GitHub. Re-running this code, substituting Warning for Resistance, produces this result.
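The comparison machinery is simple to sketch. A minimal version, with deliberately crude tokenization and a hypothetical file name; the actual codebase on GitHub is more elaborate:

% Build a Markov word-transition matrix and an overall word-choice
% distribution for one writing sample (sketch; file name is hypothetical).
words = regexp(lower(fileread('sample.txt')), '[a-z]+', 'match');
vocab = unique(words);
[~, idx] = ismember(words, vocab);
n = numel(vocab);
T = zeros(n);                              % word-to-word transition counts
for k = 1:numel(words)-1
    T(idx(k), idx(k+1)) = T(idx(k), idx(k+1)) + 1;
end
T = T ./ max(sum(T, 2), 1);                % row-normalize to probabilities
freq = accumarray(idx(:), 1, [n 1]) / numel(words);   % word-choice vector
% The same T and freq are built for the target text over a shared vocabulary
% and compared, e.g. by a distance or correlation measure, for each author.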


This result is consistent with the same plot for Resistance. In the Top 10, McMahon and Mnuchin replace Pompeo and Kushner; otherwise the set is unchanged.


Subjects

I haven't read the book, but as a first approximation I extracted all its proper noun phrases and the number of their occurrences. I didn't find this initial result very informative but am including it for general interest.


Network Similarity

I started by filtering the subjects to only include administration officials, looking for an indication of the author's position within the administration by proxy of who they discuss, and how often.


The top officials mentioned cover a wide swath of the administration, including policy, operations, economics, defense, diplomacy, and more. Presumably, the author wrote about these officials roughly in proportion to the quantity of their real-life interactions.

To quantify this, I created a matrix gauging how likely certain officials are to have worked together. This enables a "network similarity" metric. To manage scope, I proceeded only with the top ten candidates from the text similarity metric. For completeness I added Bremberg, DeStefano, Stepien, and Ayers, who lack published writing samples but have recently been suggested as possible authors.
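The metric's internals aren't spelled out above, so the following is only a hypothetical sketch of how such a network similarity score could be computed: correlate each candidate's row of the "worked-together" matrix with the book's mention counts.

% Hypothetical sketch of the network-similarity idea.
% W(i, j): estimated likelihood that candidate i worked with official j;
% mentions(j): how often official j is discussed in the book.
scores = zeros(size(W, 1), 1);
for i = 1:size(W, 1)
    c = corrcoef(W(i, :), mentions);    % correlation with mention frequency
    scores(i) = c(1, 2);
end
[~, order] = sort(scores, 'descend');   % most network-similar candidates first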


Pence emerges as the clear frontrunner from this metric, with the caveat that he is also higher in the hierarchy, so more likely to be connected to any given senior official.

Academic Background

I then filtered by historical figures mentioned in the text, looking to probe the author's academic background.


There is a cluster of notable outliers: Cicero, Cleon, Aristotle, and Diodotus, all political/philosophical figures from Ancient Greece and Rome. These esoteric figures suggest a classical academic background, perhaps including political science, history, and/or law.

Motive

The book's purpose seems to be strengthening the case for impeachment and/or discouraging the 2020 re-election effort. Presumably, the author is to some extent self-interested, and would not advance these goals if they conflicted with their personal advancement. This suggests the author is not part of the president's inner circle and/or has already left office.

Warning, like Resistance, has a strong moralistic tone, which suggests it was not authored by an anti-environmentalist (Wheeler, Perry), investment banker (Mnuchin), entertainment executive (McMahon), or political director (Stepien).

Timeline

Sources disagree as to whether the author is still active within the administration. As far as I know, this isn't directly addressed in the book. However, its many dated accounts, spanning 2015 through 2019, suggest a surprisingly clear timeline. Most of the book's accounts are retellings of publicly available news stories; others more usefully contain private information which, if true, would only be known to an insider. As examples:

Public: "In a July 2018 interview, the president was asked to name America’s biggest global adversary."

Private: "Several departure timelines appeared to be converging in 2018, creating the possibility for a simultaneous walkout to prove our point about the president’s faltering administration."



Charting the fraction of private dated accounts by year suggests that the author is no longer with the administration, as the latest private account they provide is dated 2018. This suggests a departure date of late 2018 to early 2019. Moreover, they also provide a significant number of private accounts from 2016 which are pre-inauguration, suggesting that the author worked on the transition team and possibly the campaign team as well.

Children

The book opens with a dedication "[to] my children, and the forthcoming generation", implying that the author has at least two children. This suggests ruling out Hill and Bolton, who have one child each. The author says that they may eventually disclose their identity:

"Nor am I unprepared to attach my name to criticism...I may do so, in due course."

I concluded that an author so apparently concerned with principled righteousness would not misrepresent the number of their children, for the sake of their posterity and pride.

Signal Synthesis

To synthesize these various signals, I counted the quantity of affirmative signals from each candidate.


Topping the list is Ayers, Chief of Staff to the Vice President. Every signal points to his authorship, except text similarity which couldn't be evaluated due to a lack of publications. Notably, he's the only candidate to fit both timeline criteria. Circumstantially, the case is strong if incomplete.

One tier lower is Pence. He stands to lose the most if the 2020 re-election effort is defeated, yet benefit the most if impeachment succeeds. Warning may be a high-risk/high-reward push to capture the presidency at the risk of losing the vice presidency. This Machiavellian strategy is very difficult to square with Pence's reputation for being straightforward and loyal. Further, he is still in office, which is incompatible with the departure date implied by the private accounts. For these reasons, I consider Pence a false positive, and unlikely.

One tier lower still are Hill, Bolton, and Sullivan. The first two are weak candidates as their timelines don't match the timeline implied by the private accounts. Further, they both have only a single child. Sullivan has three children, which fits, but his text similarity is lower, ranking near the bottom of the frontrunners. I consider these candidates to be unlikely.

Continuing down the list, the remaining candidates have progressively more discrepancies, making them less likely still.

Conclusion

Nick Ayers is the most likely candidate for the "Anonymous" behind both Resistance and Warning.

Monday, November 18, 2019

Histogram Diffusion Image Filter

Summary

An overview of an experimental image processing algorithm that uses 1D heat diffusion and exact histogram matching to augment the chromatic realism (i.e. the perceived color accuracy) of an RGB photo.

Background

A dream of mine since childhood has been a camera that can capture the exact image seen by one's eyes. No such device yet exists, but this dream explains my interest in painting that developed later in life. It's my belief that painting has persisted in the modern world because it retains an essential advantage over photography: it is the superior medium for accurately representing images seen by eye - in other words, the human visual experience.

Photography has its place too, but its images are cheap and inhuman. Although it invariably captures forms with perfect accuracy, it is clumsy and narrow with hue, saturation, and value, the essential ingredients for describing color. Compounding this is its unnatural way of focusing on a plane, instead of a point as eyes do. In other words, a photo is like a superimposition of an infinite set of eye-images, each focused on a different point on the same focal plane. Photos are great for quickly capturing appearances, but not how it feels to see. Here, painting at its best is unsurpassed.

Despite the shortcomings of photography, it remains a powerful and useful tool for artists. An interest of mine is how photography can be used as a basis for painting, and how various technical approaches can be used to mitigate its shortcomings. In this article, I'll describe an image filter that may be useful for augmenting the chromatic realism of a reference image.

Hypothesis

Over a series of walks through the local fall foliage recently, I was thinking about cognitive aspects of color perception. It stands to reason that the human eye is good at differentiating objects by subtle differences in color, as this was useful in improving the evolutionary fitness of our ancestors. The human eye has a limited bandwidth in terms of the wavelengths that it can perceive - roughly 380 to 750 nm. I reasoned that in general, an image would be maximally differentiable if these wavelengths were represented equally - in other words, if a histogram describing the frequencies of visible light was flat.

Histogram equalization achieves this ideal of a flat histogram and maximum differentiability, but at the cost of realism. The problem is that most scenes don't actually have a flat histogram, so enforcing one causes unrealistic and harsh shifts in color.


Examples of "naive" histogram equalization applied independently to images' R, G, B channels

I concluded that a flat histogram has "too much" differentiability, while a raw histogram has "too little", with neither feeling quite realistic. I wondered if some intermediate histogram would offer an ideal balance between the two extremes, and realized that 1D diffusion, as if by the heat equation, offered a solution for continuously interpolating between these extremes. This is because the steady-state solution for 1D diffusion is a flat line, while its initial condition may be any arbitrary vector of values. The hypothesis then emerged:

Diffusing an image's histogram, as if by the heat equation, can be used to augment its chromatic realism.

Implementation

The 1D heat equation states that the derivative of temperature with respect to time is proportional to its second derivative with respect to space, i.e. its curvature:

dT/dt = alpha * d^2T/dx^2
For an initial condition of T, this can be solved easily using the finite-difference method. Applying this to images requires replacing temperature, T, with "level count", i.e. how often each value occurs within each channel. This step has no real dimensional basis, but is convenient and easy. Working in RGB color space, each channel can be operated on independently as if it were a grayscale image. These arguments can also be generalized to alternate color spaces such as HSV.

Less trivial yet critically important is how boundary conditions are handled. The boundaries of an image's histogram represent its saturation points. It's best to avoid saturating an image when taking or editing a photo, as it causes a loss of information and an overall loss of realism. In the most severe cases, when all three channels are saturated, the resulting color is pure white or black, and no visual information is encoded. This commonly occurs as a result of over- or under-exposing an image. However, moderate saturation may be acceptable in some cases. I define "moderate saturation" as a channel's boundary values being 10% or less of its peak value.

To permit diffusion while discouraging saturation, I enforce these constraints:
  • If a boundary is ≤10% saturated, fix its value over time, i.e. set it as a Dirichlet boundary.
  • If a boundary is >10% saturated, insulate it so that it tends to drift downward over time, i.e. set it as a Neumann boundary.
These constraints ensure, for typical images, that no additional saturation occurs as a result of the diffusion. The steady-state solutions of this equation with these boundary conditions are always straight lines, practically but not exactly flat, very much like canonical histogram equalization. In practice, best results are achieved well before reaching diffusive steady-state.
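A minimal finite-difference sketch of the diffusion step for one channel, with the boundary handling above; the image img, channel index ch, step count nSteps, and diffusion rate are placeholders rather than the actual code:

% Diffuse one channel's histogram by explicit finite differences.
h = double(imhist(img(:, :, ch)));       % 256-bin level count, the "temperature"
alpha = 0.25;                            % diffusion step size (stable for explicit scheme)
dirichletLo = h(1)   <= 0.10 * max(h);   % lightly-saturated boundaries get fixed
dirichletHi = h(end) <= 0.10 * max(h);
for t = 1:nSteps
    lap = [0; diff(h, 2); 0];            % discrete second derivative (curvature)
    h = h + alpha * lap;
    if ~dirichletLo, h(1)   = h(2);     end   % Neumann: insulated, free to drift
    if ~dirichletHi, h(end) = h(end-1); end
    % Dirichlet boundaries are untouched (lap is zero there), so they stay fixed.
end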

To transform the image according to the diffused histograms, I used an existing exact histogram match algorithm. This introduces a small amount of imperceptible noise to transition from a discrete-valued image to a continuously-valued image, which enables every pixel to be sorted in strictly ascending/descending order. The image can then be transformed so that it exactly produces the input histogram.
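The exact-match step is an existing algorithm, but its essence fits in a few lines: break ties with imperceptible noise, sort the pixels, and deal out values according to the target histogram. A simplified single-channel sketch, not the actual library code:

% Simplified exact histogram matching for one channel.
% chan: uint8 channel; targetHist: 256-bin target (e.g. the diffused histogram).
x = double(chan) + 0.01 * randn(size(chan));   % tiny noise to make values unique
[~, order] = sort(x(:));                       % strict pixel ordering
counts = round(targetHist / sum(targetHist) * numel(chan));  % pixels per level
counts(end) = counts(end) + numel(chan) - sum(counts);       % fix rounding drift
levels = repelem((0:255)', counts);            % values to deal out, in order
out = zeros(size(chan));
out(order) = levels(1:numel(chan));
out = uint8(out);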

Results

All photos in this section were taken with my 2003 Canon Rebel. No modifications were made prior to diffusing the histograms. The diffusion runtimes are equal between channels, but not between images - this was qualitatively fine-tuned by eye.


This is a simple example to start with. The original's colors are very tightly clustered around a central value, so the histograms resemble bell curves and diffusion essentially amounts to a conventional horizontal scaling of the histogram.


In R (Red Channel) and G (Green Channel), fixing the left-hand boundary pushes the histogram to the right as it diffuses, brightening those channels and finding definition in the nearly-saturated shadows. In B (Blue Channel), the insulated left-hand boundary is allowed to drift downward, and the exact histogram matching algorithm injects noise, guessing at the actual values of these saturated pixels. This guesswork is well-hidden in the result, because B contributes almost nothing to a color's perceived value, i.e. its perceived brightness/darkness. This is due to a quirk of human vision which places disproportionate emphasis on the Green Channel.


Like with the desert mountain photo, but without the haze, diffusion essentially scales the histogram horizontally, accentuating the greener spots, and highlighting some contrasting patches of pale-copper/salmon soil. Note that diffusion enables augmentation of the Blue Channel where histogram scaling is infeasible due to near-saturation.



This image is more bi-modal in nature, due to the clustering around both sky and ground. Because the original image covers the available gamut relatively fully, conventional histogram scaling cannot be used. However, diffusion encourages fuller use of the mid-range gamut between these two clusters, reducing the warm glow and increasing the overall contrast, especially in the ground cover and clouds. A curious and persistent bug occurs here - sparse regions of tree foliage set against the sky turn whitish, as if snow-covered.



Another example of an initially well-utilized color gamut that can be augmented by diffusion.


An especially good result wherein R and G are both fixed, while B is insulated on the low end. The fog is thinned and the underlying colors are better illuminated.




Another bimodal gamut where scaling is inappropriate due to near-saturation on the low end initially. Diffusion preserves the means of each cluster, while diffusing them individually to find subtle hues in the sky and better distinguish rock from greenery.

Conclusions

Histogram diffusion is a potentially useful image processing filter for augmenting chromatic realism. It especially provides an advantage over conventional histogram scaling for images that are one or more of:

  • Bi- or multi-modal
  • Nearly-saturated initially, making histogram scaling infeasible
  • Strongly dissimilar between color channels, requiring per-channel treatment

Initial experiments applying this same methodology to HSV color space did not seem to produce significantly different/better results, but more exploration could be done in this and other color spaces.


Source Code

Freely available on my GitHub in MATLAB.

Sunday, July 21, 2019

How Paints Mix

Summary

For augmented realism in artistic painting, this article summarizes mathematical methods to characterize any palette of primary paints and decompose any digital color into an analog paint mix recipe, using Kubelka-Munk theory, Schmid color charts, and visible-light spectroscopy.

Motivation

Ever eloquent, Lucian Freud best expressed the importance of color realism in painting:


"...[When] colors work, I think of them as the color of life, rather than colored."

Famously dogged and penetrating in style, Freud labored over every stroke, working and reworking his canvases over long sessions even into old age. His eye was a perceptive color comparator, restless until the canvas felt as true as the scene. Similarly, Rembrandt wrote that his paintings sought to achieve "the greatest and most natural movement".

Venerated by Freud, Rembrandt, and all great painters, realism is the strongest unifying principle across painting's long and diverse history.

Rembrandt, 1628

Freud, 1985

When I started painting, I was confounded by the skill of color matching, the act of mixing primary paints together to create colors that match those observed by the eye. It requires both seeing colors accurately and knowing how to mix them from primary paints. I've seen painters who can color match with astonishing ease and accuracy, but I can only stand apart and admire their skill. I still find it confounding, though with experience I've developed a decent working knowledge.

Godfrey Reggio said this of his 1982 film Koyaanisqatsi, one of my favorites:



"...[Our] language is in a state of vast humiliation. It no longer describes the world in which we live."

Try to describe the spectrum of real-world colors in words and you'll quickly see that Reggio's axiom neatly encapsulates what makes color, with its intangible, experiential, and subtle nature, so hard to describe. By extension, this also makes color matching very difficult. But, as the great masterpieces in oil painting's rich history show, it's not out of reach - it's just very hard.

Frustrated with my lack of talent at color matching, and with the clumsy old-fashioned models for color mixing, I began to wonder if there was a cleaner, mathematical way of describing paint and color. I reasoned that since every painting can be represented as a digital picture, and since every constituent pixel has a well-defined digital color, any painting can be represented mathematically. Conversely, every digital picture could in principle be realized as a painting.

My focus then became developing the critical link between physical color (i.e. paint) and digital color (i.e. pictures). Less abstractly, I sought to develop an algorithm that could decompose any digital color into an analog mixing recipe. This algorithm, and path to developing it, is the subject of this article.

Early Missteps in Digital Color

Having some prior experience in digital image processing, I initially expected that the problem could be solved purely in a digital color space - simply quantify the colors of the primary paints, then mixtures thereof should be simple weighted averages. However, this approach doesn't work, as a quick counterexample shows:
  1. Blue and yellow mix to make green.
  2. Blue and yellow may be represented in RGB as [0 0 255] and [255 255 0].
  3. The average of [0 0 255] and [255 255 0] is [128 128 128].
  4. [128 128 128] is gray.
  5. Gray is not green.
Converting to alternate color spaces before averaging produces similarly erroneous results: Lab predicts a pale pink, and XYZ predicts periwinkle. I also tried dithering swatches of the target color using the primary paints as the set of permitted colors, as if to simulate pigment particles with pixels. This method was initially encouraging, since it produced results that differed from the naive average while remaining plausible and visually interesting, but some simple experiments disproved its accuracy and I eventually conceded that it didn't work.

Interesting but unsuccessful experiment at color decomposition using dithering

These methods all fail because they neglect the microscopic optics at play in actual paint films - optics that reductive digital imaging cannot capture, since the causal chain of color perception is irreversible.

Enter Kubelka-Munk

After some more research, I learned that this is actually a semi-solved problem, with the theoretical foundation having been laid in 1931 by German scientists/engineers Kubelka and Munk. Later researchers extended their framework by developing practical methods of using it, especially starting in the 1980s with the advent of personal computers. Kubelka-Munk theory (K-M) remains important to many industries, especially the manufacture of coatings, films, and paints. It applies to both translucent and opaque media, although as an alla prima oil painter, I'm only interested in the simpler opaque case.

K-M posits that diffuse light incident on a paint surface is either reflected (R) off the surface, absorbed (K) by the paint pigment itself, or scattered (S) through the surface to other pigment particles. Further, it posits that R, K, and S are all functions of light wavelength.


Diagram of reflectance, absorption, and scattering in a pigment-based paint film

K-M provides the following equation relating R, K, and S for an opaque film:

R = 1 + (K/S) - sqrt((K/S)^2 + 2(K/S))

which may be rearranged to solve for K/S:

K/S = (1 - R)^2 / (2R)
K-M also holds that paint mixtures can be described as weighted averages of the constituent K and S values:

(K/S)_mix = sum_i(c_i * K_i) / sum_i(c_i * S_i)

where the subscript i denotes the paint index, and c denotes the dry pigment mass fraction in the mixture. This turns out to be the extent of math needed from K-M. More math is needed, though, to actually make use of these essential equations.

Light, Reflectance, and Color

Intuitively, paint has no observable color unless illuminated, and further, its perceived color changes depending on its illuminant - whether bright/dark, warm/cool, etc. Therefore, to begin modeling color, an illuminant is needed. Conveniently, several standard illuminants were defined starting in 1931 by the CIE (International Commission on Illumination), coincidentally the same year as K-M's publication. Due to its ubiquity, I chose CIE D65, representing "average [outdoor] midday light in Western/Northern Europe". Outdoor lighting is broad-spectrum, bright, and relatively temperature-neutral, often considered optimal for lighting and photographing paintings.



Next, paint reflectance spectra are needed. This is the same quantity as the R in K-M, and it is similarly defined as the normalized diffuse reflection as a function of wavelength.


Reflectance spectra for red, white, and a red/white mixture from literature

The interplay between illuminant and reflectance creates color. However, unlike illuminants and reflectance, color doesn't actually exist in an absolute, quantifiable sense. For example, the fact that 475 nm light looks "blue" or that 700 nm light looks "red" is a perceptual abstraction of the human brain, not a universal truth. Further, perception of color varies between individuals and species, and is best understood as an inherently personal characteristic. Nevertheless, CIE further provides a "standard observer" function which quantifies the perceptual sensitivity of the average human eye to light, again as a function of wavelength.



The CIE standard observer comprises three independent functions corresponding to the three types of cone cells in the human eye, which specialize in short-, medium-, and long-wavelength light. It's worth noting that human vision is significantly non-uniform. We are probably most sensitive to the regions of color that were most evolutionarily advantageous to our ancient ancestors. Regardless, this is an important point for the later discussion on color matching.

With these three pieces in place - illuminant, reflectance, and observer - color may be calculated essentially as their dot product. This provides color in the XYZ, or tristimulus format, representing the overall signal delivered to the eye. XYZ can then be converted to RGB for display on a digital screen, or to Lab for calculating color difference. For best results, I calculate all color differences in Lab space, which by design is perceptually uniform per the observer functions, unlike RGB space.
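Concretely, the "dot product" is a wavelength-by-wavelength sum. Assuming the illuminant, reflectance, and observer curves are all sampled on the same wavelength grid, the tristimulus calculation looks like this:

% Tristimulus (XYZ) color from illuminant, reflectance, and observer curves,
% all sampled on the same wavelength grid (e.g. 380-730 nm in 10 nm steps).
% I: illuminant (D65), R: paint reflectance, xbar/ybar/zbar: CIE observer.
k = 100 / sum(I .* ybar);        % normalize so a perfect white gives Y = 100
X = k * sum(I .* R .* xbar);
Y = k * sum(I .* R .* ybar);
Z = k * sum(I .* R .* zbar);
% [X Y Z] can then be converted to RGB for display or Lab for color difference.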

In this framework, color may be thought of as a checksum of the underlying spectral data, in that it provides a simplified summary of it. The framework also shows how perception of color is inherently reductive: an infinite number of reflectance curves may produce the same perceived color, or put another way, a reflectance curve cannot be derived from a perceived color.

Characterizing Paints

The last and most difficult part of the color mixing problem is characterizing paints - specifically, the determination of their K and S spectra. A few aspects make this step challenging:
  1. K and S cannot be measured directly, only calculated indirectly through mix tests.
  2. K and S vary significantly with wavelength and are not well-approximated as constant.
  3. K and S cannot be determined without a spectrophotometer.
  4. K and S are easily overfitted if the number of mix tests is too small.
Each of these aspects was hard-learned through various failed attempts to devise new, simpler methods for cracking this sub-problem. As far as I can tell, there is no easy, elegant method for this step, despite my best efforts. It simply requires a lot of mixing and number crunching.

One approach (Haase, Meyer) is to assume an ideal white, i.e. K=0 and S=1, then tint (i.e. mix with white) one's primaries and measure the reflectance. Algebra can then be used to calculate K and S for the pure primary. I tried this method for 5 and 10% tints (i.e. 95 and 90% white), and it provided excellent results for primary/white mixes, but did not generalize well for primary/primary mixes. I concluded that K and S had been overfitted.

A much better approach (Walowit, McCarthy, and Berns) is to build up a large database of color mixes, including both recipes and corresponding reflectance spectra, then use the linear least-squares (LLS) method to find error-minimizing K and S values at each wavelength. The authors begin with K-M's proportional weighting of K and S in mixes:

(K/S)_mix = sum_i(c_i * K_i) / sum_i(c_i * S_i)
which can be cross-multiplied and expressed as:

sum_i(c_i * K_i) - (K/S)_mix * sum_i(c_i * S_i) = 0
This can then be expressed in terms of the LLS formulation of:

A * B = C
which can be neatly solved for B, containing K and S values for every primary:

B = (A^T A)^-1 A^T C
This method is very flexible and can accommodate any number of primaries, mixes, and recipe ingredients. A full derivation may be found in the Appendix.
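As a sketch of the solve at a single wavelength (the paper's formulation includes more bookkeeping; here the overall scale is pinned by assuming S of titanium white equals 1, which is a common normalization rather than necessarily the authors' choice):

% Least-squares solve for per-primary K and S at one wavelength (sketch).
% C: nMix x nPrimary matrix of pigment mass fractions;
% q: nMix x 1 vector of measured (K/S) ratios for each mix at this wavelength.
nP = size(C, 2);
A = [C, -q .* C];                 % rows: sum_i(c_i*K_i) - q_j * sum_i(c_i*S_i) = 0
b = zeros(size(C, 1), 1);
% Pin the scale: assume S of primary 1 (titanium white) equals 1.
A = [A; zeros(1, nP), 1, zeros(1, nP - 1)];
b = [b; 1];
x = A \ b;                        % least-squares solution
K = x(1:nP);
S = x(nP+1:end);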

However, the authors do not offer guidance on which recipes to mix to sample uniformly over a palette's color gamut. For this, I looked back to the art world and discovered Schmid color charts, which use a methodical approach to mix virtually all colors possible for an arbitrary set of primaries. Every primary is mixed with every other primary, in both a dominant and non-dominant capacity, at varying levels of tinting (whiteness/brightness). Schmid's convention is that every primary/primary pair be mixed at five levels of tinting. However, since I also needed to measure the exact mass breakdown for each recipe (while Schmid doesn't), I realized that I would need to reduce resolution to make the scope manageable. I eventually settled on three tinting levels per primary/primary pair.

I then carefully set my standard palette using my prior painting experience. These primaries cannot produce any arbitrary color, but they are naturalistic and cover the full gamut for the images that I paint. I chose my most essential and favorite paints, all from Winsor & Newton's "Winton" line:
  1. titanium white
  2. cadmium yellow
  3. cadmium red
  4. burnt sienna
  5. burnt umber
  6. sap green
  7. ultramarine blue
I set the dominant primary as 75%, and the non-dominant primary at 25%. I set my tinting levels at 0, 45, and 90% white. I set a mass target for each mixed paint volume at 3.00 g, with each ingredient subject to a +/- 0.02 g tolerance. This provided more than enough paint to thickly cover a 2 x 2" canvas board swatch without too much waste.

With this, it was simply a matter of sitting down and working through all the mixes required - 109 in total. Each sample took about 5 minutes to create.

Mixing 45% titanium white, 41% ultramarine blue, 14% cadmium red with a palette knife


Fully-mixed purple paint

With the amount of effort this ended up taking, about a full weekend, I decided that I ought to frame the results.

Schmid color chart for my selected primary paints


For measuring the reflectance of each swatch, I pursued a few different paths until ultimately finding and purchasing the Spectro 1, a fantastic new spectrophotometer aimed between the consumer and industrial levels. It overall offered the exact functionality I needed at a great price, and this project would not have been possible without it.


Spectro 1, a fantastic new spectrophotometer that is about the size of a pill bottle

With the swatches mixed, dried, and scanned, I was now able to use the LLS method, combined with my measured recipes, to calculate K and S spectra for all my paints. Implemented as described in the original paper, this method provided disappointing results when comparing the measured and simulated Schmid chart, both visually and numerically. To compare colors, I used CIE76, which is essentially Euclidean distance in Lab space, normalized to a "just noticeable difference" of 2.3 ΔE.
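For reference, CIE76 is just Euclidean distance between Lab triplets, scaled by the just-noticeable-difference constant:

% CIE76 color difference between two Lab triplets, expressed in
% "just noticeable differences" (1 JND = 2.3 dE).
dE  = sqrt(sum((lab1 - lab2).^2));
jnd = dE / 2.3;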


Disappointing results for a naive implementation of the LLS method; left=measured, right=simulated

A problem with this approach is that its results are not constrained by K-M, and consequently its fitted K and S values violate K-M and cause unintended color shifts. To resolve this, I applied an additional least-squares optimization:

Given K and S, find K* and S* such that:


  • error = c_1(K-K*)^2 + c_2(S-S*)^2 is minimized, and
  • K*/S* = (1-R)^2 / (2R), i.e. K-M is enforced

where c_1 and c_2 are fractional weights. This is solved numerically by specifying a range and resolution for S*, then deriving K* for each and calculating the corresponding error. This is performed at each wavelength for each paint. To optimize the values of c_1 and c_2, I defined bounds for c_1 as 0 to 1, and c_2 = 1-c_1. For each pair of weights c_1 and c_2, I optimized K* and S* using the above method, then calculated the average color prediction error against the Schmid chart.
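A sketch of the inner optimization at one wavelength for one paint, given the LLS-fitted K and S, the measured reflectance R, and the weights c_1 and c_2; the search range and resolution shown are arbitrary:

% Find Kstar, Sstar that minimize c1*(K - Kstar)^2 + c2*(S - Sstar)^2
% subject to Kstar/Sstar = (1 - R)^2 / (2*R), i.e. exact K-M.
ratio  = (1 - R)^2 / (2 * R);
Sgrid  = linspace(0.1 * S, 10 * S, 5000);   % search range/resolution (arbitrary here)
Kgrid  = ratio * Sgrid;                     % Kstar derived from each Sstar
err    = c1 * (K - Kgrid).^2 + c2 * (S - Sgrid).^2;
[~, i] = min(err);
Sstar  = Sgrid(i);
Kstar  = Kgrid(i);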


Optimization result for full K* weight domain, 0-100%


Optimization result for K* weight domain near minimum, 2-4%

Repeating this for an array of c_1 values, it becomes clear that the optimal solution is to heavily weight S (in this case, c_1 = 0.027), nearly to the point of discarding K and defining it per K-M as a function of R and S. Incidentally, this method is much simpler and nearly as good, with an average color error that is only about 4% higher for my dataset.

With K*, S*, and c_1 optimized, the color predictions become very accurate - on average, 1.535 JND for the full dataset. In other words, the average color difference is just noticeable, and if you squint (a classic artist's trick) at the comparison chart, the seams between the colors practically disappear.


Final result with LLS and optimized enforcement of K-M; left=measured, right=simulated

Cross-Validation

Aware of the risk of overfitting K and S with too few samples, I cross-validated using the holdout method. To do this, I randomly and repeatedly partitioned the full dataset into "fit" and "test" subsets, calculated K and S using only the "fit" subset, then tested the resulting model against the "test" subset. For expedience, I kept c_1 as 0.027, which is sub-optimal and conservative from the perspective of cross-validation.
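The holdout loop itself is short. In this sketch, mixes, fitKS, and predictError are placeholders for the swatch dataset, the fitting routine, and the color-error evaluation described above:

% Holdout cross-validation of the K/S model (hypothetical helper functions).
fracWithheld = 0.1:0.1:0.5;
nTrials = 20;
avgErr = zeros(size(fracWithheld));
for f = 1:numel(fracWithheld)
    errs = zeros(nTrials, 1);
    for t = 1:nTrials
        n = size(mixes, 1);                               % one row per swatch
        testIdx = randperm(n, round(fracWithheld(f) * n));
        fitIdx  = setdiff(1:n, testIdx);
        model   = fitKS(mixes(fitIdx, :));                % hypothetical fit
        errs(t) = predictError(model, mixes(testIdx, :)); % hypothetical evaluation
    end
    avgErr(f) = mean(errs);
end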


Holdout cross-validation showing stability of average color error with decreasing data

The relative flatness of the trial-average line, even up to 50% withheld, confirms that these methods and spectra are well-generalized and not overfitted. It also suggests that the number of swatches mixed could have been reduced with minimal penalty to the model's accuracy. A useful metric here is mixes per primary. One source (Baxter, Wendt, Lin) shows 71 mixes per 11 primaries, or 6.5 mixes/primary, with relatively poor color matching. My 109 mixes per 7 primaries, or 15.6 mixes/primary, has better results, though is much more time-consuming. More mix data, provided it is well-distributed over the color gamut, will only improve results, although with diminishing returns as K and S converge.

Decomposing Colors

Once K and S are known with confidence, decomposing colors into recipes is relatively trivial. A brute-force optimization function can simply simulate thousands of random recipes and select that which minimizes error to the target. The result can then be stated in the convenient and familiar format of "parts" for ease of use on a palette. To make the results useful, a few constraints are needed. These are baseline values I consider reasonable from painting experience:


  • Maximum number of ingredients: 3-4
  • Minimum fraction per ingredient: 5%
  • Fraction resolution: 5%
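A sketch of the brute-force search under these constraints; predictColor is a placeholder for the K-M forward model (mix K and S per the recipe, compute reflectance, convert to Lab), and the integer "parts" stand in for the fraction-resolution constraint:

% Brute-force recipe search: try random recipes, keep the best match.
% predictColor is a hypothetical forward model; targetLab is the color
% being decomposed; nPrimaries is the palette size.
nTrials = 20000;
best = inf;
for t = 1:nTrials
    nIng  = randi([2 4]);                    % number of ingredients
    ing   = randperm(nPrimaries, nIng);      % which primaries
    parts = randi(10, 1, nIng);              % integer "parts" per ingredient
    w     = parts / sum(parts);              % mass fractions
    if any(w < 0.05), continue; end          % enforce minimum fraction
    dE = sqrt(sum((predictColor(ing, w) - targetLab).^2));   % CIE76 error
    if dE < best
        best = dE; bestIng = ing; bestParts = parts;
    end
end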

Before decomposing digital colors, it's necessary to first scale the image's value (i.e. lightness/darkness) range to match that of the palette. This ensures that all colors are producible by enforcing that the darkest image color corresponds to the darkest color in the Schmid chart, and so on for the lightest. This is analogous to a "Levels" adjustment in Photoshop or GIMP. This is necessary due to Illuminant D65's intense outdoor brightness which un-saturates even the darkest paint mixes that appear nearly black when indoors.



Value scaling comparison; left: original/indoor, right: re-scaled/outdoor


Demo: Decomposing Existing Paintings

Rembrandt's pigment choices and working methods are subjects of academic study in and of themselves, but for the sake of illustration let's suppose that Rembrandt used my well-characterized palette of seven primaries. The color of any pixel in the image may then be decomposed into an error-minimized recipe for mixing. An ingredient quantity limit of three is imposed. Representative samples are shown below. 


Color decomposition demo on a Rembrandt self-portrait

This painting is fairly monochromatic, and accordingly its recipes are dominated by burnt umber, titanium white, and cadmium yellow. Ultramarine blue, cadmium red, and sap green make only minor appearances. Burnt sienna is not used at all, suggesting that it is not needed for this particular painting.

Color decomposition demo on Monet's Impression, Sunrise

Here's another demo on Monet's Impression, Sunrise, which crystallized the Impressionism movement. I've increased the ingredient quantity limit from three to four, and so the recipes become more complex and accurate. Monet uses titanium white heavily to create the pastel hues associated with aerial perspective and marine fog. Every paint is represented, although cadmium yellow is nearly absent. Cadmium red is reserved for the intense morning sun, and burnt sienna is de-emphasized in favor of the cooler burnt umber. The darkest colors are still about 20% white, so Monet is remarkably restrained in his use of darks.

Demoing on photos of existing paintings provides an intuitive and instructive result, but this method works equally well in reverse with ordinary photographs. It could also be used to determine the best primaries for a given image, or estimate the primary paint volumes required to minimize waste.

Demo: Creating New Paintings


Working in reverse, any digital image may be translated into an oil painting. In these examples, all mixing recipes were generated using brute-force color decomposition against a reference image. Unlike the Schmid color mixes, which were precisely weighed, these recipe ingredients were measured by eye in the "parts" format, since weighing so many mixes would be impractical and un-painterly. Also, the error introduced from measuring recipes by eye opens the door to a degree of randomness and spontaneity that all great paintings possess.

Initial surveys of the reference images indicated that certain primaries were not necessary, so they were omitted from the palette. Since color decomposition is done by random brute-force iteration, eliminating unnecessary primaries improves both the repeatability and quality of solutions. It also reduces paint waste and mixing complexity. This corroborates the traditional wisdom of the "limited palette" often advocated by art educators.


Still life; sap green and cadmium red omitted

Self-portrait; burnt sienna and sap green omitted

Conclusions

This article summarized some color science that is relevant to paintings, and methods for characterizing paint color mixing properties and decomposing colors into mixing recipes. All of this is carried out to the highest level of accuracy possible with a hobbyist's means. The resulting framework provides a new tool for creating accurate, engaging paintings, or decomposing existing paintings to better understand the artist's methods. Additional applications may include art restoration, forgery detection, and art education. My primary interest is to use this framework to develop color mixing intuition and support the creation of new paintings.

Acknowledgements

  • Orion Taylor suggested the K* and S* optimization which improved color prediction accuracy by 4%.
  • Drew Hmiel advised on spectroscopy aspects.
  • Bunny Harvey sparked my interest in painting and instilled the importance of color matching.
  • Serena Tolar clarified the terminology for color-measuring instruments.

Appendix


Derivation of LLS method for finding K and S values, per Walowit, McCarthy, and Berns

Sources

"Modeling Pigmented Materials for Realistic Image Synthesis", Chet S. Haase and Gary W. Meyer, 1992.
Provides a good introduction to K-M, although their method for finding K and S causes overfitting.

"Useful Color Equations", Bruce Lindbloom.
Provides all necessary equations for converting between XYZ, Lab, and RGB color spaces.

Colour & Vision Research Laboratory, Institute of Ophthalmology.
Provides observer and illuminant functions.

"IMPaSTo: A Realistic, Interactive Model for Paint", Baxter, Wendt, Lin, 2004.
Summarizes the linear least-squares method for finding K and S spectra.

"A Review of RGB Color Spaces", Danny Pascale, BabelColor, 2003.
Provides some quantities for working with RGB color spaces.

"An Algorithm for the Optimization of Kubelka-Munk Absorption and Scattering Coefficients", Walowit, McCarthy, and Berns, 1987.
Provides a detailed walkthrough of the LLS method as applied to K-M.