Wednesday, March 20, 2019

March Madness 2019

Last year I went into significant depth in developing a method for optimizing March Madness brackets. I'm now focused on other projects, but since the code is easily reused, I decided to run it again this year. Aside from some under-the-hood improvements, I used the same method, with one tweak borrowed from my baseball prediction model: accounting for home field advantage. This lumps together the effect of players, referees, and fans collectively tilting the odds slightly in favor of the home team. In the 2018-2019 NCAA season, the win rate for home teams was 51.22%, so the home field confers a 1.22 percentage-point win probability advantage to the home team. While minor, this does slightly change bracket outcomes this year, as deep as the Final Four. It also demonstrates how famously unpredictable this tournament is, as a small tweak can ripple outward with significant effects. For the sake of expedience, I skipped bracket polling and used the "centrist" parameter set from last year.
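A minimal sketch of how such an adjustment could be applied, assuming the edge is simply added to the home team's model win probability and clipped to [0, 1]. The function names are hypothetical, and the original code may apply the adjustment differently (for example, on a log-odds scale).

```python
HOME_WIN_RATE = 0.5122  # 2018-2019 NCAA home-team win rate quoted above


def home_edge(home_win_rate=HOME_WIN_RATE):
    """Home advantage as a probability shift above a 50/50 game (0.0122 = 1.22 points)."""
    return home_win_rate - 0.5


def adjusted_win_prob(p_home, edge=None):
    """Shift the home team's model win probability by the home edge (assumed additive)."""
    if edge is None:
        edge = home_edge()
    # Clip so the adjusted value is still a valid probability.
    return min(1.0, max(0.0, p_home + edge))
```

An additive shift keeps the tweak easy to reason about: an otherwise even game becomes roughly 51.2% to 48.8%.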

Here's a view of the regular season:

And a view of the postseason:

Like last year, I built my bracket in reverse, selecting the winner of each game as the team that has the highest probability of eventually winning the championship:

Round of 32       Round of 16       Round of 8      ...of 4   ...of 2  ...of 1
Duke              Duke                                                       
VCU                                 Duke                                     
Liberty           Liberty                                                    
Virginia Tech                                       Duke                     
Belmont           LSU                                                        
LSU                                 Michigan State                           
Minnesota         Michigan State                                             
Michigan State                                                Duke           
Gonzaga           Gonzaga                                                    
Syracuse                            Gonzaga                                  
Murray State      Florida State                                              
Florida State                                       Gonzaga                  
Buffalo           Buffalo                                                    
Texas Tech                          Buffalo                                  
Nevada            Michigan                                                   
Michigan                                                               Duke  
Virginia          Virginia                                                   
Oklahoma                            Virginia                                 
Wisconsin         UC Irvine                                                  
UC Irvine                                           Virginia                 
Villanova         Villanova                                                  
Old Dominion                        Tennessee                                
Cincinnati        Tennessee                                                  
Tennessee                                                     Houston        
UNC               UNC                                                        
Utah State                          UNC                                      
New Mexico State  New Mexico State                                           
Kansas                                              Houston                  
Iowa State        Houston                                                    
Houston                             Houston                                  
Wofford           Kentucky                                                   
Kentucky

Source code is stored on my GitHub.
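The reverse-construction rule described above can be sketched as follows, assuming a bracket listed in order (adjacent teams meet first, and so on) and a pairwise `win_prob(a, b)` function, which could include the home-field edge. The `advancement` recursion is the standard way to compute championship odds in a single-elimination bracket; the ratings and the Bradley-Terry-style win probability in the usage lines are hypothetical, for illustration only. Because each sub-bracket's pick is the team with the highest championship probability inside it, choosing game by game yields the same bracket as choosing the champion first and working backward.

```python
def advancement(teams, win_prob):
    """P(each team wins its whole sub-bracket); `teams` is in bracket order, length 2**k."""
    if len(teams) == 1:
        return {teams[0]: 1.0}
    half = len(teams) // 2
    left = advancement(teams[:half], win_prob)
    right = advancement(teams[half:], win_prob)
    probs = {}
    # A team wins this sub-bracket by winning its half, then beating the other half's winner.
    for team, p in left.items():
        probs[team] = p * sum(q * win_prob(team, opp) for opp, q in right.items())
    for team, p in right.items():
        probs[team] = p * sum(q * win_prob(team, opp) for opp, q in left.items())
    return probs


def reverse_bracket(teams, win_prob):
    """Pick every game's winner as the team more likely to win the whole championship."""
    champ = advancement(teams, win_prob)
    rounds, alive = [], list(teams)
    while len(alive) > 1:
        alive = [max(alive[i:i + 2], key=champ.get) for i in range(0, len(alive), 2)]
        rounds.append(alive)
    return rounds


# Hypothetical ratings, for illustration only.
ratings = {"A": 4.0, "B": 1.0, "C": 2.0, "D": 3.0}


def toy_win_prob(a, b):
    return ratings[a] / (ratings[a] + ratings[b])


print(reverse_bracket(["A", "B", "C", "D"], toy_win_prob))  # → [['A', 'D'], ['A']]
```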

Thursday, December 6, 2018

NYT "Resistance" Op-Ed: Part 2


Last month, I waded into forensic linguistics, writing about using natural language processing along with other contextual clues to identify the most likely author of the New York Times' controversial "I Am Part of the Resistance Inside the Trump Administration", published anonymously by a "senior administration official". My conclusion was that the most likely author was Fiona Hill, a senior-level but low-profile policy adviser on Russia and President Putin. In this post, I set aside the "Resistance" op-ed and attempt to quantify more generally how accurate my model may be, and what the implications are for my prediction.

Exploring Validation Data

Like a game of Guess Who, the analysis began by gathering a shortlist of officials considered to be frontrunners by media and betting venues - around 40 in total. For each candidate, I manually collected about 5,000 words of their writing from op-eds, essays, speeches, etc., attempting to maximize similarity to the subject, venue, and tone of the anonymous op-ed, though some compromises had to be made, as the various candidates speak and write for widely varying audiences and occasions. Each writing sample was around 1,000 words in length, and the number of samples per candidate was about five.

My validation process is straightforward: from my original text data, withhold one writing sample (from a known author), train the model on the remaining data, and rank the candidates in terms of linguistic similarity to the withheld writing sample. Then, repeat this process for all remaining writing samples, around 200 in total, and assess. The initial result is underwhelming:

A few authors (Hassett, Hill, McMahon, etc.) were correctly identified most of the time, but most were never correctly identified. However, from my limited understanding, forensic linguistics is more probabilistic than exact, and with such a large field of candidates, determining exact authorship of a relatively short writing sample is a long shot. To dig deeper, I set aside prediction accuracy and looked instead at relative rankings:

This is a bit more encouraging. ~85% of candidates rank in the top half for their respective writing samples, on average. At the bottom of the chart are linguistic chimeras, whose writing samples apparently are so disparate that the author sounds like multiple different voices. The more likely cause though is inconsistent text data with significant stylistic differences between samples, e.g. an op-ed vs. a prepared speech. Another issue exists: the same "usual suspects" frequently appear as top candidates across multiple different authors. Presumably this is due to their linguistic style being highly generic and often "mistaken" for others'. To get a sense of this, I plotted overall ranking averages for all writing samples, including those by the author and those not:

Another way to evaluate this method is by asking how it compares to random guesswork. Plotting this same data as a PDF and a CDF:

Clearly, this method is substantially better than guesswork, though imperfect. The CDF also illustrates a very useful trade: pick the top candidate, and you're 15% likely to be correct. Top two picks is 18%, five is 40%, 10 is 65%, 15 is 75%, and so on. Looking at it from the opposite perspective, we can be 75% sure that the bottom (37-15) = 22 candidates are not the author, and ruling people out may be nearly as useful.
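A minimal sketch of this validation, assuming a hypothetical `score_fn(train, text)` that retrains the model on `train` and returns a similarity score per candidate. The rank PDF and CDF are then histograms of where the true author lands, and random guessing over N candidates has a CDF of k/N.

```python
from collections import Counter


def rank_of_true_author(scores, true_author):
    """1-based position of the true author when candidates are sorted by similarity."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_author) + 1


def leave_one_out_ranks(samples, score_fn):
    """samples: list of (author, text); score_fn(train, text) -> {candidate: similarity}.

    Assumes every author keeps at least one sample in the training set.
    """
    ranks = []
    for i, (author, text) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # withhold exactly one sample
        ranks.append(rank_of_true_author(score_fn(train, text), author))
    return ranks


def rank_pdf_cdf(ranks, n_candidates):
    """pdf[k-1]: share of samples whose author ranked k-th; cdf[k-1]: ranked in the top k."""
    counts = Counter(ranks)
    pdf = [counts[k] / len(ranks) for k in range(1, n_candidates + 1)]
    cdf, running = [], 0.0
    for p in pdf:
        running += p
        cdf.append(running)
    return pdf, cdf


def random_guess_cdf(n_candidates):
    """Baseline: k random picks out of n candidates contain the author with probability k/n."""
    return [k / n_candidates for k in range(1, n_candidates + 1)]
```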

Implications for the NYT Op-Ed

On Fiona Hill

Hill is shown here to be one of the most frequent false-positive candidates, i.e. the algorithm often erroneously attributes other authors' text to her. On the other hand, text by Hill is correctly attributed 85% of the time. Per the PDF, I can now ascribe a confidence level of 15% to my prior Hill prediction. This is low in absolute terms but high relative to a field of 37 candidates, where the null (random-guess) probability is 1/37, or 2.7%.

Widening the Net

As explained under the CDF, allowing a larger field of authors increases prediction certainty. Let's re-examine a graph of overall linguistic similarity from my previous post:

From the PDF, we see, as expected, that the candidate ranked first is most likely to be correct. Continuing on, the next-most likely candidate is, counter-intuitively, not the one ranked second but the one ranked fourth, and so on down the list. So, using similarity together with the PDF, if we pick a confidence level, e.g. 60%, we can pick the top N candidates such that we are 60% confident that the author is contained in the set, per the CDF. For a simple preponderance of the evidence, i.e. >50% certainty, the field required is 6 candidates. So the author is probably one of:
  1. Fiona Hill
  2. Andrew Wheeler
  3. Mike Pence
  4. Rick Perry
  5. Nikki Haley
  6. Jared Kushner
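This selection rule can be sketched as follows, assuming `pdf[i]` is the validated probability that the true author sits at similarity rank i+1 and `ranked_candidates` lists the candidates in similarity order (both names are hypothetical). Rank positions are taken greedily from most to least probable until their combined probability reaches the chosen confidence level.

```python
def likely_authors(ranked_candidates, pdf, confidence):
    """Smallest PDF-ordered set of candidates whose rank positions reach `confidence`."""
    # Visit rank positions from most to least likely to hold the true author.
    order = sorted(range(len(pdf)), key=lambda i: pdf[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        if total >= confidence:
            break
        chosen.append(ranked_candidates[i])
        total += pdf[i]
    return chosen, total
```

Because the order follows the PDF rather than the raw similarity rank, a candidate ranked fourth by similarity can enter the set before the one ranked second.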

Conversely, the author is probably not any of the remaining candidates. But language can be deceptive, and this is still essentially a coin toss with low certainty. I'd rather be at least "pretty sure", so I'll increase the confidence level to 80%, and from the CDF see that a field of 16 candidates is required. So, we can be pretty sure that the author is one of:
  1. Fiona Hill
  2. Andrew Wheeler
  3. Mike Pence
  4. Rick Perry
  5. Nikki Haley
  6. Jared Kushner
  7. Ryan Zinke
  8. John Bolton
  9. Mike Pompeo
  10. Kevin Hassett
  11. John Sullivan
  12. Linda McMahon
  13. Steven Mnuchin
  14. Betsy DeVos
  15. Jim Mattis
  16. John Kelly
And, we can be pretty sure that the author is not:
  • Mick Mulvaney
  • Joseph Simons
  • Alex Azar
  • Robert Wilkie
  • Alexander Acosta
  • Ben Carson
  • Raj Shah
  • Larry Kudlow
  • Ivanka Trump
  • Gina Haspel
  • Ajit Pai
  • Jeff Sessions
  • Dan Coats
  • Elaine Chao
  • Kellyanne Conway
  • Jon Huntsman
  • Wilbur Ross
  • Melania Trump
  • Kirstjen Nielsen
  • Robert Lighthizer
  • Sonny Perdue

Further Speculation

Setting aside linguistic similarity and looking at the "pretty sure" shortlist, suppose we eliminate all officials not involved in foreign policy, the primary policy area of the "Resistance" op-ed. That removes Andrew Wheeler, Rick Perry, Ryan Zinke, Kevin Hassett, Linda McMahon, Steven Mnuchin, Betsy DeVos, and John Kelly, leaving:
  1. Fiona Hill
  2. Mike Pence
  3. Nikki Haley
  4. Jared Kushner
  5. John Bolton
  6. Mike Pompeo
  7. John Sullivan
  8. Jim Mattis
Suppose we then eliminate all candidates who have denied writing the op-ed (Mike Pence, Nikki Haley, John Bolton, Mike Pompeo, and Jim Mattis). That leaves just:
  1. Fiona Hill
  2. Jared Kushner
  3. John Sullivan
And that's as far as I want to take this now. I welcome further speculation and remarks in the comments below.

Thursday, November 15, 2018

Authorship of the NYT "Resistance" Op-Ed

In September, The New York Times published "I Am Part of the Resistance Inside the Trump Administration", an incendiary op-ed by an anonymous senior official. Since then, there has been wild speculation and sleuthing about the identity of the author. In this post, I aim to cut through the noise and identify the most likely author using natural language processing and other contextual clues.

First I assembled a list of candidates from Wikipedia, PredictIt, and a few other informal forums. It covers all the top contenders, though it isn't exhaustive due to the sprawl of the executive bureaucracy.

  • Ajit Pai, Chairman of the Federal Communications Commission
  • Alex Azar, Secretary of Health and Human Services
  • Alexander Acosta, Secretary of Labor
  • Andrew Wheeler, Acting Administrator of the Environmental Protection Agency
  • Ben Carson, Secretary of Housing and Urban Development
  • Betsy DeVos, Secretary of Education
  • Christopher Wray, Director of the Federal Bureau of Investigation
  • Dan Coats, Director of National Intelligence
  • Don McGahn, White House Counsel
  • Elaine Chao, Secretary of Transportation
  • Fiona Hill, Senior Director for European and Russian Affairs
  • Gina Haspel, Director of the CIA
  • Ivanka Trump, Senior Adviser to the President
  • Jim Mattis, Secretary of Defense
  • Jared Kushner, Senior Adviser to the President
  • Jeff Sessions, Attorney General
  • John Sullivan, Deputy Secretary of State
  • John Kelly, White House Chief of Staff
  • John Bolton, National Security Advisor
  • Jon Huntsman, Ambassador to Russia
  • Joseph Simons, Chairman of the Federal Trade Commission
  • Kellyanne Conway, Presidential Counselor
  • Kevin Hassett, Chairman of the Council of Economic Advisors
  • Kirstjen Nielsen, Secretary of Homeland Security
  • Larry Kudlow, Director of the National Economic Council
  • Linda McMahon, Administrator of the Small Business Administration
  • Melania Trump, First Lady of the United States
  • Mick Mulvaney, Director of the Office of Management and Budget
  • Mike Pence, Vice President of the United States
  • Mike Pompeo, Secretary of State
  • Nikki Haley, Ambassador to the United Nations
  • Raj Shah, White House Principal Deputy Press Secretary
  • Rick Perry, Secretary of Energy
  • Robert Lighthizer, United States Trade Representative
  • Robert Wilkie, Secretary of Veterans Affairs
  • Ryan Zinke, Secretary of the Interior
  • Sarah Sanders, White House Press Secretary
  • Sonny Perdue, Secretary of Agriculture
  • Stephen Miller, Senior Policy Advisor
  • Steven Mnuchin, Secretary of the Treasury
  • Wilbur Ross, Secretary of Commerce
  • Zachary Fuentes, Deputy Chief of Staff

For each candidate, I scraped around 2,000 to 10,000 raw words of their writing, attempting to use material that was maximally similar to the op-ed in subject, setting, and formality. For a few candidates, this was not possible as their body of published writing was too small or informal to make a useful comparison:

  • Christopher Wray, Director of the Federal Bureau of Investigation
  • Don McGahn, White House Counsel
  • Sarah Sanders, White House Press Secretary
  • Stephen Miller, Senior Policy Advisor
  • Zachary Fuentes, Deputy Chief of Staff

I then broke this raw text into sentence fragments using a series of regular expressions, removing numbers, quotes, and proper nouns. The intention was to make the text maximally subject-independent to focus more on writing style/voice. Here's an example from Dan Coats:

"Since 1998 when this terrorist payments program reportedly first began, the United States has contributed more than $4.6 billion (in constant dollars) to the PA. The great majority of this amount has been in straight budgetary support to the PA, enabling it to meet its budgetary commitments."

This is parsed as:

when this terrorist payments program reportedly first began
has contributed more than
in constant dollars
to the
the great majority of this amount has been in straight budgetary support to the
enabling it to meet its budgetary commitments
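The exact regular expressions aren't given here, so the following is a hypothetical reconstruction that reproduces the Dan Coats example above under three assumed rules: sentences end at `.`, `!`, or `?` followed by whitespace; any punctuation mark, digit, or capitalized word that does not open the sentence (a rough proxy for a proper noun) ends the current fragment; and fragments shorter than two words are dropped. One known weakness of the proxy is that a mid-sentence "I" is treated as a proper noun.

```python
import re

# Assumed rules, not the original regexes.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
TOKEN = re.compile(r"[A-Za-z][A-Za-z'\-]*|\S")  # a word, or any single other character


def fragments(text, min_words=2):
    """Split text into lowercase sentence fragments stripped of numbers and proper nouns."""
    result = []

    def flush(words):
        # Keep only fragments long enough to carry word-to-word transitions.
        if len(words) >= min_words:
            result.append(" ".join(words))
        words.clear()

    for sentence in SENTENCE_END.split(text.strip()):
        current, at_start = [], True
        for tok in TOKEN.findall(sentence):
            is_word = tok[0].isalpha()
            # A capitalized word that does not open the sentence counts as a proper noun.
            is_proper = is_word and tok[0].isupper() and not at_start
            if is_word:
                at_start = False
            if is_word and not is_proper:
                current.append(tok.lower())
            else:
                flush(current)  # digits, punctuation, and proper nouns end a fragment
        flush(current)
    return result
```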

After parsing all the text, I extracted a dictionary of all unique words across all text, including the anonymous op-ed. Then, I generated Markov matrices to track word transition probabilities, i.e. Mrkv(word_x, word_y) gives the probability that word_y will follow word_x in a sentence fragment. I also extracted the overall distribution of words, i.e. Dist(word_x) gives the frequency of word_x in the author's overall writing. Both Mrkv and Dist are created using "hit counts" and then normalized to the word count, so their values represent probabilities.
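A minimal sketch of the two per-author models, assuming fragments are whitespace-separated strings like those above. It row-normalizes the transition counts so that `mrkv[x][y]` reads directly as the probability that `y` follows `x`; dividing by the total transition count instead is another reasonable reading of "normalized to the word count". Nested dictionaries keep the mostly empty matrix sparse.

```python
from collections import Counter, defaultdict


def build_model(fragments):
    """Return (mrkv, dist) for one author, built from hit counts.

    mrkv[x][y]: fraction of x's within-fragment successors that are y (row-normalized)
    dist[x]:    frequency of x in the author's overall writing
    """
    pair_counts = defaultdict(Counter)
    word_counts = Counter()
    for fragment in fragments:
        words = fragment.split()
        word_counts.update(words)
        for x, y in zip(words, words[1:]):  # transitions never cross fragment boundaries
            pair_counts[x][y] += 1
    total_words = sum(word_counts.values()) or 1  # guard against empty input
    dist = {w: n / total_words for w, n in word_counts.items()}
    mrkv = {}
    for x, followers in pair_counts.items():
        row_total = sum(followers.values())
        mrkv[x] = {y: n / row_total for y, n in followers.items()}
    return mrkv, dist
```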

With the Markov matrices and word distributions known, I took the overall "similarity" score as:

similarity = -sum(abs(Candidate - Op_Ed))
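The formula above is a negated L1 distance taken over every entry of the shared dictionary. A minimal sketch, assuming an entry missing from one side counts as zero probability; the nested Markov table is flattened to (word_x, word_y) keys so the same function serves both metrics. The function names and toy values are hypothetical.

```python
def l1_similarity(candidate, op_ed):
    """similarity = -sum(abs(Candidate - Op_Ed)); entries missing on either side count as 0."""
    keys = candidate.keys() | op_ed.keys()
    return -sum(abs(candidate.get(k, 0.0) - op_ed.get(k, 0.0)) for k in keys)


def flatten(mrkv):
    """Turn a nested {x: {y: p}} Markov table into {(x, y): p} so it can be compared."""
    return {(x, y): p for x, row in mrkv.items() for y, p in row.items()}


# Toy word distributions (hypothetical values) with no words in common.
op_ed = {"the": 0.5, "we": 0.5}
print(l1_similarity({"they": 1.0}, op_ed))  # → -2.0, the lowest possible score
```

Since each distribution sums to 1, the score ranges from 0 (identical) down to -2 (no overlap).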

A similarity score is generated for both Mrkv and Dist; Markov similarity gives a feel for how words/patterns flow together, while distribution similarity gives a feel for word choice. Here are the results:

Notably, Hill scores highest in both similarity metrics. However, before jumping to conclusions, note that although Mrkv and Dist are normalized to word counts, there is an additional correlation between word count and similarity. A typical op-ed is ~1,000 words, so it contains at most about 1,000 word transitions (neglecting grammar). Compare this to the 10,000-word dictionary, which allows 10,000 squared, or 100,000,000, possible transitions. Our language is too rich to "exhaust" a person's speech in even 10,000 words because unique combinations are so plentiful. As a result, the more text provided for a candidate, the higher they score in similarity, because they have more opportunity to stumble upon a rare combination matching the op-ed. So an additional normalization is necessary.

To account for this, I re-scored the Markov and distribution similarities as the vertical distance off the linear fit line, and re-evaluated. The intent was to make results maximally dependent on style, not length.
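A minimal sketch of that re-scoring, using an ordinary least-squares line of raw similarity against each candidate's word count in plain Python (the function name is hypothetical; a library fit such as NumPy's would give the same line).

```python
def length_adjusted(word_counts, scores):
    """Vertical distance of each score from the least-squares line of score vs. word count.

    Positive values mean a candidate is more similar than their text length alone predicts.
    """
    n = len(word_counts)
    mean_x = sum(word_counts) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in word_counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(word_counts, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in zip(word_counts, scores)]
```

Residuals from a fit with an intercept always sum to zero, so the adjusted scores are centered on the "typical" candidate for a given amount of text.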

Only two candidates rank in the top tier for both similarity metrics: Fiona Hill (#1, #2) and John Bolton (#3, #1). Hill's case is stronger from the linguistic perspective, and the gap widens further upon considering the additional circumstantial evidence.

Hill is one of only six people on this shortlist who have not denied writing the op-ed. (The others are Ivanka, Kushner, Sullivan, Kelly, and Miller. Of this set, Ivanka, Kushner, and Kelly appear to be exonerated by the similarity metrics; John Sullivan is toward the upper end of the similarity spectrum, but not suspiciously so; Miller could not be analyzed as he is unpublished.) Hill's background is in academia and policy, and her most recent publications, shown below, revolve around Russia and Putin, both of which are notably mentioned multiple times in the op-ed that is otherwise light on specifics.

  • “What makes Putin tick and what the West should do,” Fiona Hill and Clifford Gaddy, Brookings Report (online), January 13, 2017
  • “Dealing with a simmering Ukraine-Russia conflict,” Fiona Hill and Steven Pifer, Brookings “Election 2016 and America’s Future” series (online), October 6, 2016
  • “3 reasons Russia’s Vladimir Putin might want to interfere in the US presidential elections,” Vox (online), July 27, 2016
  • “Putin: The one-man show the West doesn’t understand,” Bulletin of the Atomic Scientists, April 13, 2016
  • “Understanding and deterring Russia: U.S. policies and strategies,” Testimony to the House Armed Services Committee, February 10, 2016

Bolton has denied authorship, which decreases his likelihood though doesn't rule him out. He has a long history of military and government involvement and is known for being outspoken in his controversial views, making him even less likely to carry out this secretive act.

So, between the linguistic similarities, lack of denial, connection to the subject matter, and personality cues, all signs point to Hill. What about the possibility of the op-ed being written in the voice of Hill by a third party for some ulterior purpose? To some extent we can lean on the New York Times' credibility and rule out this possibility; presumably the editors verified the author's identity before publication. It's remotely possible that a verifiable official imitated Hill in writing the op-ed, perhaps to further hide their identity. However, due to the sophistication required for this task, it seems very unlikely.

It's not clear what the author hoped to achieve with this op-ed, or how they expected to remain anonymous from a position of national visibility in the age of Twitter and instant, continuous press circulation. Either way, I expect we will eventually know the author's identity, and I expect it will be...Fiona Hill.


Despite widespread interest, there seems to be little actual data-driven analysis toward answering the authorship question. Here's what other authors have found with similar approaches:

  • BBC suggests the author is Mike Pence based on "the author's stylistic traits"
  • David Robinson analyzed tweets, finding D. Trump, Sanders, and Pence most likely
  • Max Berggren analyzed tweets, finding DeVos, Carson, and Azar most likely

And for additional qualitative commentary:

  • Claire Hardaker concludes: "I don’t think forensic linguistics can solve this one"
  • The Washington Post, conversely, writes: "The outing of the op-ed’s author is virtually inevitable, according to forensic linguists"

Source code and text data are available on my GitHub.

Tuesday, October 16, 2018

Veganism and Energy Efficiency


In this post, I discuss how veganism relates to energy efficiency, evaluate a few specific livestock common in the U.S., and conclude that factory farming techniques are significantly wasteful of food energy. In closing, I suggest some ideas for how people can adjust their diets to be more environmentally sustainable, especially if they are not interested in veganism.


I've been vegan since May 2017. I made the switch for a few reasons:

  • My brother adopted veganism about a year prior, and convinced me that the diet could be as healthy, or healthier, compared to an omnivorous diet, overturning my misconception that vegans were inherently nutritionally deficient.
  • A growing body of informal and academic work supports the idea that veganism is more environmentally sustainable than omnivorism.

Now about 1.5 years later, I've noticed many benefits: increased speed, stamina, energy, and near-elimination of common colds. Contrary to popular misconceptions, my food spending has decreased slightly (despite inflation), and I've naturally maintained my pre-veganism weight while becoming leaner and more muscular with increased exercise.

Effect of veganism on my combined grocery + restaurant spending

My interest in veganism is primarily as it relates to environmental sustainability, and I've read a good deal on the topic. Though the big-picture argument is cohesive and compelling, much of the writing on it is lacking in technical rigor. Here are some typical examples:
"Eating lower down on the food chain provides a massive savings in terms of how much energy and resources you need. If you're on the third trophic level and you eat herbivores, the animals you eat contain only 10 percent of the energy originally stored by the plants they consumed...[In] general, eating lower on the food chain is always a more efficient practice."

Seems reasonable, but herbivores can digest food sources that humans can't, like grass and stovers, so not all food is equally valuable to all trophic levels. The food chain is more like a web than a neatly ordered chain. Another example:

"It takes more than 2,400 gallons of water to produce 1 pound of meat; 1 pound of wheat takes 25 gallons."

Maybe so, but pound-for-pound, meat is much more nutritious and energetic (i.e. higher in calories) than wheat, so maybe the additional water is a good investment? From this statistic alone there's not enough information to make a meaningful conclusion.

A Litmus Test for Veganism

Unsatisfied with writing on veganism, I sought to develop my own metric: a litmus test to gauge the efficacy of veganism in improving environmental sustainability. The food chain starts with plants, and consequently all other trophic levels depend on plants. Growing plants is non-negotiable. So in effect, a referendum on veganism is a referendum on livestock cultivation. Should we do it, or not?

In an abstract sense, livestock can be viewed as energy conversion machines that convert plant products like corn and hay to animal products like meat and eggs. The figure of merit for an energy conversion machine is its efficiency, that is, its output divided by its input. The outputs are straightforward: meat, milk, and eggs. The inputs are the animal's diet: typically, some combination of corn, grains, legumes, and forages (e.g. grass, hay, and stovers). This was my starting point, which I refined as I learned more about the modern food supply system.

Discourse on environmental impacts of mass agriculture tends to focus on land/water use, carbon emissions, and pollution, and these are all important considerations. However, I decided to focus on energy efficiency instead, for a few reasons:
  1. To clarify a perceived gap: pro-vegan sources usually ignore the fact that livestock are significant consumers of agricultural waste with digestive systems that are fundamentally different from humans', thereby underestimating their practical efficiency.
  2. To reduce the problem to its most essential form: energy is arguably the most fundamental agricultural resource, as it can be traded for water (e.g. through desalination), land (e.g. through terracing), carbon emissions (e.g. through sequestration), or pollution cleanup.

Effect of Diet on Energy Efficiency

All energy transfers are less than 100% efficient, so eliminating unnecessary intermediate trophic levels, and their associated energy transfers, increases overall energy efficiency. Cultivating the world's staple crops like wheat, corn, beans, and rice produces a great deal of agricultural waste that humans can't digest but ruminant livestock readily can. As there are no intermediate trophic levels between livestock and forages, and no other uses for forages, it seems reasonable to feed them to livestock.

However, in the factory farming model that dominates U.S. agriculture, forages are a secondary food source for livestock; the primary food source is most typically corn. The reasons for this are complex, but include heavy tax subsidies for corn, high meat demand, and consumer preference for marbling in meat. Further, non-ruminant livestock consume no forages at all.

From an energy-efficiency perspective, this is sub-optimal as corn and other livestock feed staples are human-edible. In economic terms, feeding human-edible food to livestock is essentially an "investment" that only makes sense if the payout in animal products exceeds the initial investment. In this formulation, forages are excluded from the "investment", since they are not human-edible and have no human value. To quantify this exchange, I developed a metric called "Return on Useful Energy Input" (RUEI), which is calculated over a livestock's lifetime by:

RUEI = (E_output, all sources - E_input, human-edible sources) / E_input, human-edible sources

By design, RUEI has some useful properties: it's a dimensionless percentage, enabling direct comparison of disparate livestock and diets, and it's meaningfully signed: positive RUEI indicates a net gain, and negative RUEI a net loss. Unlike thermodynamic efficiency, which always lies between 0 and 100%, RUEI can be negative or exceed 100%, since it is not an efficiency but rather a normalized return.
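As a quick check of the definition, here is a minimal sketch (the function name is my own) applied to the round-number dairy-cow totals reported later in this post: 10,000,000 Cal of milk and meat output against 43,000,000 Cal of human-edible feed. It also makes the edge case explicit: with no human-edible input, the denominator is zero and RUEI is unbounded.

```python
import math


def ruei(output_all_cal, input_edible_cal):
    """Return on Useful Energy Input: (E_out,all - E_in,edible) / E_in,edible."""
    if input_edible_cal == 0:
        # Fed only human-inedible forages or waste: any output is an unbounded return.
        return math.inf if output_all_cal > 0 else math.nan
    return (output_all_cal - input_edible_cal) / input_edible_cal


milk_cal, meat_cal = 9_000_000, 1_000_000  # dairy cow lifetime outputs (round numbers)
edible_feed_cal = 43_000_000               # human-edible feed over its lifetime
print(f"{ruei(milk_cal + meat_cal, edible_feed_cal):.0%}")  # → -77%
```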

World Livestock Overview

RUEI indicates if any given livestock is a good idea or a bad idea from an energy-efficiency perspective. Which livestock to analyze? Here's what the global herd looks like by population:

And by total biomass:

To limit scope, I restricted my analysis to three animals with sufficient data available and that are of primary importance in the U.S.: cows, pigs, and chickens.

Assumptions and Limitations

Before proceeding, some notes on my simplifying assumptions used to calculate RUEI:
  • Feed ingredient proportions do not change with time 
  • Feed amount is proportional to animal mass 
  • Animal mass increases linearly with time up to mature mass, after which it is constant 
  • Animal products not included in the dressing percentage yield no food 
  • Forages and distiller's grains are considered human-inedible and are not included in RUEI 
  • All other food sources are considered human-edible and are included in RUEI 

And a few notes on overall scope:
  • Other environmental impacts like carbon emissions, land usage, water usage, and pollution are neglected to focus on energy efficiency 
  • Nutrition content, aside from calories, is neglected 
  • Energy costs upstream and downstream of livestock are neglected; system boundaries are drawn strictly around the livestock only
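Under the first three assumptions, lifetime feed reduces to a fixed daily feed fraction times the area under the animal's mass curve. A minimal sketch with hypothetical parameter names; the starting mass defaults to zero, which is itself an assumption. Splitting the resulting feed mass by ingredient proportions and multiplying by each ingredient's energy density would then give the lifetime energy input.

```python
def lifetime_mass_days(mature_kg, days_to_maturity, lifetime_days, start_kg=0.0):
    """Area under a mass curve that rises linearly to mature_kg, then stays flat (kg*days)."""
    if lifetime_days <= days_to_maturity:
        # Still growing at slaughter: integrate the linear ramp up to lifetime_days.
        growth_per_day = (mature_kg - start_kg) / days_to_maturity
        return start_kg * lifetime_days + growth_per_day * lifetime_days ** 2 / 2
    ramp = (start_kg + mature_kg) / 2 * days_to_maturity  # trapezoid while growing
    plateau = mature_kg * (lifetime_days - days_to_maturity)  # constant mature mass
    return ramp + plateau


def lifetime_feed_kg(daily_fraction, mature_kg, days_to_maturity, lifetime_days, start_kg=0.0):
    """Feed eaten over a lifetime when daily feed is a fixed fraction of body mass."""
    area = lifetime_mass_days(mature_kg, days_to_maturity, lifetime_days, start_kg)
    return daily_fraction * area
```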


Cow breeds are specialized for either milk or beef production. Both cases are analyzed below.

Dairy cows

A typical dairy cow is first mated or inseminated at about 13 months old. After a 9 month gestation, the cow gives birth and produces milk for about 10 months, the "calving period". On average a cow has about 1.7 calving periods before it is no longer profitable, after which it is slaughtered. Over its lifetime, a dairy cow produces about 3,300 gallons of milk and yields about 400 kg of beef when slaughtered, with a dressing percentage of 60%. I considered two diets describing TMRs (total mixed rations), the daily food allotted to each cow, found on farm blogs:

Ingredient     Mass, lb
Alfalfa hay       28.00
Corn stover       35.00
Cotton seed        6.00
Oat flour          4.50
source: Dairy Carrie: "What Do Cows Eat"

This diet is representative of more-traditional, non-factory-style feeds, with forages as the primary feed and grains secondary. For obvious reasons, factory farms do not publish their feed recipes as transparently as small farms. The energy breakdown over the cow's lifetime, in round numbers, is:
  • Milk output: 9,000,000 Cal
  • Meat output: 1,000,000 Cal
  • Feed input, all sources: 198,000,000 Cal
  • Feed input, human-edible sources: 43,000,000 Cal
As can be seen from the data, the cow produces significantly less useful energy than it consumes, even with its forages excluded from the input. The RUEI is:

RUEI = (E_output, all sources - E_input, human-edible sources) / E_input, human-edible sources
RUEI = ((9,000,000+1,000,000) - 43,000,000) / 43,000,000
RUEI: -77%

In other words, the cow is a net useful energy loss; the return is -77% of the useful input, i.e. an input of 100 useful plant calories yields only 23 animal calories. Repeating the same analysis for an alternate diet from a similar-size farm:

Ingredient     Mass, lb
Alfalfa hay       12.00
Canola meal        2.00
Corn stover       62.00
Cotton seed        4.00
Soybean meal       3.00
Wheat stover      12.00
source: Slow Money Farm: "What Do Cows Eat"

RUEI: -68%

Beef cows

Data on specific beef cow diets proved difficult to find, so I re-used the diet information for the dairy cows. Beef cows produce no milk and yield a similar amount of meat, though their lives are much shorter at around 18 months.

RUEI: -88%


Pigs are slaughtered at around 4 months old, with a dressing percentage of 72%, and consume about 2.6% of their body weight daily. Most pigs are fed only corn, soybean meal, and sometimes wheat. Pig diets were easy to find, and I analyzed seven of them. Representative highlights are shown below:

"[Baseline] reference diet"
Ingredient     Mass, lb
Soybean meal       1.51
source: Niche Pork Production: "Example Pig Diets"

RUEI: -27%

"Example [diet] including cooked, full-fat soybeans"
Ingredient     Mass, lb
Soybean meal       0.37

RUEI: -7%

"High-forage diet for gestating sows"
Ingredient     Mass, lb
Soybean meal       0.31
Alfalfa hay        4.81

RUEI: 130%

This is the only diet found to have a positive RUEI, because it is based on forages. Feed varies significantly between farms, and it's unclear how common this type of feed is, but it does suggest an interesting gray area, discussed further below.


Like cows, chicken breeds are specialized for either egg or meat production, with the latter termed "broilers". A laying chicken typically lives to be 18 months old and produces around 170 eggs, being slaughtered at a weight of around 2.2 kg, with a dressing percentage of around 70%. Broilers live to be about 1.5 months old, and are slaughtered at around 2.0 kg, though some broilers can reach as much as twice this weight. Both types of birds eat about 5% of their weight daily in the form of corn and soybeans. Chickens are not significant consumers of agricultural waste products.

Laying chickens

"Corn-soy-based diet (CS)"
Ingredient     Mass, lb
Soybean meal       0.09
Soybean oil        0.01
source: Poultry Science: "Energy and nutrient utilization of broiler chickens..."

A chicken's egg production constitutes the majority of its food output, about 73%.

RUEI: -87%

"Corn-based diet (CN)"
Ingredient     Mass, lb
Soybean meal       0.05
Soybean oil        0.01

RUEI: -83%

Broiler chickens

Applying the same diet proportions to broilers, noting that their daily food intake is slightly higher:

RUEI: -32% to -13%


This figure shows the food outputs of the livestock analyzed, broken down by the output type in terms of days of food, assuming a typical 2,000 calorie/day diet. Incredibly, a single dairy cow produces enough milk in its brief life for about a decade's worth of human caloric intake.
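The conversion behind that chart is a simple division by the assumed 2,000 Cal/day. For example, with the dairy cow's roughly 9,000,000 Cal of lifetime milk output (the function name is hypothetical):

```python
CAL_PER_PERSON_DAY = 2_000  # typical daily human diet assumed in the chart


def days_of_food(output_cal, cal_per_day=CAL_PER_PERSON_DAY):
    """Person-days of caloric intake supplied by a given food output."""
    return output_cal / cal_per_day


milk_days = days_of_food(9_000_000)
print(milk_days, round(milk_days / 365.25, 1))  # → 4500.0 12.3
```

That is roughly 12 years of one person's calories, consistent with "about a decade" in round numbers.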

Feed varies significantly by livestock, farm size, and geographic region, and is also usually proprietary, so these figures cannot be precise. Nevertheless, they capture some important trends about the extent to which livestock are recyclers or consumers of food products. All of these livestock originally existed near 100% on this chart, consuming mostly scraps and forages, but modern farming techniques have dramatically pushed the numbers lower by replacing large portions of feed with grains and legumes.

RUEI was negative for all livestock analyzed except forage-fed pigs, which consume mostly alfalfa hay or corn stovers. The other livestock shown could also be raised in a positive-RUEI manner, but this is at odds with current factory farming and consumption habits.


The data supports the view that factory farming, with its livestock feed characterized by low forage and high human-edible feed, is environmentally unsustainable, or at minimum, energetically wasteful. Mainstream methods for cultivation of cows, pigs, and chickens in America cause a net useful energy loss, even when considering the fact that some animals consume agricultural waste that is human-inedible.

However, an interesting gray area is found in the case of high-forage diets characteristic of free-range agriculture. Further, within this framework, livestock that consume human-inedible/waste products exclusively would have an infinitely high RUEI. Interestingly, this aligns with the pre-industrial model of farming, before factory farming, when crop cultivation was the primary focus and livestock existed in a secondary capacity, grazing freely, consuming occasional food waste, and in some cases serving as labor. In these times, meat was rare and weighty, a far cry from its commodity status in modern America.

We are unlikely to ever return to this idyllic past, but we would be well-served to learn from its wisdom. Those who are not prepared to give up meat should at least better understand its environmental impact and reduce consumption, while opting for local, free-range, forage-fed options whenever possible. Veganism is too extreme for most people, but that's okay. In practice, it's far more beneficial, and socially conscientious, to reduce livestock consumption and/or improve cultivation standards than to try to convert non-vegans, which is usually a losing battle and risks stigmatizing the movement. For example, reducing the meat consumption of 10 people by 10% is better and easier than reducing the meat consumption of 1 person by 100%: the ask is more reasonable, while the impact is equal and more socially broad. Purism should not stand in the way of pragmatism.


Those interested in the data, calculations, and sources may view my full spreadsheet, the basis of this post. I welcome any corrections and criticisms.

Books that shaped my general thinking on this topic are "Who Will Feed China?" by Lester R. Brown and "America's Food" by Harvey Blatt.

Update 2018-10-24

The Economist ran an article last week called "Why people in rich countries are eating more vegan food", which includes a nice graph showing "feed to food loss" for various livestock. They use protein as their intermediate quantity to compare between crops and livestock instead of energy as I've done here, but the concept and conclusions are similar.

Graph from The Economist comparing protein gains and losses for crops vs. livestock