Thursday, November 15, 2018

Authorship of the NYT "Resistance" Op-Ed

In September, The New York Times published "I Am Part of the Resistance Inside the Trump Administration", an incendiary op-ed by an anonymous senior official. Since then, there has been wild speculation and sleuthing about the identity of the author. In this post, I aim to cut through the noise and identify the most likely author using natural language processing and other contextual clues.

First, I assembled a list of candidates from Wikipedia, PredictIt, and a few other informal forums. It covers all the top contenders, though it isn't exhaustive given the sprawl of the executive bureaucracy.


  • Ajit Pai, Chairman of the Federal Communications Commission
  • Alex Azar, Secretary of Health and Human Services
  • Alexander Acosta, Secretary of Labor
  • Andrew Wheeler, Acting Administrator of the Environmental Protection Agency
  • Ben Carson, Secretary of Housing and Urban Development
  • Betsy DeVos, Secretary of Education
  • Christopher Wray, Director of the Federal Bureau of Investigation
  • Dan Coats, Director of National Intelligence
  • Don McGahn, White House Counsel
  • Elaine Chao, Secretary of Transportation
  • Fiona Hill, Senior Director for European and Russian Affairs
  • Gina Haspel, Director of the CIA
  • Ivanka Trump, Senior Adviser to the President
  • Jim Mattis, Secretary of Defense
  • Jared Kushner, Senior Adviser to the President
  • Jeff Sessions, Attorney General
  • John Sullivan, Deputy Secretary of State
  • John Kelly, White House Chief of Staff
  • John Bolton, National Security Advisor
  • Jon Huntsman, Ambassador to Russia
  • Joseph Simons, Chairman of the Federal Trade Commission
  • Kellyanne Conway, Presidential Counselor
  • Kevin Hassett, Chairman of the Council of Economic Advisors
  • Kirstjen Nielsen, Secretary of Homeland Security
  • Larry Kudlow, Director of the National Economic Council
  • Linda McMahon, Administrator of the Small Business Administration
  • Melania Trump, First Lady of the United States
  • Mick Mulvaney, Director of the Office of Management and Budget
  • Mike Pence, Vice President of the United States
  • Mike Pompeo, Secretary of State
  • Nikki Haley, Ambassador to the United Nations
  • Raj Shah, White House Principal Deputy Press Secretary
  • Rick Perry, Secretary of Energy
  • Robert Lighthizer, United States Trade Representative
  • Robert Wilkie, Secretary of Veterans Affairs
  • Ryan Zinke, Secretary of the Interior
  • Sarah Sanders, White House Press Secretary
  • Sonny Perdue, Secretary of Agriculture
  • Stephen Miller, Senior Policy Advisor
  • Steven Mnuchin, Secretary of the Treasury
  • Wilbur Ross, Secretary of Commerce
  • Zachary Fuentes, Deputy Chief of Staff

For each candidate, I scraped around 2,000 to 10,000 raw words of their writing, attempting to use material that was maximally similar to the op-ed in subject, setting, and formality. For a few candidates, this was not possible as their body of published writing was too small or informal to make a useful comparison:

  • Christopher Wray, Director of the Federal Bureau of Investigation
  • Don McGahn, White House Counsel
  • Sarah Sanders, White House Press Secretary
  • Stephen Miller, Senior Policy Advisor
  • Zachary Fuentes, Deputy Chief of Staff

I then broke this raw text into sentence fragments using a series of regular expressions, removing numbers, quotes, and proper nouns. The intention was to make the text maximally subject-independent to focus more on writing style/voice. Here's an example from Dan Coats:

"Since 1998 when this terrorist payments program reportedly first began, the United States has contributed more than $4.6 billion (in constant dollars) to the PA. The great majority of this amount has been in straight budgetary support to the PA, enabling it to meet its budgetary commitments."

This is parsed as:

since
when this terrorist payments program reportedly first began
the
has contributed more than
billion
in constant dollars
to the
the great majority of this amount has been in straight budgetary support to the
enabling it to meet its budgetary commitments
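
For readers who want to experiment, here is a minimal Python sketch of this kind of parsing. It is not the code behind this post: the function name to_fragments and the three regular expressions are my own assumptions, and the proper-noun rule (any capitalized word that is not the first word of a sentence, plus any all-caps word) is a crude heuristic.

```python
import re

# Assumed patterns for illustration only; the real pipeline may differ.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')        # double-quoted passages
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")     # whitespace after ., ! or ?
TOKEN = re.compile(r"([A-Za-z]+(?:['’][A-Za-z]+)*)|[^A-Za-z\s]+")


def to_fragments(text):
    """Split text into lowercase fragments, breaking at numbers,
    punctuation, quoted passages, and likely proper nouns."""
    text = QUOTED.sub(" | ", text)  # a quote becomes a fragment break
    fragments = []
    for sentence in SENTENCE_END.split(text):
        current = []
        seen_word = False
        for match in TOKEN.finditer(sentence):
            word = match.group(1)  # None for numbers and punctuation
            first = word is not None and not seen_word
            if word is not None:
                seen_word = True
            # Keep lowercase words, the pronoun "I", and a capitalized
            # sentence-initial word (unless it is an acronym like "PA").
            keep = word is not None and (
                word.islower()
                or word == "I"
                or (first and not (word.isupper() and len(word) > 1))
            )
            if keep:
                current.append(word.lower())
            elif current:
                fragments.append(" ".join(current))
                current = []
        if current:
            fragments.append(" ".join(current))
    return fragments


if __name__ == "__main__":
    sample = ("The great majority of this amount has been in straight "
              "budgetary support to the PA, enabling it to meet its "
              "budgetary commitments.")
    for fragment in to_fragments(sample):
        print(fragment)
```

Applied to the Dan Coats passage above, this sketch produces the same nine fragments. Its main weakness is that a sentence beginning with a name (e.g. "John said ...") keeps that name, since the first word of a sentence is always treated as an ordinary word.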

After parsing all the text, I built a dictionary of every unique word across all sources, including the anonymous op-ed. Then, I generated Markov matrices to track word transition probabilities, i.e. Mrkv(word_x, word_y) gives the probability that word_y will follow word_x in a sentence fragment. I also extracted the overall distribution of words, i.e. Dist(word_x) gives the frequency of word_x in the author's overall writing. Both Mrkv and Dist are built from "hit counts" and then normalized to the word count, so their values represent probabilities.
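
A rough sketch of how Mrkv and Dist could be built from the fragments follows. The function name word_stats is hypothetical, and I am assuming one particular reading of "normalized to the word count": every hit count is divided by the author's total number of words. Normalizing each row of the Markov matrix separately would be another reasonable reading. Sparse dictionaries stand in for the full matrices here.

```python
from collections import Counter


def word_stats(fragments):
    """Return (dist, mrkv) for one author's list of sentence fragments.

    dist maps word -> hits / total words.
    mrkv maps (word_x, word_y) -> hits / total words, counting only
    transitions inside a fragment, never across two fragments.
    """
    dist_hits = Counter()
    mrkv_hits = Counter()
    for fragment in fragments:
        words = fragment.split()
        dist_hits.update(words)
        mrkv_hits.update(zip(words, words[1:]))  # adjacent word pairs

    total = sum(dist_hits.values())
    if total == 0:
        return {}, {}

    # Assumption: both tables are normalized by the total word count.
    dist = {word: hits / total for word, hits in dist_hits.items()}
    mrkv = {pair: hits / total for pair, hits in mrkv_hits.items()}
    return dist, mrkv


if __name__ == "__main__":
    dist, mrkv = word_stats(["to the", "the great majority"])
    print(dist)
    print(mrkv)
```

Pairs that never occur are simply absent from the dictionary, which is equivalent to a zero entry in the matrix.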

With the Markov matrices and word distributions known, I took the overall "similarity" score as:

similarity = -sum(abs(Candidate - Op_Ed))
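
As a sketch, the score above can be computed over sparse tables like this. The function name similarity and the dictionary representation are assumptions for illustration; a missing key is treated as a zero entry.

```python
def similarity(candidate, op_ed):
    """Negative sum of absolute differences between two tables.

    Both arguments map a key (a word for Dist, a word pair for Mrkv)
    to a probability. A score of 0 means identical tables; more
    negative means less similar.
    """
    keys = set(candidate) | set(op_ed)
    return -sum(
        abs(candidate.get(key, 0.0) - op_ed.get(key, 0.0)) for key in keys
    )


if __name__ == "__main__":
    # Toy tables with made-up values, not data from the analysis.
    candidate = {"the": 0.5, "great": 0.5}
    op_ed = {"the": 0.5, "majority": 0.5}
    print(similarity(candidate, op_ed))
```

The same function works for both metrics, since it only needs the keys of the two tables to be comparable.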

A similarity score is generated for both Mrkv and Dist; Markov similarity gives a feel for how words/patterns flow together, while distribution similarity gives a feel for word choice. Here are the results:



Notably, Hill scores highest in both similarity metrics. However, before jumping to conclusions, note that although Mrkv and Dist are normalized to word counts, there is an additional correlation between word count and similarity. A typical op-ed is ~1,000 words, so it could theoretically yield at most about 1,000 distinct two-word combinations (neglecting grammar). Compare this to the 10,000-word dictionary, which allows 10,000 × 10,000 = 100,000,000 combinations. Our language is too rich to "exhaust" a person's speech in even 10,000 words because unique combinations are so plentiful. As a result, the more text provided for a candidate, the higher they score in similarity, because they have more opportunity to stumble upon a rare combination matching the op-ed. So an additional normalization is necessary.



To account for this, I re-scored the Markov and distribution similarities as the vertical distance off the linear fit line, and re-evaluated. The intent was to make results maximally dependent on style, not length.
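
Here is a minimal sketch of that re-scoring step, assuming the fit is an ordinary least-squares line of similarity against each candidate's word count. The function name length_adjusted is hypothetical, and the numbers in the example are made up.

```python
def length_adjusted(word_counts, scores):
    """Return each score's vertical distance from the least-squares
    line fitted to (word_count, score) pairs.

    Positive values mean a candidate is more similar to the op-ed
    than their amount of text alone would predict.
    """
    n = len(word_counts)
    mean_x = sum(word_counts) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in word_counts)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(word_counts, scores)
    )
    slope = sxy / sxx  # requires at least two distinct word counts
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in zip(word_counts, scores)]


if __name__ == "__main__":
    # Made-up word counts and similarity scores for three candidates.
    counts = [2000, 6000, 10000]
    raw_scores = [-1.90, -1.60, -1.75]
    print(length_adjusted(counts, raw_scores))
```

By construction the adjusted scores sum to zero, so they only rank candidates relative to the trend and say nothing about absolute similarity.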



Only two candidates rank in the top tier for both similarity metrics: Fiona Hill (#1, #2) and John Bolton (#3, #1). Hill's case is stronger from the linguistic perspective, and the gap widens further upon considering the additional circumstantial evidence.

Hill is one of only six people on this shortlist who have not denied writing the op-ed. (The others are Ivanka, Kushner, Sullivan, Kelly, and Miller. Of this set, Ivanka, Kushner, and Kelly appear to be exonerated by the similarity metrics; John Sullivan is toward the upper end of the similarity spectrum, but not suspiciously so; Miller could not be analyzed as he is unpublished.) Hill's background is in academia and policy, and her most recent publications, shown below, revolve around Russia and Putin, both of which are mentioned multiple times in an op-ed that is otherwise light on specifics.

  • “What makes Putin tick and what the West should do,” Fiona Hill and Clifford Gaddy, Brookings Report (online), January 13, 2017
  • “Dealing with a simmering Ukraine-Russia conflict,” Fiona Hill and Steven Pifer, Brookings “Election 2016 and America’s Future” series (online), October 6, 2016
  • “3 reasons Russia’s Vladimir Putin might want to interfere in the US presidential elections,” Vox (online), July 27, 2016
  • “Putin: The one-man show the West doesn’t understand,” Bulletin of the Atomic Scientists, April 13, 2016
  • “Understanding and deterring Russia: U.S. policies and strategies,” Testimony to the House Armed Services Committee, February 10, 2016

Bolton has denied authorship, which decreases his likelihood though it doesn't rule him out. He has a long history of military and government involvement and is known for being outspoken about his controversial views, which makes him even less likely to have carried out such a secretive act.

So, between the linguistic similarities, lack of denial, connection to the subject matter, and personality cues, all signs point to Hill. What about the possibility of the op-ed being written in the voice of Hill by a third party for some ulterior purpose? To some extent we can lean on the New York Times' credibility and rule out this possibility; presumably the editors verified the author's identity before publication. It's remotely possible that a verifiable official imitated Hill in writing the op-ed, perhaps to further hide their identity. However, due to the sophistication required for this task, it seems very unlikely.

It's not clear what the author hoped to achieve with this op-ed, or how they expected to remain anonymous from a position of national visibility in the age of Twitter and instant, continuous press circulation. Either way, I expect we will eventually know the author's identity, and I expect it will be...Fiona Hill.

Footnotes

Despite widespread interest, there seems to be little actual data-driven analysis toward answering the authorship question. Here's what other authors have found with similar approaches:


  • BBC suggests the author is Mike Pence based on "the author's stylistic traits"
  • David Robinson analyzed tweets, finding D. Trump, Sanders, and Pence most likely
  • Max Berggren analyzed tweets, finding DeVos, Carson, and Azar most likely

And for additional qualitative commentary:

  • Claire Hardaker concludes: "I don’t think forensic linguistics can solve this one"
  • The Washington Post, conversely, writes: "The outing of the op-ed’s author is virtually inevitable, according to forensic linguists"

Source code and text data are available on my GitHub.
