First, I assembled a list of candidates from Wikipedia, PredictIt, and a few other informal forums. It covers all the top contenders, though it isn't exhaustive due to the sprawl of the executive bureaucracy.
- Ajit Pai, Chairman of the Federal Communications Commission
- Alex Azar, Secretary of Health and Human Services
- Alexander Acosta, Secretary of Labor
- Andrew Wheeler, Acting Administrator of the Environmental Protection Agency
- Ben Carson, Secretary of Housing and Urban Development
- Betsy DeVos, Secretary of Education
- Christopher Wray, Director of the Federal Bureau of Investigation
- Dan Coats, Director of National Intelligence
- Don McGahn, White House Counsel
- Elaine Chao, Secretary of Transportation
- Fiona Hill, Senior Director for European and Russian Affairs
- Gina Haspel, Director of the CIA
- Ivanka Trump, Senior Adviser to the President
- Jim Mattis, Secretary of Defense
- Jared Kushner, Senior Adviser to the President
- Jeff Sessions, Attorney General
- John Sullivan, Deputy Secretary of State
- John Kelly, White House Chief of Staff
- John Bolton, National Security Advisor
- Jon Huntsman, Ambassador to Russia
- Joseph Simons, Chairman of the Federal Trade Commission
- Kellyanne Conway, Presidential Counselor
- Kevin Hassett, Chairman of the Council of Economic Advisors
- Kirstjen Nielsen, Secretary of Homeland Security
- Larry Kudlow, Director of the National Economic Council
- Linda McMahon, Administrator of the Small Business Administration
- Melania Trump, First Lady of the United States
- Mick Mulvaney, Director of the Office of Management and Budget
- Mike Pence, Vice President of the United States
- Mike Pompeo, Secretary of State
- Nikki Haley, Ambassador to the United Nations
- Raj Shah, White House Principal Deputy Press Secretary
- Rick Perry, Secretary of Energy
- Robert Lighthizer, United States Trade Representative
- Robert Wilkie, Secretary of Veterans Affairs
- Ryan Zinke, Secretary of the Interior
- Sarah Sanders, White House Press Secretary
- Sonny Perdue, Secretary of Agriculture
- Stephen Miller, Senior Policy Advisor
- Steven Mnuchin, Secretary of the Treasury
- Wilbur Ross, Secretary of Commerce
- Zachary Fuentes, Deputy Chief of Staff
For each candidate, I scraped around 2,000 to 10,000 raw words of their writing, attempting to use material that was maximally similar to the op-ed in subject, setting, and formality. For a few candidates, this was not possible as their body of published writing was too small or informal to make a useful comparison:
- Christopher Wray, Director of the Federal Bureau of Investigation
- Don McGahn, White House Counsel
- Sarah Sanders, White House Press Secretary
- Stephen Miller, Senior Policy Advisor
- Zachary Fuentes, Deputy Chief of Staff
I then broke this raw text into sentence fragments using a series of regular expressions, removing numbers, quotes, and proper nouns. The intention was to make the text maximally subject-independent to focus more on writing style/voice. Here's an example from Dan Coats:
"Since 1998 when this terrorist payments program reportedly first began, the United States has contributed more than $4.6 billion (in constant dollars) to the PA. The great majority of this amount has been in straight budgetary support to the PA, enabling it to meet its budgetary commitments."
This is parsed as:
since
when this terrorist payments program reportedly first began
the
has contributed more than
billion
in constant dollars
to the
the great majority of this amount has been in straight budgetary support to the
enabling it to meet its budgetary commitments
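The parsing step above can be approximated with a few regular expressions. This is a minimal sketch, not the code used for the analysis: the patterns and the `fragment` helper are assumptions, and "proper noun" is approximated as any capitalized word that does not open a sentence.

```python
import re

# Split sentences at whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Anything matched here is dropped and acts as a fragment boundary.
DELIMITERS = re.compile(
    r"\$?\d[\d,.]*"        # numbers, with an optional leading dollar sign
    r"|\b[A-Z][\w'’-]*"    # capitalized words (rough stand-in for proper nouns)
    r"|[^\w\s'’-]"         # remaining punctuation, including quotes and parentheses
)


def fragment(text):
    """Break raw text into lower-case, subject-independent sentence fragments."""
    fragments = []
    for sentence in SENTENCE_END.split(text.strip()):
        # Lower-case the sentence-initial letter so an ordinary first word
        # ("Since", "The") is not mistaken for a proper noun. All-caps
        # openers such as acronyms are left alone and get removed below.
        if len(sentence) > 1 and not sentence[1].isupper():
            sentence = sentence[0].lower() + sentence[1:]
        for piece in DELIMITERS.split(sentence):
            piece = piece.strip()
            if piece:
                fragments.append(piece)
    return fragments
```

Run on the Dan Coats passage, this sketch yields the same nine fragments listed above. It is deliberately crude: a mid-sentence "I", for example, is treated as a proper noun and dropped.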
After parsing all the text, I extracted a dictionary of all unique words across all text, including the anonymous op-ed. Then, I generated Markov matrices to track word transition probabilities, i.e. Mrkv(word_x, word_y) gives the probability that word_y will follow word_x in a sentence fragment. I also extracted the overall distribution of words, i.e. Dist(word_x) gives the frequency of word_x in the author's overall writing. Both Mrkv and Dist are built from "hit counts" and then normalized to the word count, so their values represent probabilities.
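As a rough illustration of Mrkv and Dist, here is a sketch that stores both as sparse dictionaries rather than full matrices. The `build_models` name is hypothetical, and dividing the transition counts by the author's total word count is my reading of "normalized to the word count", not a confirmed detail of the original code.

```python
from collections import Counter


def build_models(fragments):
    """Return (mrkv, dist) for one author's list of sentence fragments.

    mrkv maps (word_x, word_y) to the normalized count of word_y
    directly following word_x inside a fragment.
    dist maps word_x to its share of the author's total word count.
    """
    dist_hits = Counter()
    mrkv_hits = Counter()
    for frag in fragments:
        words = frag.split()
        dist_hits.update(words)
        # Pair each word with its successor; transitions never cross
        # fragment boundaries.
        mrkv_hits.update(zip(words, words[1:]))

    total = sum(dist_hits.values())
    # Assumption: both tables are normalized by the total word count.
    dist = {word: hits / total for word, hits in dist_hits.items()}
    mrkv = {pair: hits / total for pair, hits in mrkv_hits.items()}
    return mrkv, dist
```

Keeping the tables sparse avoids allocating a vocabulary-by-vocabulary matrix that is almost entirely zeros; any pair or word missing from a dictionary is simply treated as probability zero.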
With the Markov matrices and word distributions known, I took the overall "similarity" score as:
similarity = -sum(abs(Candidate - Op_Ed))
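In code, the score above is a negated sum of absolute differences taken over every entry that appears in either table. The sketch below assumes the tables are stored as dictionaries keyed by word (for Dist) or by word pair (for Mrkv), with missing entries counted as zero; the function name is illustrative.

```python
def similarity(candidate, op_ed):
    """Negated sum of absolute differences between two probability tables.

    A score of 0 means the tables are identical; more negative means
    less similar. Keys missing from one table count as probability 0.
    """
    keys = candidate.keys() | op_ed.keys()
    return -sum(abs(candidate.get(k, 0.0) - op_ed.get(k, 0.0)) for k in keys)
```

The same function serves for both metrics, since only the key type differs between the two tables.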
A similarity score is generated for both Mrkv and Dist; Markov similarity gives a feel for how words/patterns flow together, while distribution similarity gives a feel for word choice. Here are the results:
To account for the scores' dependence on sample length, I re-scored the Markov and distribution similarities as the vertical distance off the linear fit line, and re-evaluated. The intent was to make results maximally dependent on style, not length.
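The re-scoring can be sketched as an ordinary least-squares fit followed by taking residuals. This assumes the fit is of similarity score against each candidate's sample length in words, which the text implies but does not state outright; the function name is hypothetical.

```python
def residuals_from_linear_fit(lengths, scores):
    """Vertical distance of each score from the least-squares line.

    lengths: word count of each candidate's sample (assumed x variable)
    scores:  raw similarity score of each candidate
    A positive residual means the candidate is more similar to the
    op-ed than the sample length alone would predict.
    """
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in zip(lengths, scores)]
```

Candidates are then ranked by residual rather than by raw score, so a long or short writing sample no longer helps or hurts on its own.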
Only two candidates rank in the top tier for both similarity metrics: Fiona Hill (#1, #2) and John Bolton (#3, #1). Hill's case is stronger from the linguistic perspective, and the gap widens further upon considering the additional circumstantial evidence.
Hill is one of only six people on this shortlist who have not denied writing the op-ed. (The others are Ivanka, Kushner, Sullivan, Kelly, and Miller. Of this set, Ivanka, Kushner, and Kelly appear to be exonerated by the similarity metrics; John Sullivan is toward the upper end of the similarity spectrum, but not suspiciously so; Miller could not be analyzed as he is unpublished.) Hill's background is in academia and policy, and her most recent publications, shown below, revolve around Russia and Putin, both of which are mentioned multiple times in an op-ed that is otherwise light on specifics.
- “What makes Putin tick and what the West should do,” Fiona Hill and Clifford Gaddy, Brookings Report (online), January 13, 2017
- “Dealing with a simmering Ukraine-Russia conflict,” Fiona Hill and Steven Pifer, Brookings “Election 2016 and America’s Future” series (online), October 6, 2016
- “3 reasons Russia’s Vladimir Putin might want to interfere in the US presidential elections,” Vox (online), July 27, 2016
- “Putin: The one-man show the West doesn’t understand,” Bulletin of the Atomic Scientists, April 13, 2016
- “Understanding and deterring Russia: U.S. policies and strategies,” Testimony to the House Armed Services Committee, February 10, 2016
Bolton has denied authorship, which decreases his likelihood, though it doesn't rule him out. He has a long history of military and government involvement and is known for being outspoken in his controversial views, which makes him less likely still to carry out this secretive act.
So, between the linguistic similarities, lack of denial, connection to the subject matter, and personality cues, all signs point to Hill. What about the possibility of the op-ed being written in the voice of Hill by a third party for some ulterior purpose? To some extent we can lean on the New York Times' credibility and rule out this possibility; presumably the editors verified the author's identity before publication. It's remotely possible that a verifiable official imitated Hill in writing the op-ed, perhaps to further hide their identity. However, due to the sophistication required for this task, it seems very unlikely.
It's not clear what the author hoped to achieve with this op-ed, or how they expected to remain anonymous from a position of national visibility in the age of Twitter and instant, continuous press circulation. Either way, I expect we will eventually know the author's identity, and I expect it will be...Fiona Hill.
Footnotes
Despite widespread interest, there seems to be little actual data-driven analysis toward answering the authorship question. Here's what other authors have found with similar approaches:
- BBC suggests the author is Mike Pence based on "the author's stylistic traits"
- David Robinson analyzed tweets, finding D. Trump, Sanders, and Pence most likely
- Max Berggren analyzed tweets, finding DeVos, Carson, and Azar most likely
And for additional qualitative commentary:
- Claire Hardaker concludes: "I don’t think forensic linguistics can solve this one"
- The Washington Post, conversely, writes: "The outing of the op-ed’s author is virtually inevitable, according to forensic linguists"