An Independent Project at Phillips Academy

Russian Meddling Pt. 2

Inside the Mind of the Troll

Using Machine Learning to Understand Russian Disinformation

Shown above are JSON-encoded data of Tweets related to the 2018 midterm elections in the United States. Photo: Miles McCain/Twitter API
Miles McCain and Jeffrey Shen, Nov. 7, 2018

Over the course of the 2016 election and beyond, the Russian government conducted a massive disinformation campaign to divide the American public, sow distrust, and influence electoral outcomes. In October 2018, Twitter released a comprehensive dataset of more than 9 million Russian troll Tweets. Today, we’re proud to release an interactive machine learning model that helps make the Russian disinformation campaign just slightly more accessible.


Given any English Tweet, our model determines whether that Tweet uses language specifically popular among Russian trolls—or whether the Tweet more closely resembles organic content. And while the model can distinguish between a representative sample of organic Tweets and Russian Twitter disinformation with more than 90% accuracy, it isn’t meant to uncover hidden Russian agents online; instead, it’s an educational tool designed to shed light on the type of content disproportionately conveyed by the Russian trolls. To try out the model yourself, tag @TrackTheTrolls in response to a Tweet, or use the tool below.


Unfortunately, the troll content analyzer has been archived due to financial constraints and is no longer operational.
All percentages shown in the tool above indicate the scaled confidence (from -100% to +100%) of the model that a text is theoretically a Russian IRA troll Tweet (positive values) or an organic Tweet (negative values) in an evenly distributed dataset.
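
The exact scaling behind these percentages is not spelled out here. One plausible reading, assuming the classifier produces a probability p that a text is troll-like, is the linear map 2p - 1 expressed as a percentage. The function name `scaled_confidence` is hypothetical and used only for illustration:

```python
def scaled_confidence(p_troll: float) -> float:
    """Map a troll-class probability in [0, 1] to a score in [-100, +100].

    Assumption: the score is a linear rescaling of the probability.
    +100 means fully troll-like, -100 fully organic, 0 undecided.
    """
    if not 0.0 <= p_troll <= 1.0:
        raise ValueError("p_troll must be between 0 and 1")
    return (2.0 * p_troll - 1.0) * 100.0
```

Under this reading, a probability of 0.5 maps to 0%, meaning the text looks equally like troll and organic content.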

Limitations and caveats—

Our scikit-learn-powered model uses an explainable machine learning algorithm to weigh the similarities of any given text to Russian troll content versus organic posts. The model helps to expose language that is specifically popular among trolls in an accessible and interactive manner. Our model does not detect yet-undiscovered trolls online, nor would it be able to. Instead, it’s an educational tool designed to detect patterns in troll language—nothing more, nothing less.


Here are a few caveats to keep in mind when playing with our model:



How it works—

To analyze a Tweet, our model passes the Tweet’s text through a complex—but explainable—process. Following the tried-and-true method of using a scikit-learn Naive Bayes classifier for text analysis in Python, we assembled our model using a Count Vectorizer, TF-IDF preprocessor, and finally a multinomial Naive Bayes classifier. For the more technically inclined, here’s the core architecture of our model:
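
A minimal sketch of such a pipeline, assuming scikit-learn's standard `Pipeline` class with default hyperparameters, might look like the following. The step names (`vect`, `tfidf`, `clf`) and the toy training texts are illustrative placeholders, not taken from the project itself:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Three steps, matching the components named in the text.
# Step names and default settings are assumptions for illustration.
model = Pipeline([
    ("vect", CountVectorizer()),    # text -> token counts
    ("tfidf", TfidfTransformer()),  # counts -> TF-IDF weights
    ("clf", MultinomialNB()),       # weights -> class probabilities
])

# Made-up placeholder texts, only to show the fit/predict calls.
texts = [
    "placeholder troll text one",
    "placeholder troll text two",
    "placeholder organic text one",
    "placeholder organic text two",
]
labels = ["troll", "troll", "organic", "organic"]
model.fit(texts, labels)

# One row per input text, one column per class in sorted label order.
proba = model.predict_proba(["another placeholder text"])
classes = list(model.named_steps["clf"].classes_)
```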


That’s right—with scikit-learn, it only takes three lines of code to define the model’s skeleton! While we won’t detail the ‘glue code’ that holds everything together here, the entire project is open source on GitHub. (For information about the data on which we trained our model, see ‘Data sourcing’ below.)


Here’s how the machine learning model works, in plain English:


  1. First, the model splits the text into its parts (words), called tokens. It counts how many times each token appears in the text and returns these counts as a vector. In our model, this is performed by the industry-standard scikit-learn CountVectorizer.
  2. Then, the model compares the relative frequency of each token to its relative frequency in the training datasets. This helps identify the important words in a sentence while filtering out unimportant words (such as “and”, “to be”, and “a”). This process, called TF-IDF, is performed by scikit-learn’s TfidfTransformer.
  3. Finally, our model analyzes the relative token frequencies created by the CountVectorizer and the TfidfTransformer to determine whether the text uses language that is specifically popular among organic or Russian IRA troll content. This statistical analysis is performed by scikit-learn’s MultinomialNB classifier.
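
To make the first two steps concrete, here is a pure-Python sketch of token counting and TF-IDF weighting. It assumes the defaults that scikit-learn documents: lowercasing, tokens of two or more word characters, the smoothed inverse document frequency ln((1 + n) / (1 + df)) + 1, and L2 normalization. The helper names `tokenize` and `tfidf_vectors` are hypothetical, and the final classification step is left to MultinomialNB:

```python
import math
import re
from collections import Counter

# Same token pattern as CountVectorizer's documented default:
# runs of two or more word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(docs):
    """Return one {token: weight} dict per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each token appears.
    df = Counter()
    for doc_counts in counts:
        df.update(doc_counts.keys())

    # Smoothed IDF: tokens found in fewer documents get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    vectors = []
    for doc_counts in counts:
        weighted = {t: n * idf[t] for t, n in doc_counts.items()}
        # L2-normalize so document length does not dominate.
        norm = math.sqrt(sum(w * w for w in weighted.values())) or 1.0
        vectors.append({t: w / norm for t, w in weighted.items()})
    return vectors
```

In a toy corpus such as `["the cat sat", "the dog sat down"]`, the token "cat" appears in only one document and so receives a higher weight than "the", which appears in both.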

Data sourcing and software libraries—

Our model is built from a corpus of 4.6 million English Tweets, split equally between a random sample of Russian IRA troll Tweets released by Twitter and a representative collection of English Tweets collected over a two-week period in October 2018. Because Twitter’s Terms of Service prohibit the distribution of datasets that include Tweet content, we cannot open-source the entirety of our training data. (If you’re a researcher looking to improve on or reproduce our model, we’re happy to share the data with you individually. Contact us here.)


Like nearly all data science projects, ours relied on existing data science libraries to assemble and train the model. Our work would not have been possible without the following two software libraries.



Are you interested in using our model in your own software? We built an API to allow you to do just that! Send an HTTP GET request to https://ru.dpccdn.net/analyze/<your URL-encoded text> and you’ll receive a full JSON response. No authentication is required.
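
As a rough sketch of how a client might call this endpoint using only Python's standard library: the helper names `build_analyze_url` and `analyze` are hypothetical, and the structure of the JSON response is not documented here, so the parsed result is returned as-is. Because the analyzer has since been archived, the endpoint may no longer respond.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://ru.dpccdn.net/analyze/"


def build_analyze_url(text: str) -> str:
    """Percent-encode the text and append it to the analyze endpoint."""
    # safe="" also encodes "/" so the text stays a single path segment.
    return BASE_URL + quote(text, safe="")


def analyze(text: str, timeout: float = 10.0):
    """Send the GET request and return the parsed JSON body unchanged.

    The response fields are not assumed here; inspect the result yourself.
    """
    with urlopen(build_analyze_url(text), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `build_analyze_url("Hello, world!")` yields `https://ru.dpccdn.net/analyze/Hello%2C%20world%21`.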


What’s next—

To conclude our inquiry into Russian digital propaganda, we will apply our machine learning model ‘in the wild.’ We’ll look at whether Donald Trump’s Tweets incorporate language specifically popular among Russian IRA trolls, and whether 2018 election-related discourse on Twitter has, by this same metric, changed since the 2016 election.


Our ultimate objective is to help spread awareness of the Russian government’s ongoing disinformation campaign in the United States. To this end, we invite you to experiment with the analyzer provided above, or to tag @TrackTheTrolls in reply to a Tweet.