Social media companies struggle to identify and remove hate speech when it's posted. What can computer science reveal about how hate-filled texts and videos spread online?
Just before his shooting spree at two mosques in Christchurch, New Zealand, the alleged mass murderer posted a hate-filled manifesto on several file-sharing sites and emailed the document to at least 30 people, including New Zealand’s prime minister. He also posted links to the manifesto on several social media sites, along with instructions on how to find his Facebook profile to watch an upcoming video. The video turned out to be a 17-minute Facebook livestream of him preparing for and carrying out the first attack on March 15. In his posts, the accused killer urged people to make copies of the manifesto and the video, and to share them around the internet.
On March 23, the New Zealand government banned possession and sharing of the manifesto, and shortly thereafter arrested at least two people for having shared the video. By then, the original manifesto document and video file had long since been removed from the platforms where they were first posted. Yet plenty of people appear to have taken the shooter’s advice, making copies and spreading them widely.
As part of my ongoing research into extremism on social media — particularly anti-Muslim sentiment — I was interested in how other right-wing extremists would use the manifesto. Would they know that companies would seek to identify it on their sites and delete it? How would they try to evade that detection, and how would they share the files around the web? I wanted to see if computer science techniques could help me track the documents as they spread. What I learned suggests it may become even harder to fight hate online in the future.
To find as many different versions of the manifesto as possible, I chose an unusual keyphrase, called a “hapax legomenon” in computational linguistics: a set of words that would only be found in the manifesto and nowhere else. For example, Google-searching the phrase “Schtitt uses an unamplified bullhorn” reveals that this phrase is used only in David Foster Wallace’s novel “Infinite Jest” and nowhere else online (until now).
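The idea of hunting for a distinctive phrase can be sketched in a few lines of Python. This is only an illustration, not the procedure I followed: the function names, the three-word phrase length and the reference sentence are all assumptions made up for the example. It lists the short word sequences in a target text that never appear in a set of comparison texts, which makes them candidates for a search query.

```python
def ngrams(text, n):
    """Return the set of n-word sequences in text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def unique_phrases(target, reference_texts, n=3):
    """Return n-word phrases from target that appear in none of the references."""
    seen_elsewhere = set()
    for ref in reference_texts:
        seen_elsewhere |= ngrams(ref, n)
    return ngrams(target, n) - seen_elsewhere


# Toy inputs: the target is the example phrase from the text above;
# the reference sentence is invented purely for illustration.
target = "Schtitt uses an unamplified bullhorn"
references = ["the coach uses an unamplified microphone"]

candidates = unique_phrases(target, references)
print(sorted(candidates))
```

A real search would compare against a far larger body of text, and a phrase that looks unique in a small sample may still turn up elsewhere online.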
A few minutes of Google-searching for a hapax from the manifesto (which I’m intentionally not revealing) found copies of the document in Microsoft Word and Adobe PDF formats on dozens of file-sharing services, including DocDroid, DocumentCloud, Scribd, Mega and Dropbox. The file had been uploaded to blogs hosted on Wordpress and attached to message boards like Kiwi Farms. I also found numerous broken links to files that had been uploaded and quickly deleted, like the original versions that the author had uploaded to Mediafire and Zippyshare.
To determine whether all the files were the same, I used a common file-identification technique, generating a checksum, or cryptographic hash, for each manifesto document. A hash is a short, fixed-length fingerprint computed from a file’s contents. If two files are identical, their hashes will match. If they differ at all, their hashes will, for all practical purposes, differ too. After reviewing the file hashes, I could see that there were only a few main versions of the manifesto, and that most of the rest of the files circulating around were copies of them.
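Grouping files by hash takes very little code. The sketch below assumes SHA-256, which is one widely used cryptographic hash; the specific algorithm, the function names and the toy file contents are choices made for this example rather than a record of my exact tooling. Files that share a hash are byte-for-byte copies, so each group represents one distinct version.

```python
import hashlib
from collections import defaultdict


def sha256_of(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def group_by_hash(files):
    """Map each digest to the sorted names of the files that produce it."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[sha256_of(data)].append(name)
    return {digest: sorted(names) for digest, names in groups.items()}


# Toy stand-ins for downloaded files (hypothetical names and contents).
# In practice the bytes would be read from disk with open(path, "rb").
files = {
    "copy_a.pdf": b"same bytes",
    "copy_b.pdf": b"same bytes",
    "edited.pdf": b"same bytes plus a new cover",
}

versions = group_by_hash(files)
for digest, names in versions.items():
    print(digest[:12], names)
```

Here the three files collapse into two versions, because two of them are exact copies of each other.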
A hash can only reveal that the files are different, not how or why they are different. Within the different versions of the manifesto files, I found very few instances where entirely new content was added. I did find a few versions that had color graphics and new cover art added, but the text content itself was left largely unchanged. Most of the differences between the originals could be chalked up to the different fonts and paper sizes set as defaults on the computer of whoever created the copies. Some of the versions also had slightly different line spacing, perhaps introduced as the file was converted from Word to PDF.
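To see how two versions differ, rather than merely that they differ, the text has to be compared directly. One way to do that, sketched here as an assumption and not as a description of my own workflow, is Python's standard difflib module applied to text that has already been extracted from the documents. The function name and sample text are hypothetical.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return only the lines that were added ('+ ') or removed ('- ')."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Toy document text, invented for illustration.
old = "Title\nFirst paragraph.\nSecond paragraph."
new = "Title\nNew cover note.\nFirst paragraph.\nSecond paragraph."

changes = changed_lines(old, new)
print(changes)
```

A comparison like this only sees the extracted text, so differences in fonts, paper size or line spacing would have to be examined in the files' formatting instead.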
The video file was another story. At least one person who watched the Facebook video made a copy of it, and that original video was subsequently compressed, edited, restreamed and reformatted until at least 800 different versions were circulating.
Any change to a file — even a small one like adding a single letter to the manifesto or one extra second of video — will result in an entirely different file hash. All those changes made my analysis of the spread of these artifacts difficult — and also complicated social media companies’ efforts to rid the internet of them.
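This sensitivity is easy to demonstrate. The sketch below, again assuming SHA-256 and using a made-up sample sentence, hashes a short text and then the same text with one extra character, and counts how many of the 64 hexadecimal characters in the two digests differ.

```python
import hashlib

# Toy input: any bytes would do; this sentence is just a placeholder.
original = b"The quick brown fox jumps over the lazy dog"
altered = original + b"."  # a one-character change

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(altered).hexdigest()

# Count the positions where the two hex digests disagree.
differing = sum(a != b for a, b in zip(h1, h2))

print(h1)
print(h2)
print(differing, "of", len(h1), "hex characters differ")
```

The two digests share no visible resemblance, which is why a system that matches exact hashes cannot tell that the second file is a trivially edited copy of the first.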
Facebook and YouTube used some form of hash-matching to block most of the video upload attempts. But with all those changes — and the resulting entirely new hashes — 300,000 copies of the video escaped hash-based detection at Facebook. Google also lamented the difficulty of detecting tiny text changes in such a lengthy manifesto.
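One general way to tolerate small edits in text is to compare overlapping word sequences, sometimes called shingles, instead of exact hashes. The sketch below is a swapped-in illustration of that idea, not a description of what Facebook, YouTube or Google actually run. The function names, the three-word shingle size and the sample strings are all assumptions.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two sets have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy texts, invented for illustration.
known = "one two three four five six seven eight nine ten"
edited = "one two three four five six seven eight nine ten eleven"
unrelated = "completely different words appear in this other short text"

near = jaccard(shingles(known), shingles(edited))
far = jaccard(shingles(known), shingles(unrelated))
print(round(near, 3), far)
```

The lightly edited copy still scores close to 1, while the unrelated text scores 0. Video would need a different kind of similarity measure, and any threshold for calling two items "the same" is a judgment call that trades missed copies against false alarms.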
Despite the internet companies’ claims that these problems will disappear as artificial intelligence matures, a collection of “alt-tech” companies are working to ensure that hate-fueled artifacts like the manifesto and video can spread unimpeded.
For example, Rob Monster, CEO of a company called Epik, has created a suite of software services that support a broad collection of hate sites. Epik provides domain services for Gab, an online platform favored by violent extremists like the accused Pittsburgh synagogue shooter, and the company recently acquired BitMitigate, which offers protection against online attacks to neo-Nazi site The Daily Stormer.
Just 24 hours after the mosque attacks, Monster explained on Gab that he had uploaded the manifesto and video file to IPFS, or the “Interplanetary File System,” a decentralized peer-to-peer file-sharing network. Files on IPFS are split into many pieces, each distributed among many participants on the network, making the removal of a file nearly impossible. IPFS had previously been a niche technology, relatively unknown even among extremists. Now, calling IPFS a “crazy clever technology” that makes files “effectively uncensorable,” Monster reassured Gab users that he was also developing software to make IPFS “easy for anyone … with no technical skills required.”
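The general principle behind this kind of storage, splitting a file into pieces and naming each piece by its hash, can be shown with a toy model. The sketch below is not IPFS's actual format or API, which uses its own identifiers and data structures; the function names, the tiny chunk size and the dictionaries standing in for network participants are all hypothetical.

```python
import hashlib


def split_into_chunks(data, size):
    """Cut data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def publish(data, peers, chunk_size=4):
    """Store each chunk on a peer under its hash; return the ordered hashes."""
    manifest = []
    for i, chunk in enumerate(split_into_chunks(data, chunk_size)):
        key = hashlib.sha256(chunk).hexdigest()
        peers[i % len(peers)][key] = chunk  # spread chunks round-robin
        manifest.append(key)
    return manifest


def fetch(manifest, peers):
    """Reassemble the data by asking the peers for each chunk hash in turn."""
    parts = []
    for key in manifest:
        chunk = next(p[key] for p in peers if key in p)
        # The hash doubles as an integrity check on whatever a peer returns.
        assert hashlib.sha256(chunk).hexdigest() == key
        parts.append(chunk)
    return b"".join(parts)


peers = [{}, {}, {}]  # three toy participants, each a simple key-value store
manifest = publish(b"hello distributed world", peers)
restored = fetch(manifest, peers)
print(restored)
```

In this toy each piece lives on exactly one participant. On a real peer-to-peer network many participants can hold copies of the same piece, so there is no single host that can be asked to take a file down.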
As in-person hate groups were sued into obscurity in the 1980s, extremism went underground. But with the advent of the commercial internet, hate groups quickly moved online, and eventually onto social media. The New Zealand attacker was part of a far-right social media “meme” culture, where angry men (and some women) justify their grievances with violent, hateful rhetoric.
Widespread adoption of artificial intelligence on platforms and decentralized tools like IPFS will mean that the online hate landscape will change once again. Combating online extremism in the future may be less about “meme wars” and user-banning, or “de-platforming,” and could instead look like the attack-and-defend, cat-and-mouse technical one-upmanship that has defined the cybersecurity industry since the 1980s.
No matter what technical challenges come up, one fact never changes: The world will always need more good, smart people working to counter hate than there are promoting it.