Mining Social Media Data for Policing, the Ethical Way

Law enforcement is already using social media to watch, assess and sometimes arrest citizens. But agencies haven't necessarily considered all the ethical implications of that approach.

This story was originally published by Data-Smart City Solutions.

Social media posts are full of data that, when made accessible to governments, can make interventions quicker, more effective, and more representative. From pictures of emergency conditions to posts about crimes in progress, users constantly inundate Facebook, Instagram, Twitter, and other platforms with posts that may contain rich and timely information about events relevant to public safety. Using social media mining that leverages advances in natural language processing and machine learning to pull useful data from text and images, cities can transform these social posts into data points ripe for analysis.

Policing is one area that seems an obvious fit for social mining initiatives. Social monitoring has great potential to make policing more proactive, targeting incidents before they escalate into tragedies. By deploying software that can sift through mountains of posts and identify relevant keywords, governments can track posts indicative of danger or criminal activity and intervene.

These social monitoring efforts have become increasingly commonplace in police departments. A report by the Brennan Center for Justice at the NYU School of Law showed that nearly all large cities, and many smaller ones, have made significant investments in social media monitoring tools. A 2016 survey by the International Association of Chiefs of Police and Urban Institute revealed that 76 percent of officers use social media to gain tips on crime, 72 percent to monitor public sentiment, and 70 percent for intelligence gathering.

As one example, the city of Huntington Beach, Calif., used social media monitoring to inform its policing efforts during the U.S. Open of Surfing in 2015, an event that often leads to increased crime along with the large crowds it attracts. Using tools from GeoFeedia, a company that offers location-based analytics platforms, the Huntington Beach Police Department was able to monitor social media activity near parking garages in the area — places where teens often meet to drink or use drugs, occasionally leading to altercations over drug deals or other more serious crimes. The software monitored keywords like “gun,” “fight” and “shoot” to identify potential crimes and the city then sent patrol officers to investigate incidents.

While this intervention was relatively successful, shortly thereafter, police departments in Baltimore, Chicago, Fresno, Calif., and other cities received significant pushback for their social mining efforts. In an arrangement reminiscent of the recent Cambridge Analytica controversy, these law enforcement agencies partnered with third parties like GeoFeedia that gained access to back-end data streams via APIs. Internal police records from Working Narratives revealed that one of GeoFeedia’s stated goals was to bypass privacy options offered by sites like Facebook. Platforms like GeoFeedia and a similar tool called Snaptrends tied in dummy accounts (fake profiles that often use provocative pictures of women to attract suspects as friends or followers) to track users’ locations across social media sites, regardless of whether they publicly geo-tag their posts.

And the American Civil Liberties Union (ACLU) of California reported that police departments were targeting racially loaded phrases during the protests that followed the Michael Brown and Freddie Gray killings, monitoring hashtags like #BlackLivesMatter and #DontShoot. Following the revelation of these practices, Facebook, Twitter and Instagram revoked access to back-end data for major social mining companies GeoFeedia, Snaptrends, and Media Sonar. In 2017, Facebook and Instagram banned all users from leveraging back-end data for surveillance.

Evidence of bias in social mining has continued to arise. This February, the ACLU of Massachusetts uncovered evidence of prejudice in social media surveillance efforts by the Boston Police Department (BPD). Between 2014 and 2016, the BPD had tracked keywords on Facebook and Twitter in an effort to identify terrorist threats. Looking for “Islamist extremist terminology,” the BPD targeted keywords including “ISIS” and “Islamic State,” but also phrases like “#MuslimLivesMatter” and “ummah,” the Arabic word for community.

Yet even without accessing back-end data, police departments can mine readily available information from user news feeds. Even public feeds are full of data that can make policing more effective, from posts about crimes in progress to damning evidence offered freely by criminals and even live videos of crimes.

Police departments need a better, more ethical way of mining social media. Drawing on the legal expertise of three attorneys and law professors — Mason Kortz of the Berkman Klein Center for Internet & Society at Harvard; Fred Cate, Vice President of Research at Indiana University; and Wendy Seltzer, Strategy Lead and Policy Counsel at the World Wide Web Consortium — this article examines ethical objections to social mining in the context of policing. Far from a rejection of social mining, the article seeks to raise questions and offer recommendations for applying these tools to public safety in a way that respects civil rights and prioritizes resident benefits. When pursuing social mining, cities must think more carefully about privacy, free speech and profiling.

Collecting Valuable Information While Maintaining Privacy

Social media mining offers the potential to track residents’ moves as they document their activities at tagged locations throughout the day. However, governments must walk a fine line between collecting useful information and undermining resident privacy.

According to Kortz, when implementing such initiatives, governments must be cautious not to violate residents’ “reasonable expectation of privacy.” Reasonable expectation of privacy is a constitutional standard derived from the Fourth Amendment that protects citizens from warrantless searches of places in which they have a subjective expectation of privacy that society deems reasonable.

With respect to social media mining, Kortz emphasized that while individual social media posts might be public, monitoring an individual’s posts over a period of time may still violate privacy. Kortz cited U.S. v. Jones (2012), in which a concurring opinion from Justice Sotomayor maintained that “there are certain expectations about how much effort someone will make to track you,” in Kortz’s words.

In Jones, the police installed a GPS tracker on the car of Antoine Jones without a warrant and used information from the tracker to ultimately charge Jones with drug possession. Citing an idea often called the mosaic theory, Sotomayor argued that although the location of someone’s car is clearly public information available to anyone watching the road, information on the location of a car throughout the period of a month (a mosaic of those individual moments) violates reasonable expectation of privacy. While people might expect that others are watching as they take a right turn at an intersection, they certainly do not expect that others are monitoring their car day and night for a month.

Similarly, according to Kortz, “In the case of social media, most people expect that only a couple of their friends view their Facebook profiles, but considering the potential for social media mining, there’s a gap between what people expect and what is possible … Expectations define privacy, but they often lag behind reality.”

Cate, on the other hand, has a different understanding of the privacy protections around social media mining.

“I’m personally sympathetic to the mosaic theory, but the court has not been,” he explained.

Indeed, a number of other federal court cases do seem to contest the idea that a mosaic of public data can violate reasonable expectation of privacy. For example, in U.S. v. Garcia (2007), the U.S. Court of Appeals for the Seventh Circuit ruled that attaching a GPS tracker to the car of someone suspected of manufacturing methamphetamine did not violate Fourth Amendment privacy protections.

However, in Garcia, the court left open the question of privacy protections for broader surveillance efforts. The opinion states, “One can imagine the police affixing GPS tracking devices to thousands of cars at random, recovering the devices, and using digital search techniques to identify suspicious driving patterns…It would be premature to rule that such a program of mass surveillance could not possibly raise a question under the Fourth Amendment.”

The court chose to leave open the matter of whether monitoring residents en masse without any suspicion of a crime committed would violate reasonable expectation of privacy.

The opinion continues, “Should government someday decide to institute programs of mass surveillance of vehicular movements, it will be time enough to decide whether the Fourth Amendment should be interpreted to treat such surveillance as a search.”

With the advent of social media mining, it would seem that the moment of mass surveillance has arrived, and yet the court has yet to provide a definitive answer.

The issue becomes more complex still when you consider the blurred lines between public and private information on social media. While Garcia established that necessarily public information — like the location of a car — can be monitored over time in the presence of reasonable suspicion, what about information intended by Facebook users only for friends that GeoFeedia and Snaptrends have made it their goal to access? Even with reasonable suspicion, would gathering such information be acceptable under the Fourth Amendment? Case law has established that information volunteered to a third party loses its privacy protections, but scholars have argued that Facebook posts do not fall under this category.

Again, these questions about privacy protections around social media mining seem to lack any definitive legal answer. Cate, however, offered that this may not be particularly important. According to him, legal constraints are not the most compelling limitation when it comes to privacy.

“When thinking about risk, you can't just think about legal risk. The greater issue is public perception,” he said.

For Cate, the strongest protections against privacy violations are social and political rather than legal.

“It’s rare that a data collection effort is a violation of a privacy law,” he said.

Governments should therefore structure their data collection efforts around the desires and needs of residents.

In designing social mining efforts, “transparency is a useful and necessary tool,” said Cate. Cities should readily reveal what data they are collecting and how and why they are using it, and they should create channels for public feedback. Cate even proposed that governments create an institutional review board to assess their data collection efforts.

“A department says, ‘Here’s what we want to do with this algorithm’ and then you have a group — either internal or including community members — examine the proposal,” Cate explained.

This way, cities could ensure that residents understand and have provided input on social mining programs before deploying them.

In fact, Cate's proposition that governments should ensure residents are comfortable with data mining initiatives is closely linked with Kortz's objections on the grounds of reasonable expectation of privacy. Ensuring transparency, along with public participation and education around data mining initiatives, is a way of bringing residents' expectations in line with the technological capacities possessed by governments.

"Part of the necessary process must be public education," said Kortz. "But, in the meantime, it’s not appropriate for organizations to be taking advantage of information asymmetries."

If paired with robust community engagement and public education, it is certainly possible for social media mining efforts to pass both the legal and political tests, as the two hinge on similar questions of resident knowledge of and comfort with government activities.

Respecting Free Speech

If you were someone prone to use colorful language on social media, would government efforts to target potential criminals based on social posts push you to change the way you speak?

According to Kortz, this is a critical question for governments with social monitoring strategies.

“Monitoring social media for posts indicating criminal activity doesn’t restrict speech, but it certainly has a chilling effect,” said Kortz. “It’s to claim, ‘You can say that, but we’re going to be listening.’”

The chilling effect describes the result of an indirect discouragement of free speech. If severe enough, a chill may constitute a violation of First Amendment protections and therefore be unconstitutional.

According to Gayle Horn — partner at Loevy & Loevy and author of “Online Searches and Offline Challenges: The Chilling Effect, Anonymity and the New FBI Guidelines” — whether or not a government action qualifies as an unconstitutional chill has historically hinged on three factors:

  1. The nature of the government activities,
  2. The legitimacy of governmental interest involved, and
  3. Whether the government activities are narrowly tailored to achieve these goals.

Whether the nature of government activities is unconstitutional itself revolves around a few sub-questions. The first is the type of information gathered by government surveillance; gathering public information, like public social media posts, is traditionally permissible. The second is the scope of the intelligence gathering. While there are no definite standards for what scope of surveillance is acceptable, in Handschu v. Special Services Division (1971), the court ruled that the attempts of the police to deceive protestors into criminal activity crossed the line as a clear unconstitutional chill.

On the other hand, Laird v. Tatum (1972) established that “Allegations of a subjective ‘chill’ are not an adequate substitute for a claim of specific present objective harm or a threat of specific future harm.”

The final sub-question is whether or not the government has developed clear guidelines for its actions; if the government has institutionalized a process and limitations, the action is more likely to be acceptable. When implementing social media mining initiatives, then, governments should focus on targeting publicly available posts and institute clear guidelines to ensure their surveillance stays within legal limitations.

If government actions meet these requirements, their potential chilling effect must then be weighed against the government’s goal in introducing the policy.

“A policy might have a chilling effect on legitimate speech, but how does this balance out against the legitimate interest of the government to, for instance, protect against violence?” asked Wendy Seltzer.

In the most recent iteration of Handschu in 2003, Judge Charles Haight ruled that there must be “information…which indicates the possibility of criminal activity” in order for the police to surveil specific groups.

Some might read this as an indictment of general social media mining initiatives, requiring suspicion of an individual in order to monitor his or her social media. Yet Seltzer does not read it in this way, but rather as evidence that governments must be able to make a strong case that mining activities support a compelling interest.

“If a government was trying to defend a policing action against accusations of the chilling effect, they’d have to bring evidence showing they’d been able to police more efficiently,” said Seltzer.

The other question is whether or not this action is “narrowly tailored” to achieving the goal. This constitutional standard requires that the government action does not result in outcomes irrelevant to the stated goal and that it does accomplish the essential aspects of the goal. Seltzer said that social media mining does raise questions of over- and under-inclusivity, meaning that it may both affect outcomes outside the stated goal of public safety and fail to accomplish critical aspects of the stated goal. Mining social media for words like “gun” or “kill” will certainly target citizens who pose no threat to public safety and will also fail to identify some residents who are criminals. It is important, then, that cities pursue the most accurate, rigorously tested algorithms possible in order to avoid false positives while identifying criminals.
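The over- and under-inclusivity Seltzer describes can be made concrete with a small sketch. The Python below is illustrative only: the watch list, the sample posts and their labels are all invented for this example and do not reflect any department's or vendor's actual tooling. It runs a naive whole-word keyword filter over hand-labeled posts and counts how often the filter flags harmless speech (over-inclusion) and how often it misses a real threat (under-inclusion).

```python
import re

# Hypothetical watch list, echoing the kinds of terms quoted in the article.
KEYWORDS = {"gun", "kill", "shoot", "fight"}


def flags(post: str) -> bool:
    """Return True if the post contains any watch-list word as a whole word."""
    words = set(re.findall(r"[a-z']+", post.lower()))
    return bool(words & KEYWORDS)


def evaluate(labeled_posts):
    """Compare the keyword filter against hand-assigned labels.

    labeled_posts is an iterable of (text, is_real_threat) pairs.
    Returns (counts, precision, recall).
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for text, is_threat in labeled_posts:
        flagged = flags(text)
        if flagged and is_threat:
            counts["tp"] += 1  # flagged, and a real threat
        elif flagged and not is_threat:
            counts["fp"] += 1  # over-inclusive: harmless speech flagged
        elif not flagged and is_threat:
            counts["fn"] += 1  # under-inclusive: real threat missed
        else:
            counts["tn"] += 1  # correctly ignored
    flagged_total = counts["tp"] + counts["fp"]
    threat_total = counts["tp"] + counts["fn"]
    precision = counts["tp"] / flagged_total if flagged_total else 0.0
    recall = counts["tp"] / threat_total if threat_total else 0.0
    return counts, precision, recall


# Invented posts with invented labels, for illustration only.
SAMPLE = [
    ("That concert was so good it's going to kill me", False),
    ("Water gun fight at the park this weekend!", False),
    ("Bringing a gun to the garage tonight, watch out", True),
    ("Meet me behind the garage, I'm going to hurt him", True),
    ("Great surf at the pier today", False),
]

if __name__ == "__main__":
    counts, precision, recall = evaluate(SAMPLE)
    print(counts, round(precision, 2), round(recall, 2))
```

On these five made-up posts, the filter flags three, only one of which is labeled a real threat, and it misses the one threat that uses no watch-list word. A real evaluation would need a far larger, independently labeled sample, which is the kind of testing Cate calls for later in this article.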

In the end, however, Seltzer admits, “we don’t know how the courts would come out.” And yet, Seltzer echoed Cate’s sentiments that the legal constraints may not be the most important consideration.

“We have to remember why we put so much weight on free speech,” she said. “Beyond questions about what the First Amendment prohibits, there’s much more room to talk about what is good governance and good policy. It’s not only about steering clear of what’s explicitly prohibited, but also about encouraging citizen engagement, a climate in which citizens can organize, all the other things that make communities successful.”

Seltzer stresses that governments’ most important priority should be creating an environment that fosters the free exchange of ideas. First Amendment guidelines provide some direction on how to do so, and yet where the Constitution does not offer clear guidance, cities should defer to these priorities and the input of residents on how to achieve them.

Predicting Without Prejudice

Following the revelation by the ACLU of California that the Fresno Police Department had targeted terms like “justiceformike” and “police brutality,” social media mining came to be associated with racial profiling. The recent revelation of the Boston Police Department’s targeting of terms used by Muslim residents only strengthened this perceived connection. In light of these events, it’s clear that cities need to pay closer attention to the dangerous potential for profiling, and ensure that social mining efforts respect the civil rights of all citizens.

Kortz said, “With regards to activities like targeting #blacklivesmatter and other racial signals, we have to remember that any mining is set up by humans… There’s a term in the software engineering world — garbage in, garbage out — meaning that if you create algorithms based on bad data sets, you’re going to get bad results … It’s equally true to say ‘racism in, racism out,’ meaning if you use racist predictors — like tweeting #blm — you’re going to get racist results.”

According to Kortz, using social listening to target residents based on political or racial indicators violates the Equal Protection Clause, a constitutional guarantee of equal protection under the law. For Kortz, targeting terms like #blacklivesmatter is a paradigmatic example of racial profiling — identifying people as suspects based on their race.

Cate, while in agreement that police should not profile based on traits like race or political leaning, maintains that social media mining could allow police to profile in a more productive, non-prejudicial way.

“When you’re a government, you have to prioritize scarce resources, and it’s much better to do profiling based on data than the instincts of officers,” he said.

For Cate, targeting indicators that have been shown to correlate with likelihood of committing a crime, including certain phrases on social media, is an effective and informed way of distributing police resources.

According to Cate, the question is whether or not those factors targeted can actually predict crime. This ties back to Cate’s endorsement of institutional review boards, which “would look at whether or not the department is using bad data, using it inappropriately, or using it with a bad algorithm,” he explained.

“The important questions are does it work? Do the correlations make sense? Have they been tested and demonstrated on a large scale? Was there any independence in the testing?” he said.

Avoiding prejudicial social mining efforts requires making decisions based on established trends. Targeting words like #MuslimLivesMatter on the whims of an officer is neither an appropriate nor effective practice. In the Boston case, there was no evidence that these tactics helped thwart terrorist activity. On the other hand, identifying language that has historically correlated with crime and deploying officers to areas where it originated is necessary for prioritizing scarce resources.  

A policymaker might ask what happens if it turns out that using terms like #blacklivesmatter does indeed correlate with likelihood of committing a crime. Is it then okay to prioritize areas based on these kinds of racial signals? This is a complicated question, and one that raises controversies about perpetuating patterns of racial bias in the American criminal justice system, economy, housing, and a number of other areas.

In some cases, these outcomes may be a result of historically racist police tactics.

As Harvard Law and Computer Science Professor Jonathan Zittrain noted in a recent talk, “If it’s not outlandish to think arrest rates are affected by things they really shouldn’t be — by demographics or innate personal characteristics — the algorithm would happily predict what a police officer would do rather than what the accused would do.”

Research Director of the Harvard Access to Justice Lab Christopher Griffin agreed, explaining challenges in a model his team is working on: “An outcome is measured not as a charge, certainly not as a conviction, but as an arrest. That could very much be not related at all to the underlying criminal behavior, but to the practices of law enforcement.”

Some vendors, like algorithmic policing company Azavea, have sought to mitigate bias by deemphasizing some arrest data, particularly concerning racially loaded drug and nuisance crimes.

Yet a trickier issue is how to manage situations where some racial or ethnic group is indeed more likely to commit a crime, not just more likely to be targeted by police. In many cases, these trends are the result of a long and complex history of prejudicial treatment in many walks of American life. Do you target these residents, put them in jail, and perpetuate systems that have made it so that these groups are more likely to commit these crimes? This does not seem like a satisfying response, but a substantive analysis of these issues is outside the scope of this article. However, what this article does offer is that identifying threats of crime using good data rather than police intuitions can in fact reduce bias in policing, that keeping considerations of bias in mind while developing such initiatives will ensure continual improvement, and that cities need to have conversations about perpetuating bias before deploying these kinds of initiatives.

Making Algorithms Publicly Accessible

The potential for prejudice in social listening initiatives points to a broader problem common to many analytics initiatives: The algorithms employed are often not publicly available and are thus not subject to challenge from residents. Usually, cities employ algorithms developed by private contractors who are unwilling to release detailed information for fear of revealing proprietary information to competitors, and cities are hesitant to release source code because of concerns over cybersecurity and opportunities to game public systems. Citizens are therefore unable to assess the impartiality of the tools used to target people as potential criminals.

A lack of access to algorithms is a problem not only for ensuring equity, but also for confirming that the information gathered is accurate. The experience of Fresno City Councilmember Clinton Olivier shows that social media mining can be a misleading source of information. Representatives from Beware, a company that produces social listening software that assigns residents and properties threat levels of green, yellow, or red based on their social posts, presented their product to the Fresno City Council. During the presentation, Olivier asked the company reps to look up his threat level. His property showed up yellow.

In this case, Olivier’s property appeared as risky because Beware analyzes addresses based on seven-year periods, and former occupants may have had a criminal history. However, social mining may also miscategorize residents based on hyperbolic or sarcastic posts or even by misidentifying them completely. Similar algorithmic tools — like one that predicts recidivism risk based on resident data — have proven no more accurate than non-expert assessments.
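The address-based error described above can be sketched in a few lines. This is not Beware's actual logic, which is not public; the records, names, dates and function names below are hypothetical, and the only detail taken from the article is the seven-year lookback on addresses. The sketch contrasts a lookup keyed on the address alone with one that also checks who the incident belongs to.

```python
from datetime import date

LOOKBACK_YEARS = 7  # lookback period mentioned in the article

# Hypothetical incident records: (address, person, incident date).
INCIDENTS = [
    ("12 Elm St", "former tenant", date(2012, 5, 1)),
]

# Hypothetical occupancy records: address -> (current occupant, move-in date).
OCCUPANTS = {
    "12 Elm St": ("current resident", date(2014, 1, 15)),
}


def years_before(day: date, years: int) -> date:
    """Return the date `years` earlier, clamping Feb. 29 to Feb. 28."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def address_flag(address: str, as_of: date) -> bool:
    """Flag the address if anyone had an incident there in the lookback window."""
    cutoff = years_before(as_of, LOOKBACK_YEARS)
    return any(a == address and d >= cutoff for a, _, d in INCIDENTS)


def occupant_flag(address: str, as_of: date) -> bool:
    """Flag only if the current occupant had an incident in the lookback window."""
    person, _ = OCCUPANTS[address]
    cutoff = years_before(as_of, LOOKBACK_YEARS)
    return any(
        a == address and p == person and d >= cutoff
        for a, p, d in INCIDENTS
    )


if __name__ == "__main__":
    today = date(2016, 1, 1)
    print("address-only flag:", address_flag("12 Elm St", today))
    print("occupant-aware flag:", occupant_flag("12 Elm St", today))
```

In this invented case the address-only lookup flags the property because of a former tenant's record, while the occupant-aware lookup does not. The second approach depends on accurate occupancy data, which carries its own risk of misidentification.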

One could imagine these cases of mistaken identity leading to unnecessary escalation. As Olivier colorfully explained, “even though it’s not me that’s the yellow guy, your officers are going to treat whoever comes out of that house in his boxer shorts as the yellow guy.”

Thinking that Olivier was potentially dangerous, officers could use excessive precaution and force, creating an unnecessarily tense situation.

According to Kortz, in order to ensure due process, governments need to make their algorithms available to residents in one way or another.

“To be able to challenge algorithms, they need to be auditable in some sense,” he explained. “While companies shouldn’t have to reveal their algorithms in their entirety, they should have to log certain steps. It’s best to require private companies to output an auditable trail.”
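One way to read Kortz's call for an auditable trail is as a requirement that every automated decision leave behind a record of its inputs, intermediate steps and outcome. The Python sketch below is one hypothetical shape such a record could take; the field names, the simple weighted-sum scoring and the function names are assumptions made for illustration, not a description of any vendor's system. Each entry carries a SHA-256 digest so a reviewer can detect whether the record was altered after the fact.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(entry: dict) -> str:
    """Hash a canonical JSON encoding of the entry (keys sorted)."""
    payload = json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def score_with_audit(record_id, features, weights, threshold,
                     model_version="demo-0"):
    """Score one record with a weighted sum and return an audit entry.

    All parameter names are hypothetical. The entry logs the inputs,
    each feature's contribution, the final score and the decision.
    """
    contributions = {
        name: weights.get(name, 0.0) * value
        for name, value in features.items()
    }
    score = sum(contributions.values())
    entry = {
        "record_id": record_id,
        "model_version": model_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "inputs": dict(features),
        "contributions": contributions,
        "score": score,
        "threshold": threshold,
        "flagged": score >= threshold,
    }
    entry["digest"] = _digest(entry)  # computed before the digest is added
    return entry


def verify(entry: dict) -> bool:
    """Recompute the digest over everything except the stored digest."""
    body = {k: v for k, v in entry.items() if k != "digest"}
    return _digest(body) == entry["digest"]


if __name__ == "__main__":
    entry = score_with_audit(
        "case-1", {"a": 2, "b": 1}, {"a": 0.5, "b": 1.0}, threshold=1.5
    )
    print(json.dumps(entry, indent=2))
    print("intact:", verify(entry))
```

A log like this does not expose the proprietary model itself, which matches the balance Kortz describes: the company keeps its algorithm, but a resident or reviewer can see which inputs drove a particular flag and challenge them.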

Cate echoed these sentiments, arguing “There needs to be redress. If you’re going to act on data, you need a way to let residents challenge your process.”

And, like Kortz, he does not think that revealing algorithms in their entirety is the answer.

“Revealing algorithms tells the public close to nothing,” he said.

Rather, he calls for governments to publish information on the quality of the underlying data and the effectiveness of the predictions based on independent tests. Like Kortz, he calls for a balance between the privacy of proprietary algorithms and the right to transparency.

New York City has recently started the process of institutionalizing algorithmic transparency, convening a task force to develop recommendations to the mayor on how agencies can reveal information on algorithmic tools without sacrificing proprietary secrets or cybersecurity. Last month, Stephen Goldsmith and I wrote an article for CityLab detailing transparency requirements the city should consider. Three relevant to social mining initiatives are:

  • Sharing the motivation for using an algorithm in order to provide residents a benchmark by which to evaluate results and allow them to assess intentions.
  • Explaining what data went into the model and why to allow the public to identify potential bias from data tainted by historically discriminatory practices.
  • Publishing performance data so citizens can assess the effectiveness of practices.

By opening social mining efforts to public redress, cities allow residents to understand, critique, and in some cases even improve these initiatives.

In Conclusion

As is the case with many innovations in public safety, governments need to balance the potential value of social media mining to prevent serious crimes with the civil rights of individual residents. And according to both Kortz and Cate, it is not only a question of whether and how governments collect social data, but also what they do once they have it.

“Municipal data has a lot of potential, but it’s about how cities intervene once they have data,” said Kortz.

“The way people usually think about privacy is about use and not collection,” Cate agreed.  

As an example of what to do with potentially incriminating data, Kortz pointed to Johnson County, Kan. The county partnered with researchers from the University of Chicago to develop an early intervention system for individuals who cycle through the criminal justice, mental health, social services and emergency services systems. The researchers generated approximately 250 features and developed a machine-learning model that output risk scores of people at risk of re-entering jail in the near future.

However, instead of sending this data to the police so they can keep their eyes on certain residents, the county will send the list of people with high risk scores to the mental health center’s emergency services so these individuals can be connected to care and decrease the likelihood of future police interactions.

It’s these types of welfare-focused interventions that demonstrate the real value of social media mining. When collected with a priority on civil rights and employed to the benefit of residents, social media mining can be a tool for improving residents’ lives and the safety of communities.