Could infectious disease surveillance systems that accurately track social media data inform early warning systems and outbreak response?
The recent Ebola outbreak unearthed an interesting phenomenon. A “mystery hemorrhagic fever” was identified by HealthMap — software that mines government websites, social networks and local news reports to map potential disease outbreaks — a full nine days before the World Health Organization declared the Ebola epidemic. This raised the question: What potential do the vast amounts of data shared through social media hold in identifying outbreaks and controlling disease?
Ming-Hsiang Tsou, a professor at San Diego State University and an author of a recent study titled “The Complex Relationship of Realspace Events and Messages in Cyberspace: Case Study of Influenza and Pertussis Using Tweets,” believes algorithms that map social media posts and mobile phone data hold enormous potential for helping researchers track epidemics.
“Traditional methods of collecting patient data, reporting to health officials and compiling reports are costly and time consuming,” Tsou said. “In recent years, syndromic surveillance tools have expanded and researchers are able to exploit the vast amount of data available in real time on the Internet at minimal cost.”
Given the popularity of social media, infectious disease surveillance systems that use data-sharing technologies to accurately track social media data could potentially inform early warning systems and outbreak response, and facilitate communication between health-care providers and local, national and international health authorities.
Indicator-based methods, which rely on collecting and analyzing data according to protocols tailored to each disease, are the most common approach to disease tracking today. But such methods can’t detect potential threats quickly, and they are poorly equipped to detect new diseases. As a result, some health agencies have begun to consider new ways to monitor symptoms in order to speed detection.
Additionally, people do not always visit a doctor when they feel sick, which makes data collected from doctors and hospitals less complete. Yet people who stay home sick are likely to use social media to discuss their illness or use search engines like Google to investigate their symptoms.
Currently, there are no official national programs for disease surveillance via social media, but several systems are being used as complementary sources of information.
For example, the disease detection app Flu Near You helps predict flu outbreaks in real time. Users self-report symptoms in a weekly survey, which the app then analyzes and maps to show where pockets of influenza-like illness are located. Flu Near You is administered by HealthMap in partnership with the American Public Health Association and the Skoll Global Threats Fund. The effort is supported with private funds to demonstrate its utility for multiple sectors that work together on pandemic preparedness. The information on the site is available to public health officials, researchers, disaster planning organizations and anyone else who may find it useful.
“There are real opportunities for using this data that is scattered across the Web in news, blogs, chat rooms and social media,” said John Brownstein, HealthMap co-founder and associate professor of pediatrics at Harvard Medical School. “We’re focused on collecting all that information using data scraping, machine learning and other processes and combining it into one platform that will enable clinicians, public health practitioners and consumers to see what’s happening.”
Brownstein said the value lies in the volume of data that can be collected today. “One individual on social media talking about their illness is not going to be that useful,” he said. “But in aggregate, that information can tell us really useful things about epidemics. It can even tell us about new things, like the Enterovirus epidemic that we recently experienced. So we are developing systems that are much more crowdsourcing in nature. We are trying to better engage the public, to put the ‘public’ back in public health. That provides us some really exciting opportunities to understand what’s happening on the ground level.”
Understanding the accuracy of such information is also important, said Tsou, whose recent study explored the interaction between cyberspace message activity (measured by keyword-specific tweets) and real-world occurrences of influenza and pertussis. Tweets were collected within a 17-mile radius of 11 U.S. cities chosen on the basis of population and the availability of disease data. Tweets were then aggregated by week and compared to weekly influenza-like illness and pertussis incidence. The correlation coefficients between tweets or subgroups of tweets and disease occurrence were then calculated and trends were presented graphically.
“The correlation between the weekly flu tweets versus the national flu data was almost 86 percent,” said Tsou. “It was a very high correlation. Even more interesting is that when we compared our data to data from the San Diego County Health and Human Services Agency, who we partner with, we received even more precise data on weekly flu cases reported through their lab testing. The correlation was 93 percent — even higher than the national level. That was a very encouraging finding.”
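The steps Tsou describes (keeping tweets posted within a fixed radius of a city, counting them by week and correlating the weekly counts with reported illness) can be sketched in a few lines of Python. This is only an illustration, not the study's actual code: the article does not say which keywords, distance formula or correlation statistic the researchers used, so the keyword list, the haversine distance and the Pearson coefficient here are assumptions, and every tweet and case count in the sample is made up.

```python
import math
from collections import Counter
from datetime import date

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def weekly_keyword_counts(tweets, center, keywords, radius_miles=17.0):
    """Count keyword-matching tweets inside the radius, keyed by (ISO year, ISO week)."""
    counts = Counter()
    for day, lat, lon, text in tweets:
        if haversine_miles(lat, lon, center[0], center[1]) > radius_miles:
            continue  # posted too far from the city center
        lowered = text.lower()
        if any(keyword in lowered for keyword in keywords):
            iso = day.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Entirely made-up sample data: (date, latitude, longitude, text).
SAN_DIEGO = (32.7157, -117.1611)
SAMPLE_TWEETS = [
    (date(2014, 1, 6), 32.72, -117.16, "I think I have the flu"),
    (date(2014, 1, 7), 32.75, -117.10, "Got my flu shot today"),      # matches, but is noise
    (date(2014, 1, 14), 34.05, -118.24, "This flu is awful"),         # Los Angeles, outside radius
    (date(2014, 1, 15), 32.70, -117.20, "Home sick with influenza"),
    (date(2014, 1, 16), 32.71, -117.15, "Great tacos downtown"),      # no keyword
]

counts = weekly_keyword_counts(SAMPLE_TWEETS, SAN_DIEGO, keywords=("flu", "fever"))

# Hypothetical weekly series: tweet counts vs. reported influenza-like illness cases.
weekly_tweets = [12, 18, 30, 41, 35, 22]
weekly_cases = [40, 55, 90, 130, 100, 70]
print(round(pearson(weekly_tweets, weekly_cases), 3))
```

The "flu shot" tweet in the sample is counted even though it does not describe an illness, which is the kind of noise the researchers describe having to filter out.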
But utilizing social media data in this manner also presents challenges, such as correlating a social media post with a specific disease or condition.
“A lot of people tweet that they have a fever or have the flu, but sometimes that information isn’t specific enough for us to connect it with a disease like whooping cough,” Tsou said. “That’s one of the limitations we are dealing with.”
“There’s both a blessing and a curse to using social media in that it’s super rapid, but it also generates huge amounts of noise,” Brownstein said. “Dealing with all the noise and trying to pick out the signals that have meaning is definitely a challenge.”
Some public health agencies are already beginning to rely on social media data to investigate health issues.
For example, last year the Chicago Department of Public Health began using Twitter to identify cases of foodborne outbreaks. The department teamed up with a group called Smart Chicago to develop an app that analyzes tweets that reference food poisoning, leading the city to step up inspections and enforcement on offending establishments.
The New York City Department of Health and Mental Hygiene is taking a similar approach. It recently worked with Columbia University and Yelp on a pilot to prospectively identify restaurant reviews on Yelp that referred to foodborne illness.
“These systems are operational, and they are being used by government entities to provide situational awareness,” Brownstein said. “They’re not necessarily the only sources of information, but they are an important source of information.”
But it may still be a while before public health departments officially adopt social media data as a significant element of their regular investigations.
“Public health officials tend to be very conservative,” Tsou said. “They want to make sure social media can really demonstrate a value for predicted disease outbreak. There is still a long way to go in terms of communication and education. But I think there is great promise and potential for using social media as a public health tool.”
“The use of social media for public health surveillance and disease detection is an evolving work nationwide,” said Jeffrey Johnson, a senior epidemiologist for the San Diego County Health and Human Services Agency. “Most of the work is still within the realm of research and academics, some of whom are validating their work with real events detected through different systems and reporting channels.”
Johnson added that while San Diego County Public Health Services does use social media quite a bit as a media and communication tool, the county is not currently using social media for surveillance and disease case finding.
The Milbank Quarterly recently published a study on the challenges facing practitioners as they consider ways to integrate social media and Internet data into the detection and management of disease outbreaks. Researchers involved in the study, “Social Media and Internet-Based Data in Global Systems for Public Health Surveillance,” identified several limitations of event-based surveillance: Information isn’t always moderated by professionals or interpreted for relevance before it’s disseminated to epidemiologists; there’s no standardized system for updates; algorithms and statistical baselines aren’t well developed; and new information about health events isn’t disseminated efficiently.
On the positive side, because it occurs in real time, event-based surveillance can identify events faster than indicator-based surveillance. Ultimately the authors concluded that event-based surveillance could improve surveillance activities, but not without systematic evaluation within a public health agency.
Brownstein agreed. “There needs to be a way for representing that data in a way that’s useful for decision-makers,” he said.
Yet combining indicator-based and event-based surveillance could improve overall “epidemic intelligence” and help monitor outbreaks and disease risk. And it may have other benefits.
“Even more important is the situational awareness that can be derived from the mining of social media data,” said Brownstein. “What are the impacts of outbreak events at the societal level? We can pick up these kinds of things through these channels. There’s value in understanding the public perception and communication and how government can refine its communications based on the response of the population. Using social media to understand people’s attitudes and beliefs in that way is extraordinarily powerful.”