GIS On the Trail of Disease

Spatial analysis of disease started with the London cholera epidemic of 1854. Today, GIS is the newest tool in tracking epidemics.

July 31, 1996
GIS, with its abilities for complex mapping and spatial analysis, is the newest tool in tracking down the source of new diseases.

The task of the epidemiologist is to determine the causative and associative factors of a disease or phenomenon, identify elements affecting its rate of incidence and establish a means of control. A tool that is becoming more widely used in accomplishing this is the geographic information system (GIS), with its capabilities for complex mapping and spatial analysis. GIS enables epidemiologists to analyze associations between environment, location and disease; map the geographic spread or containment; pose hypothetical questions; and identify risk factors and spatial patterns that might otherwise go unnoticed.

The first known application of mapping in epidemiology was the classic study by the English physician John Snow, who discovered that cholera is transmitted by contaminated water. During the London epidemic of 1854, Snow mapped the locations of cholera deaths and saw that nearly all were in the vicinity of Broad Street, in the Golden Square district of the city. Further investigation revealed that nearly all of the victims had drunk water from the community pump on Broad Street. At Snow's insistence, the handle of the pump was removed, and the epidemic that took nearly 600 lives in five days ended.

The fight against emerging infectious diseases today is often global and exponentially more complicated. Expanding populations, communities displaced by wars and shifting economies, jet travel, the shipment of products across oceans and continents -- all contribute to the spread of infectious diseases and the dissemination of new strains.

Some of the more noticeable are AIDS; the resurgence of measles (about 38 percent imported from Europe, 39 percent from Mexico); TB in the form of multiple antibiotic-resistant strains; and, more recently, the appearance of a strain of Ebola virus (Ebola Reston) in Texas that kills only monkeys ... for now. As more lethal forms appear, and new, resistant strains emerge, the ability to quickly and accurately analyze and predict their spread over space and time becomes a critical factor in public health protection. The emerging role of GIS in epidemiology is helping to accomplish that.

Equally important in the fight against infectious disease is the sharing of information, methods and resources among public health agencies, universities and companies working with GIS. Such efforts often accelerate solutions, produce new approaches and insights, and lower operating costs. In Southern California, for example, the San Bernardino County Department of Public Health and Loma Linda University's School of Public Health joined in a cooperative effort with ESRI to enable an intern to conduct a GIS analysis of the county's 1989-91 measles epidemic. The study produced new insights, and a database that will assist the department in controlling future outbreaks.

As an intern in research epidemiology, William Hoffman was particularly interested in the contributing factors of the epidemic, which coincided with the sudden resurgence of measles nationwide. From 1989 to 1991, the number of cases in San Bernardino County jumped to over 2,000 (4 percent of all cases in the U.S.), up from a total of 28 cases for the period 1981-85. That is a seemingly disproportionate share for a county of desert and mountains nearly three times the size of New Jersey, but with a population of only 1.4 million. However, 95 percent of the cases occurred in the cities of Barstow, Fontana and San Bernardino, where most of the population lives.

According to Hoffman, the objective of the study was to demonstrate the utility of GIS as a planning tool for the prevention and control of future measles epidemics in San Bernardino County. Since the department did not have GIS until 1995, it had conducted an investigation of the epidemic using conventional, descriptive statistics. Dr. Thomas Prendergast, the director of public health, was nevertheless aware of the potential of GIS and welcomed the study. "We had some interest in knowing whether or not we could see any patterns of occurrence, if we could gain any insights that we did not otherwise have by looking at the geographic distribution and sequence of cases."

"It was an opportunity to see what GIS could do," reflected Sarah Mack, department immunization program director, "to actually see the cases from a perspective other than that provided by the bar charts we had created using standard, descriptive statistics. It was certainly the first time we ever had any of our data put into a GIS system."

Interning simultaneously with the department and ESRI gave Hoffman not only access to county data but also opportunities to work with health care professionals and engineers designing the latest software technology. According to Public Health Program Manager Colleen Tracy, the county provided selected, provisional 1992 resident birth data; communicable disease and demographic data on measles cases reported in the county between 1989 and 1991; immunization clinic locations; Child Health and Disability Program provider locations; and the Planning Department's street network file.

Hoffman pointed out that demographic data from the 1990 Census was provided by Equifax National Decision Systems, a data provider that allows ESRI to use data in prototype applications such as this.

At ESRI, Hoffman was given almost free rein to explore a wide range of software, including the latest versions of ArcInfo and ArcView, often before they were marketed. "I had access to top-of-the-line output devices and was surrounded by experts who were just a few questions away from any problem I had. It would have cost hundreds of thousands of dollars to equal those resources out in the field."

Outlining the six-month project, Hoffman said, "we took all 2,044 cases from the measles data and address-matched those to the street network file. That enabled us to show, point-by-point, where the persons lived. We overlaid the 1990 census data by census tract and block group onto the addresses of the cases, then merged the two files together. From that we were able to create a shape file, or a map of measles cases by point. That gave us a count of the number of cases per census tract. We did the same thing for births, which told us where the new susceptibles lived; we wanted to take them into account in our predictions.

"We identified the high-risk areas and the centroid of each block group, then determined the allocation of resources [mobile immunization clinics], based on where they would have been most effective; that is, on the distance a woman might walk with small children to get to a clinic. From earlier observations, we estimated this to be about 10 minutes each way."
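The workflow Hoffman describes (placing each case inside a census area, counting cases per area, then judging clinic sites by how many cases fall within walking distance of them) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tract shapes, case coordinates, candidate sites and the 0.8 km walking radius (a rough stand-in for a 10-minute walk) are invented, distances are straight-line rather than measured along the street network, and the project itself used ArcInfo and ArcView, not this code.

```python
from collections import Counter
from math import hypot

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon (a list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges crossed by a horizontal ray running right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_cases_by_tract(cases, tracts):
    """Assign each case point to the first tract that contains it and tally the totals."""
    counts = Counter({name: 0 for name in tracts})
    for x, y in cases:
        for name, polygon in tracts.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts

def centroid(polygon):
    """Vertex-average centroid; exact for the rectangles used here, approximate otherwise."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def cases_within_walk(site, tracts, counts, radius_km):
    """Sum the cases of every tract whose centroid lies within radius_km of the site."""
    total = 0
    for name, polygon in tracts.items():
        cx, cy = centroid(polygon)
        if hypot(cx - site[0], cy - site[1]) <= radius_km:
            total += counts[name]
    return total

# Hypothetical data, in kilometres on a flat local grid.
tracts = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
cases = [(0.5, 0.5), (1.5, 1.0), (1.0, 1.8), (3.0, 1.0)]
candidate_sites = [(1.2, 1.0), (3.1, 1.0)]
WALK_RADIUS_KM = 0.8  # assumed distance covered in about 10 minutes on foot

counts = count_cases_by_tract(cases, tracts)
best_site = max(
    candidate_sites,
    key=lambda s: cases_within_walk(s, tracts, counts, WALK_RADIUS_KM),
)
print(dict(counts), best_site)
```

In this toy layout the first candidate site wins because the tract nearest to it holds most of the cases; a real allocation would work from block-group centroids and walking times along actual streets.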

ESRI software engineer Witold Fraczek assisted Hoffman by conducting a multivariate analysis to identify the populations most at risk. Fraczek used GRID, an ArcView spatial analysis tool, and data from the 1990 Census to determine which independent variables (e.g., age, ethnicity, income and birth rate) most accurately predicted the dependent variable (measles). The results were incorporated into the new 3.0 version of ArcView.
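The article does not say how the candidate variables were compared, so the sketch below swaps in a much simpler technique than whatever was run in GRID: it ranks each variable by the strength of its Pearson correlation with the case counts. The variable names and all of the numbers are hypothetical, chosen only to show the idea of screening independent variables against a dependent one.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_predictors(predictors, outcome):
    """Return (name, r) pairs sorted from strongest to weakest absolute correlation."""
    scores = {name: pearson(values, outcome) for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical per-tract values; one list entry per census tract.
measles_cases = [2, 4, 6, 8, 10]
predictors = {
    "births": [10, 20, 30, 40, 50],
    "median_income": [50, 45, 30, 35, 20],
    "pct_over_65": [12, 9, 13, 10, 12],
}

ranking = rank_predictors(predictors, measles_cases)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

Pairwise correlation ignores how the variables interact with one another, which is why a multivariate method is needed before drawing conclusions; the ranking is only a first look.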

By combining ArcView with Avenue (an object-oriented programming language), Hoffman created a customized application that enabled public health personnel to query data developed by the project. "The general descriptive epidemiology -- who, where, when, etc. -- was conducted using Epi Info, a statistical analysis program designed for epidemiology by the CDC [Centers for Disease Control]. Epi Info is available as freeware on the Internet."

In the final steps, Hoffman and Fraczek created a model using linear regression analysis to determine the characteristics most likely responsible for the epidemic, and to predict where future cases would occur. Measles cases that had occurred since 1992 were plotted, address-matched and placed over the base map. Hoffman noted that all of the subsequent cases were located in the same areas that had high infection rates during the 1989-1991 outbreak. "Most of the cases occurred where we predicted; none occurred in low-risk areas. The characteristics that best predicted the number of measles cases were race and ethnicity (a positive association for Hispanics living within a 1990 Census tract); income level (positive for lower, negative for higher); age (positive for younger, negative for some middle-age groups); marital status (positive for divorced men and women and for single-parent families); and births (positive for the number of births)."
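As a rough illustration of what a linear regression model of this kind does, the Python sketch below fits case counts per tract to two predictors by ordinary least squares and then predicts the count for a new tract. It is not the model Hoffman and Fraczek built: the two predictors, the data values and the function names are all hypothetical, and the toy numbers were constructed so that births carry a positive coefficient and income a negative one, matching only the direction of the associations reported in the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_linear(rows, y):
    """Ordinary least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Predicted outcome for one tract from fitted coefficients."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Hypothetical tracts: (births in hundreds, median income in $1,000s).
tract_data = [(1, 40), (2, 30), (3, 35), (4, 20), (5, 25), (2, 45)]
# Hypothetical case counts, generated as 5 + 3 * births - 0.1 * income.
case_counts = [4.0, 8.0, 10.5, 15.0, 17.5, 6.5]

coefs = fit_linear(tract_data, case_counts)
expected_cases = predict(coefs, (3, 30))
print([round(c, 3) for c in coefs], round(expected_cases, 3))
```

Because the toy counts were generated from an exact linear rule, the fit recovers that rule; real surveillance data would leave residuals, and count outcomes are often better handled with models designed for counts.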

The project concluded with a presentation to the Department of Public Health, along with delivery of the ArcView application, maps and charts with descriptive epidemiology, and the software tools needed to conduct further analysis.

Although much of the information provided by the study was already known to the department, Prendergast said the more sophisticated methods of GIS analysis revealed some interesting insights into the county's populations, particularly where the risks of measles were high. "It showed that in the white population of the county, where there are small areas characterized by low economic status and high-density housing, there was a higher measles rate than occurred in other places. That's a relatively small insight, but a significant one for us to have gained by the analysis, and one we would not otherwise have easily known. The study was well worth the effort. It produced incredible visual aids to show how an epidemic affects places."

The study also had economic benefits. As Tracy pointed out, the database provided by the research will be used in conjunction with the department's automated immunization registry, which currently maintains records of 68,000 children and supports reminder activities for about 198,000. Merging current immunization records with the research data will enable the department to plan vaccine coverage based on population and birth data. "The fact that those records are automated with address detail," Tracy said, "provides a powerful tool for current policy and planning, and up-to-date analysis in the event of an outbreak." Tracy added that it was the cooperative efforts of the department, Loma Linda School of Public Health and ESRI that enabled them to have the database for this application at modest cost.

As the planet continues to shrink in size while increasing in population -- doubling to more than 11 billion in the next 45 years -- GIS technology, and the cooperative efforts between education, government and the private sector, will continue to play an important role in the battle against infectious diseases.

ArcInfo 7.0

ArcView 2.1

ArcView 3.0


Epi Info (CDC, Atlanta)