Does Thinking Big Mean Big Brother?

It's becoming easier for public CIOs to create sophisticated new applications that merge and analyze data from different information silos. Unfortunately, not everyone is happy with the results.

November 18, 2003
Twelve years ago, some public channeler had made a great stir because the government had an average ten hours videotaped and otherwise recorded information on every citizen with a set of government credit tokens and/or government identity card.

Eleven years ago, another public channeler had pointed out that ninety-nine point nine nine and several nines percent more of this information was, a) never reviewed by human eyes (it was taken, developed, and catalogued by machine), b) was of a perfectly innocuous nature, and c) could quite easily be released to the public without the least threat to government security.

Ten years ago, a statute was passed that any citizen had the right to demand a review of all government information on him or her. Some other public channeler had made a stir about getting the government simply to stop collecting such information; but such systems, once begun, insinuate themselves into the greater system in overdetermined ways: jobs depended on them, space had been set aside for them, research was going on over how to do them more efficiently -- such overdetermined systems, hard enough to revise, are even harder to abolish.

Eight years ago, someone whose name never got mentioned came up with the idea of ego-booster booths, to offer minor credit (and hopefully, slightly more major psychological) support to the Government Information Retention Program:

Put a two-franq token into the slot ... feed your government identity card into the slip and see, on a thirty-by-forty centimeter screen, three minutes' videotape of you, accompanied by three minutes of your recorded speech, selected at random from the government's own information files.

From Triton, a science fiction novel written by Samuel R. Delany in 1976

The idea of "ego-booster booths" may have seemed hopelessly futuristic in 1976, but flash forward to 2003. If you stopped 10 people at random on the street and asked whether they'd spend $2 or $3 to review snippets of the information the U.S. government has collected on them, it's a fair bet at least eight would say yes.

Even at the beginning of the 1990s, the thought of government effectively consolidating its information on citizens into one gigantic database to create exhaustive electronic dossiers on individuals seemed too far-fetched to consider.

With the emergence of federal data gathering and analysis programs (Carnivore, Echelon, Total Information Awareness (TIA) and CAPPS II, to name a few), it's clear the government is perfectly capable of amassing and analyzing huge quantities of data on individuals. What once seemed impossible is now happening, due largely to meteoric advances in software and hardware that make sifting through and analyzing millions of pieces of data almost mundane.

Pandora's Box
Governments owning and maintaining massive volumes of data on their constituents isn't new. What's new is the ability to connect data sets traditionally salted away in operational silos, a practice that has long kept anyone from connecting the dots, and to build a portrait of an individual or business for purposes of safety, security, social services or economic improvement.

The connections government can make by merging different data sets could prove beneficial: better security at borders and airports, less driver's license fraud and more value from aggregated information. It's also becoming easier for CIOs to build these applications from large data sets, using inexpensive computers and sophisticated software that sifts, sorts and analyzes data at extremely high speeds.

But this ability is making privacy advocates extremely nervous, and groups such as the Electronic Privacy Information Center (EPIC), the Center for Democracy and Technology (CDT) and the American Civil Liberties Union (ACLU) have been fighting to limit the reach of government agencies, mostly from law enforcement, into people's lives through sophisticated databases.

One of the most visible examples of this battle is the former TIA program. The brainchild of John Poindexter, then director of the Information Awareness Office (IAO) of the Defense Advanced Research Projects Agency (DARPA), TIA was conceived as a research project aimed at building an information system that could predict and prevent terrorist attacks through "total information awareness." After negative press coverage and a number of Capitol Hill hearings examining TIA, the program underwent cosmetic surgery and emerged with the much less ominous name Terrorism Information Awareness.

At the end of July, Poindexter resigned from the IAO over the uproar created by his next idea: a futures trading market in which predictions of assassinations, terrorism and other such dire events in the Middle East would be traded.

It's a classic case of "government getting told to think outside the box, and then getting told to get back in the box," said Paul W. Taylor, chief strategy officer for the Center for Digital Government, the research arm of e.Republic, parent company of Government Technology's Public CIO.

Homeland security pressures, as well as the more mundane demands to simplify electronic transactions at all government levels, have spurred some in government to try stepping outside their bureaucratic boxes -- but if they take too big a step over that thin line, they get in trouble.

Perhaps nowhere is that line thinner than between "good" information systems that collect acceptable amounts of personal information and "bad" systems that collect too much. If an information system's goal is to efficiently amass and store as much data as possible, then TIA is the apogee of information systems design. Yet in February, TIA was "bad" enough to stir Congress to suspend funding to DARPA for anything TIA-related and to force the agency to issue a report detailing TIA's impact on privacy.

In July, the Computer Assisted Passenger Prescreening System (CAPPS II), a project launched by the Department of Homeland Security to evaluate the security risk posed by passengers before they board a commercial flight, suffered a similar fate when the Senate Appropriations Committee cut project funding until the General Accounting Office issued a report on the system's privacy impacts.

Designing Safe Data Systems
States are wrestling with the same problem.

Earlier this year, the Driver License Division of the Texas Department of Public Safety (DPS) thought it had a good idea: Why not collect biometric data from Texans applying for drivers' licenses to guarantee each applicant's identity and make sure nobody could fraudulently obtain a driver's license? Given the role that illegally obtained drivers' licenses played in the 9/11 attacks, any state's plan to assure the integrity of its drivers' licenses would seemingly be a can't-miss proposition.

It missed in Texas.

The Texas Senate approved legislation giving the Driver License Division the power to collect and store biometric information, but the House summarily dumped the bill by a wide margin: 111-26. The sticking point was that the legislation would have allowed the Driver License Division to share the biometric data with other state agencies, including law enforcement entities.

"The thought of the government amassing this kind of information on citizens is troubling, but my greatest concern was the fact that the information wasn't being used only for driver's license purposes, but would be shared with other agencies and other law enforcement agencies," said Rep. Bryan Hughes, R-Mineola, Texas. "We respect the need for the DPS to have accurate records and to maintain a database. However, the DPS database is for driver's license information. It's not to do criminal investigations against Texans."

Under current law, the Driver License Division collects only the thumbprints of Texans seeking drivers' licenses, and that information is not shared with any other agency, state or federal. The biometric data bill also contained a clause to allow the Driver License Division to share thumbprints with other state and federal agencies.

Hughes said legislators in Texas aren't willing to pass any bill that makes information sharing a possibility, and if the DPS comes back in the next legislative session with another bill to seek collection of biometric information, the agency needs to let legislators know such information will not be shared with other agencies.

"I would still oppose that bill, but I think the DPS would have a better chance of getting it passed," he said. "The concept of the government maintaining [biometric] information on its citizens troubles me, and I think it falls under the heading of unreasonable search and seizure in both the U.S. and Texas constitutions. Fundamentally the government doesn't have any business collecting that kind of information on people who are not suspects."

Debate on the House floor also brought up the discomforting nature of technology used to collect biometric information, he said. One member asked his colleagues if they really wanted to be the ones telling their constituents there was nothing wrong with having their faces scanned at Driver License Division offices, and that information taken from those scans would be stored in a government database.

Finding the Line
Some observers argue the only way to avoid stepping over the line is to build privacy and security measures into the structure of e-government projects from the start. That way, information systems designed for e-government projects don't stray from their intended purpose.

"If you do it afterward, it's too late," said Ari Schwartz, associate director of the Center for Democracy and Technology, a Washington, D.C.-based civil liberties advocacy organization. "First, you've ruined the reputation of the project, and second, it's just too hard to come back with patches to fix the problem."

The main problem is the secrecy of projects such as Carnivore and Echelon, he said, because by nature these data collection and mining initiatives don't seek public input, whether from motivated individual citizens or from advocacy groups, at the outset. Without that input, guidelines on how much data collection is proper are defined from the perspective of those interested only in collecting data, and to them, there's no such thing as too much data, according to Schwartz.

"If you can get a process for thinking of things you might need information for beforehand, and figuring out what you're going to do with that information once you have it, then you can come up with a good set of privacy rules that may -- even though you're sharing that information with more people -- actually end up protecting privacy more," he said.

The CDT is working with federal agencies to help draft reasonable guidelines, Schwartz said, citing ongoing discussions with the Transportation Security Administration (TSA) on the design of the CAPPS II system, and how the system collects and mines personal data.

"It's a lot of work," he said. "There's sympathy even from the privacy groups over all the barriers they [TSA] have to overcome. But if you're taking on a project that's looking through every individual's information to try to root out the bad guy, to try to find the needle in the haystack, you're pulling in a lot of information about individuals who are innocent, who expect their privacy to be protected, who are concerned about how their information is used. You're taking on trouble by the fact that you're doing it that way."

Schwartz also said the CDT is working with the Office of Management and Budget and the General Services Administration to develop privacy guidelines in the administration's e-government initiatives to ensure even mundane projects -- such as designing an information system for electronic campground reservations -- don't collect reams of personal data just because it's possible to do so.

Strange Bedfellows
Though opinions on the intrusiveness of some government information systems are strongly held, characterizing the debate simply as cops versus civil liberties groups is a mistake, Schwartz said, because the law enforcement community knows that trampling civil liberties in the name of fighting terrorism doesn't make people want to support those efforts. Any hint of a superdatabase or similar project quickly generates alliances against it.

"Most other countries don't have the same kind of concerns about national ID cards or national databases that the United States does because of our historical experience," Schwartz said. "If people really wanted things like that to happen, they would have to happen very gradually, where protections were built into the systems. I can't imagine something happening overnight without stirring up every conservative group and every liberal group in the country -- and that is a combination that can't be beat."

Texas' Hughes said he wasn't too surprised that just as many liberal Democrats as conservative Republicans argued on the House floor against the biometric data legislation.

"It was one of those issues that cuts across lines," he said. "The Republican Liberty Caucus of Texas, a Libertarian-leaning Republican group, called and asked me to help oppose the bill, and after the bill was dead, I got a letter from the Texas Chapter of the ACLU thanking me for my opposition. It was a strange issue. I called up the president of the Texas ACLU and thanked him for it. I'm a conservative Republican, but there are some issues we actually agree on."

TIA Reborn?
In Florida, what critics call a second coming of TIA had been quietly flying under the public's radar until recently. The Multistate Anti-Terrorism Information Exchange (MATRIX) was designed to increase and strengthen the exchange of sensitive information about terrorism and other criminal activity among all levels of government.

MATRIX has three major components, said Phil Ramer, special agent in charge of statewide intelligence for the Florida Department of Law Enforcement (FDLE), the agency that began testing MATRIX a year and a half ago. The first is connecting all the participating states on a secure network, the Regional Information Sharing Systems Network (RISSNET). The second is enabling states to share their respective confidential intelligence information with one another in a Web-based environment. The third is the data analysis component, carried out in conjunction with a Florida-based company called Seisint Inc.

To date, MATRIX has received $4 million in funding from the Justice Department's Office of Justice Programs to work on database integration, develop hardware and software, and provide network support to a coalition of law enforcement agencies in participating states. Approximately $1.5 million of that money is earmarked for the data analysis aspect of MATRIX, Ramer said, and $2.5 million is set aside for connecting states to RISSNET and building a Web-based intelligence service. So far, 13 states belong to the MATRIX project.

In addition, Ramer said, the U.S. Department of Homeland Security is earmarking $8 million in funding for data analysis that MATRIX is carrying out with Seisint -- the firm that developed the data-mining algorithms and pattern-matching technology behind MATRIX. Seisint will use the money to upgrade its capacity to store and analyze massive amounts of data expected to come in from other states as they join MATRIX, Ramer said.

Part of the recent spate of criticism of MATRIX stems from the system's ability to sift through millions of records in its database, finding patterns and links among people and events. The composition of those records is also making people extremely nervous: MATRIX's algorithms analyze information from law enforcement agencies along with commercially available demographic data purchased by Seisint. Critics are also unhappy that the massive databases MATRIX uses are housed at Seisint's headquarters.

"There's been so much made about a loss of privacy in all arenas, including government, because technology allows for that," Ramer said. "Actually the biggest loss of privacy is those databases maintained by private enterprise. Wal-Mart probably knows more about you than any government agency will, if you shop there and have a Wal-Mart card.

"It's one thing when you have disparate records, but when a computer brings all those records together at one time, people think there's something suspicious there," he said. "But we haven't tried to keep this a secret. It's not anything we did in a vacuum."

Public/Private Data
MATRIX's ability to combine and analyze data collected by the government with data collected by the private sector has clearly put MATRIX in a class of its own. Despite the perception that Seisint is privy to ultrasecret, restricted information, Ramer said Seisint purchased publicly available demographic information that's already floating around -- and ironically a lot of it is purchased from government agencies that sell data.

Other parts of Seisint's database come from information collected by manufacturers, such as warranty cards, Ramer said, and in those cases the manufacturers tell the public the information will be shared with other companies.

"If you're willing to give that information to ABC corporation, knowing they say in their privacy policy they're going to sell it, why would you then be concerned that that same information is coupled with law enforcement records that are publicly available?" Ramer said. "I don't think most of the public is that concerned about it. We really haven't had private citizens, that I'm aware of, complaining about it or calling and asking questions about it."

Critics say this combination gives law enforcement access to demographic data it previously couldn't reach. The FDLE maintains that MATRIX contains only information law enforcement has always had legal access to. What's new, Ramer said, is the blinding speed at which MATRIX makes connections among that data.

"Seisint has millions of dollars of supercomputers they use to compute these billions of pieces of data real quick," Ramer said. "No government can afford that. They also have billions of pieces of data they have already purchased, which is public data -- addresses, where your cars are registered, tax records, things like that.

"We've taken the government data that's controlled, like criminal history records and driver's license records, and housed that data in a separate, secure facility inside Seisint that only we have control of," he said. "Their computers can get to it, obviously, and when a query is done, it's done on Seisint's supercomputer that accesses that government data."

There's a simple reason FDLE records are stored at Seisint headquarters: It would be physically impossible to keep those records elsewhere because of the bandwidth Seisint's supercomputer needs to search all those records as quickly as it does, said Ramer.

Though Seisint collects a lot of data, the FDLE is not privy to every piece of information stored in the company's databases. Ramer said Seisint has established parameters on who can see what data, noting that because law enforcement agencies are restricted from seeing financial data from credit bureaus, Seisint does not allow law enforcement customers to see that type of data.

Guilt by Association
Concerns about MATRIX have spurred the media and privacy critics to turn their attention to other public-sector projects that involve data exchange between agencies and different levels of government.

In August, Washington, D.C., and four surrounding states -- Maryland, Pennsylvania, New York and Virginia -- announced their participation in a pilot to test criminal justice information sharing between state and local governments. It wasn't long before the media weighed in on the project, called Shared Homeland Information Exchange of Local Databases (SHIELD). The Washington Post stated that, "By making it nearly effortless to conduct unfettered searches, data sharing lends itself to constitutional infringements."

But what's going on in Washington, D.C., and surrounding states doesn't belong in the same category as MATRIX, according to Suzanne Peck, the District's chief technology officer.

"We have no information on our system that is not information that has been routinely exchanged among agencies historically via phone or fax," Peck said. "The only difference is that the information is now sent electronically. One of the criticisms of MATRIX I've seen published is they use not only public-safety data, but commercial data and DMV data, and that it spreads [the concept of] Big Brother."

Unlike MATRIX, data entering SHIELD from all the participating jurisdictions' criminal justice and court systems is not aggregated into one gigantic database that can be queried, Peck said. Participating jurisdictions decide how much and what types of data they want to share.

Avoiding security problems at both the equipment and system levels has been a high priority for SHIELD, Peck said, because the entities sharing information must be sure they're working in a secure environment before they'll be comfortable sharing it. Content issues also top the list.

"We are not extending the reach of information we have over and above any information that was historically provided by slower means," she added. SHIELD aims to share local data nationally, according to Peck. It doesn't do this by creating a new superdatabase of information that might raise privacy concerns, but by providing existing information in an electronic format that aids those who need it most: local public safety agencies, justice agencies, courts and correctional institutions.

But as long as the public, politicians and the press view with distrust the electronic sharing of information across agencies, among levels of government and between the public and private sectors, CIOs will keep being told to put their data-sharing projects back in the box.
Shane Peterson, Associate Editor