Site Seeing

Four states and Google make sure search engines can find government data.

by Merrill Douglas / September 23, 2007

Government Web sites offer vast stores of public information, but many such sites remain effectively invisible to popular search engines.

As of 2006, 98 percent of state and federal Web sites provided access to publications, and 82 percent allowed public use of government databases, according to a report on state and federal e-government published by Brown University's Taubman Center for Public Policy.

Many local governments have followed suit, providing crime statistics, property tax listings, code enforcement data and a great deal more. But how many citizens find their way to this wealth of government content?

Not enough, say Google officials. And for a search engine provider, that's bad news. "When users search on Google for information related to health, employment, education - the topic areas that are critical for well-being - if they don't find what they're looking for, and quite often that information would best come from government, they are disappointed with Google, not with their government," said J.L. Needham, manager of public sector content partnerships for the Mountain View, Calif.-based search engine provider.

For Google, solving that problem means boosting user happiness, which makes good business sense, Needham said.

Google helped four states - Arizona, California, Utah and Virginia - roll out SiteMap, a protocol designed to make government content more visible on Google and other popular search engine sites.

Some have voiced consumer privacy concerns, because government records may contain personal or confidential information. California CIO Clark Kelso directed all agencies to redact Social Security numbers and other sensitive information from online documents, according to the San Francisco Chronicle.

Visit any government Web site, and you can use a menu system, a search box or a database lookup tool to find information of interest. For example, the Virginia Department of Health Professions offers a database of physicians. Visitors use it to research doctors' credentials, find out if complaints have been filed against them and learn about malpractice claims they've paid.

That's fine if you're aware that the state database exists. But until recently, if you didn't know about it and used Google to research a doctor listed there, you wouldn't have found the information, said Aneesh Chopra, Virginia's secretary of technology. "That database is not crawlable by [Google's] crawler."

This is a problem because most people use Google and other search engines to seek the sort of information governments provide, according to market research firms that track user behavior on the Web.

"On the federal or state level, 60 [percent], 70 [percent] or 80 percent of users, depending on the agency, access those Web services through search engines," Needham said.

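Search engines can locate only material that's identified by a uniform resource locator (URL), the Web address you type into a browser to reach a page. Database records, in particular, often lack URLs of their own, because the pages that display them are typically generated only when a visitor submits a query.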


Learning to Crawl
Several search engine providers, including Google, Yahoo and Microsoft, have adopted SiteMap, a protocol webmasters use to list URLs and other information about site content. That makes the content visible to search engine crawlers, the software the engines use to build indexes of Web content.
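To make the idea of a URL list concrete, here is a minimal sketch of how a webmaster might generate such a file. It assumes the XML format published at sitemaps.org (a urlset element containing one url entry per page, each with a loc and an optional lastmod). The function name build_sitemap and the example.gov addresses are hypothetical, chosen only for illustration, and are not taken from any state's actual implementation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the published Sitemaps XML format.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string.

    entries: iterable of (url, lastmod) pairs, where lastmod is a
    date string such as "2007-06-01" or None if unknown.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL text.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages on an imaginary agency site.
sitemap_xml = build_sitemap([
    ("https://agency.example.gov/", None),
    ("https://agency.example.gov/licenses?id=1&type=md", "2007-06-01"),
])
```

The resulting file would normally be saved on the Web server and submitted to the search engines, which is the step that lets crawlers discover pages they could not reach by following links.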

Google helped the four states with a SiteMap pilot, in which they created content lists and assigned URLs to material that didn't already have them. "They provided consultation and the leadership to say, 'This is what you can do, this is how you would do it,' to meet the standards and guidelines that the major search companies set out," said Chris Cummiskey, CIO of Arizona.
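Assigning URLs to database material might look something like the sketch below, which gives each record in a small in-memory database a stable address that could then be listed for crawlers. Everything here is an assumption made for illustration: the physicians table, the license_id column, the BASE_URL address and the record_urls function are hypothetical and do not describe how any of the four states built their lists.

```python
import sqlite3
from urllib.parse import urlencode

# Hypothetical public address of a record-lookup page.
BASE_URL = "https://agency.example.gov/physicians"


def demo_connection():
    """Create a tiny in-memory database standing in for an agency's data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE physicians (license_id TEXT PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO physicians VALUES (?, ?)",
        [("MD-1001", "A. Example"), ("MD-1002", "B. Example")],
    )
    return conn


def record_urls(conn, base_url=BASE_URL):
    """Return one stable URL per record, keyed on the license number."""
    rows = conn.execute(
        "SELECT license_id FROM physicians ORDER BY license_id"
    )
    return [
        f"{base_url}?{urlencode({'license': license_id})}"
        for (license_id,) in rows
    ]


urls = record_urls(demo_connection())
```

Each URL in the list corresponds to exactly one record, so a crawler that follows it lands on a page it could not otherwise have reached through a search form.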

Since SiteMap is an openly published protocol, Arizona could have done the work without Google. But Google spurred the project by offering help with the pilot. State employees spent 52 hours indexing eight databases, Cummiskey said.

Arizona indexed databases on jobs, health and human services, emergency response, real estate, licensed contractors, licensed child-care facilities, licensed nursing homes, governor's office announcements, and registered sex offenders.

Virginia created SiteMap indexes for content from 26 of its 90 state agencies in time for the late April announcement, Chopra said. By early June, that number went up to 46.

Applying SiteMap to an agency's content is a fairly simple procedure requiring only a few hours, Chopra said. It involves downloading the public domain SiteMap protocol and then loading in the government content.

Because the states were new to the process, however, they spent additional time preparing themselves. "Every other week, we had training programs, conference calls, WebExes," Chopra said, adding that Google helped with those and with discussions of privacy and other procedural matters.

Agencies must also choose which databases they'd like to be more accessible. Such project coordination seems to take more time than the site indexing itself, Needham said. "And when it's done, once the traffic starts to grow, how are we going to ensure we're supporting that traffic?"

With only eight weeks to get its pilot up and running before the formal announcement, Arizona's Government Information Technology Agency targeted some of the larger, more technologically savvy state agencies for implementation, Cummiskey said. "Some of the medium-sized agencies that we'll be approaching over time don't have the same level of sophistication. It's going to take a little bit more time and energy to get them there."


Custom Search
Google also helped the states implement free Custom Search Engine software. This lets a state create a tool on its Web site to search across the sites of numerous government entities at the local, state, federal and tribal levels.

For example, enter "water" in the "All Arizona Government Search" box, and you'll get results from - among many others - the Arizona Department of Water Resources, the Tucson Water Department, the state Legislature and the Navajo Nation.

This tool offers a tighter focus than a generic Google Web search, Cummiskey said. "We've found that to be very good so far, because it really does create a one-stop shop where citizens can find information they're looking for."

In the future, Google might not work as closely with other state or local governments that want to implement SiteMap and the Custom Search Engine, but the company certainly wants them to join in. "We're also open to work with agencies and jurisdictions on ensuring that their information is fully disseminated, their sources are being delivered, without fee," Needham said.

Because SiteMap is open source, state and local governments can implement it without Google's help, Needham said. "However, we do look forward to introducing further tools that are relevant."

Using techniques such as mash-ups - the melding of data from more than one Web site into a single presentation - Google could help governments present GIS data and other Web resources, he said.

Chopra, for one, likes the idea of creating mash-ups from Google Maps and government data sources - for example, to geographically plot student reading scores and patterns of family and environmental factors that might help account for poor academic performance.

"We have yet to scratch the surface of all that we could do with these tools," he said.


Merrill Douglas Contributing Writer