September 23, 2007 By Merrill Douglas
Government Web sites offer vast stores of public information, but many such sites remain effectively invisible to popular search engines.
As of 2006, 98 percent of state and federal Web sites provided access to publications, and 82 percent allowed public use of government databases, according to a report on state and federal e-government published by Brown University's Taubman Center for Public Policy.
Many local governments have followed suit, providing crime statistics, property tax listings, code enforcement data and a great deal more. But how many citizens find their way to this wealth of government content?
Not enough, say Google officials. And for a search engine provider, that's bad news. "When users search on Google for information related to health, employment, education - the topic areas that are critical for well-being - if they don't find what they're looking for, and quite often that information would best come from government, they are disappointed with Google, not with their government," said J.L. Needham, manager of public sector content partnerships for the Mountain View, Calif.-based search engine provider.
For Google, solving that problem means boosting user happiness, which makes good business sense, Needham said.
Google helped four states - Arizona, California, Utah and Virginia - roll out SiteMap, a protocol designed to make government content more visible on Google and other popular search engine sites.
Though some have voiced consumer privacy concerns because government records may contain personal or confidential information, California CIO Clark Kelso directed all agencies to redact Social Security numbers and other sensitive information from online documents, according to the San Francisco Chronicle.
Visit any government Web site, and you can use a menu system, a search box or a database lookup tool to find information of interest. For example, the Virginia Department of Health Professions offers a database of physicians. Visitors use it to research doctors' credentials, find out if complaints have been filed against them and learn about malpractice claims they've paid.
That's fine if you're aware that the state database exists. But until recently, if you didn't and used Google to research a doctor in that database, you wouldn't have found the information, said Aneesh Chopra, Virginia's secretary of technology. "That database is not crawlable by [Google's] crawler."
This is a problem because most people use Google and other search engines to seek the sort of information governments provide, according to market research firms that track user behavior on the Web.
"On the federal or state level, 60 [percent], 70 [percent] or 80 percent of users, depending on the agency, access those Web services through search engines," Needham said.
Search engines can locate only material that is identified by a uniform resource locator (URL), the Web address you type into a browser to reach a page. Database records, in particular, often lack URLs of their own because they appear only in response to a visitor's query.
Learning to Crawl
Several search engine providers, including Google, Yahoo and Microsoft, have adopted SiteMap, a protocol webmasters use to list URLs and other information about site content. That makes the content visible to search engine crawlers, the software the engines use to build indexes of Web content.
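As a rough illustration of what such a listing looks like, the sketch below builds a small SiteMap file with Python's standard library. The element names (urlset, url, loc, lastmod) and the namespace come from the published protocol. The example.gov addresses, the dates and the build_sitemap function name are invented for illustration and do not correspond to any state's actual site.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the published sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for a list of (url, last_modified) pairs.

    last_modified may be None, in which case <lastmod> is omitted.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; a real list would come from the agency's own content.
entries = [
    ("https://www.example.gov/physicians/1001", "2007-09-01"),
    ("https://www.example.gov/physicians/1002", None),
]

sitemap_xml = build_sitemap(entries)
print(sitemap_xml)
```

A webmaster would typically publish a file like this on the site so that crawlers can read the list instead of having to discover each page by following links.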
Google helped the four states with a SiteMap pilot, in which they created content lists and assigned URLs to material that didn't already have them. "They provided consultation and the leadership to say, 'This is what you can do, this is how you would do it,' to meet the standards and guidelines that the major search companies set out," said Chris Cummiskey, CIO of Arizona.
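Assigning URLs to database material can be pictured as giving each record one stable address that a crawler can request directly, rather than a result that exists only after someone fills in a search form. The sketch below is hypothetical: the base address, the license_id parameter name and the record IDs are all invented, and the article does not describe how the states actually structured their URLs.

```python
from urllib.parse import urlencode

# Hypothetical lookup page; not a real government address.
BASE_URL = "https://www.example.gov/physicians/lookup"


def record_url(record_id):
    """Build one stable, crawlable URL for a single database record."""
    return f"{BASE_URL}?{urlencode({'license_id': record_id})}"


# Hypothetical record identifiers standing in for rows in a state database.
record_ids = ["0101-000123", "0101-000456"]

record_urls = [record_url(record_id) for record_id in record_ids]
for url in record_urls:
    print(url)
```

Each resulting address could then be added to the site's content list, so that a record is reachable without a visitor first running a query.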
Because SiteMap is an open protocol, Arizona could have done the work without Google. But Google spurred the project by offering help with the pilot. State employees spent 52 hours indexing eight databases, Cummiskey said.