"The art of the possible," a catchphrase among boosters of the modern transparency movement, may be running headlong into practical necessities.
In late 2008, we cheered when the District of Columbia announced it had surfaced 260 data feeds that could be mashed up usefully by citizen coders. Apps for Democracy, an initial contest with modest prize money, is now a recurring D.C. event and has proliferated to San Francisco, Seattle and New York.
The Sunlight Foundation spread the idea as part of "helping citizens, bloggers and journalists be their own best watchdogs, by improving access to existing information and digitizing new information, and by creating new tools and Web sites to enable all of us to collaborate in fostering greater transparency." The foundation funded Code for America, which created a replicable model for data mash-up contests.
Sunlight cut its teeth in this space with Apps for America, which encouraged the same kind of transparency in the federal government. The contest's second round, Data.gov Challenge, found talent to interrogate raw resources in the federal data repository, the holdings of which began with 47 entries and now approach 120,000 data sets.
It's hard to dispute that information wants to be free, but - and it is an increasingly large "but" - somebody must pay for the plumbing if transparency is to fulfill its promise.
It isn't that surfacing government data is bad, but it comes with a bow wave. The more data feeds, sets and sources that are surfaced, the larger the wave. Government set the wave in motion for all the right reasons and now finds itself with an unfunded mandate of its own creation - providing context.
In a recent analysis, Daniel Castro, senior analyst at the Information Technology and Innovation Foundation, wrote, "Although Web sites like Data.gov provide tools for users to rate the quality of data sets, agencies responsible for maintaining data sets should take on more responsibility for noting any data quality issues. For example, agencies should make clear any known limitations of data sets, such as poor survey response rates, grossly inaccurate data or outdated information."
There are also the serious matters of data definitions, standards and architectures - the life's work of a small, unsung group of data professionals. They make the case for bringing old-school disciplines to these new pursuits. It's the kind of thing you can't get done by crowdsourcing alone.
Several states - Maine, Utah and California - have brought data sets (about 40 each, excluding GIS data) together in a single spot on their respective portals. Those relatively small numbers may prove advantageous as they and others ramp up for what comes next. In addition to raw data, states are packaging and presenting data in consumable ways - through stimulus tracking tools, searchable state checkbooks that show revenue and expenditures, and campaign finance disclosure services.
Whether the work is done by governments or third parties (friendly or not), and 44 years after the dawn of the open government movement, we may still be closer to the beginning of the process than the end. Perhaps the greatest risk is the digital equivalent of malicious compliance - where government makes available huge volumes of "undisciplined" data in ways that can't be used to hold public agencies accountable, keep communities safe, fuel economic activity or serve some other public good.