An explanation of open data, why it's important and how you can do it yourself.
Though the debate about open data in government is an evolving one, it is indisputably here to stay -- it can be heard in both houses of Congress, in state legislatures, and in city halls around the nation.
Already, 39 states and 46 localities provide data sets to data.gov, the federal government's online open data repository. And 30 jurisdictions, including the federal government, have taken the additional step of institutionalizing their practices in formal open data policies.
Though the term "open data" is spoken of frequently — and has been since President Obama took office in 2009 — what it is and why it's important isn't always clear. That's understandable, perhaps, given that open data lacks a unified definition.
“People tend to conflate it with big data," said Emily Shaw, the national policy manager at the Sunlight Foundation, "and I think it’s useful to think about how it’s different from big data in the sense that open data is the idea that public information should be accessible to the public online."
Shaw said the foundation, a Washington, D.C., non-profit advocacy group promoting open and transparent government, believes the term open data can be applied to a variety of information created or collected by public entities. Among the benefits of open data are improved measurement of policies, better government efficiency, deeper analytical insights, greater citizen participation, and a boost to local companies by way of products and services that use government data (think civic apps and software programs).
“The way I personally think of open data," Shaw said, "is that it is a manifestation of the idea of open government."
For governments hoping to adopt open data in policy and in practice, simply making data available to the public isn’t enough to make that data useful. Open data, though straightforward in principle, requires a specific approach based on the agency or organization releasing it, the kind of data being released and, perhaps most importantly, its targeted audience.
According to the foundation’s California Open Data Handbook, published in collaboration with Stewards of Change Institute, a national group supporting innovation in human services, data must first be both “technically open” and “legally open.” The guide defines the terms in this way:
Technically open: [data] available in a machine-readable standard format, which means it can be retrieved and meaningfully processed by a computer application.
Legally open: [data] explicitly licensed in a way that permits commercial and non-commercial use and re-use without restrictions.
Technically open means that data is easily accessible to its intended audience. If the intended users are developers and programmers, Shaw said, the data should be presented within an application programming interface (API); if it’s intended for researchers in academia, data might be structured in a bulk download; and if it’s aimed at the average citizen, data should be available without requiring software purchases.
“Owning Microsoft Office shouldn’t be a requirement for accessing data,” Shaw said, referring to Microsoft Excel, the Office spreadsheet program whose file format is commonly used for data. When possible, open data should come packaged in a variety of file formats that cover as many potential users as possible.
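As a rough illustration of packaging one data set in more than one open format, the sketch below writes the same records as both CSV and JSON using only Python's standard library. The records, field names and function names are hypothetical and are not drawn from the handbook.

```python
import csv
import io
import json

# Hypothetical sample records; a real data set would come from an agency system.
RECORDS = [
    {"permit_id": "P-001", "issued": "2014-03-01", "fee_usd": 125.0},
    {"permit_id": "P-002", "issued": "2014-03-04", "fee_usd": 80.5},
]


def to_csv(records):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records as indented JSON text."""
    return json.dumps(records, indent=2)


# Both outputs are plain text that can be read without purchased software.
csv_text = to_csv(RECORDS)
json_text = to_json(RECORDS)
```

Either string can then be saved to a file and offered as a download, so a spreadsheet user and a developer can each pick the format they prefer.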
Legally open means open data must be free for all users, or as the handbook puts it, should allow for “universal participation.” It can’t be restricted to educational use, for example, bar companies from putting it in products, or carry a license that prevents one person from sharing it with another.
For those unnerved by unrestricted use, Shaw advised staying calm. Common liability fears are almost always unwarranted and rarely realized. “We don’t see it as a huge problem,” she said. “I think it’s mostly about a fear of the ‘new.’ Governments are sometimes very risk averse.”
The ultimate advantage of unrestricted use? Interoperability, according to the foundation. “Interoperability denotes the ability of diverse systems and organizations to work together (inter-operate). In this case, it is the ability to interoperate — or intermix — different data sets,” the handbook says.
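To make the idea of intermixing data sets concrete, here is a minimal sketch that joins two tables on a shared field. The two data sets, the placeholder ZIP codes, the figures and the `join_on` helper are all invented for illustration.

```python
# Hypothetical data sets from two different agencies that share a "zip_code" field.
# All values are made up for illustration.
LIBRARIES = [
    {"zip_code": "00001", "library_branches": 2},
    {"zip_code": "00002", "library_branches": 1},
]
POPULATION = [
    {"zip_code": "00001", "residents": 11000},
    {"zip_code": "00002", "residents": 21000},
    {"zip_code": "00003", "residents": 36000},
]


def join_on(left, right, key):
    """Inner-join two lists of dicts on a shared key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Combine the two sources, then derive a figure neither source holds alone.
combined = join_on(LIBRARIES, POPULATION, "zip_code")
for row in combined:
    row["residents_per_branch"] = row["residents"] / row["library_branches"]
```

The join only works because both sources use the same key in the same format, and reuse of both is permitted, which is the practical payoff of standardized, unrestricted data.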
Though the term open data has been around since at least 2009, the concept is still new. The rules are still shifting and firming up. But its newness doesn’t make it temporary: the support for open data is there.
In Washington, D.C., the call for open data is taking center stage in a bill that’s passed the U.S. House of Representatives and is on track for a Senate decision. The Digital Accountability and Transparency Act, or DATA Act, if approved, would publish all federal agency expenditures and would require that data be standardized and reviewed to prevent abuse. The House approved it 388 to 1, with 41 members not voting.
Open Datasets in States and Localities
The Sunlight Foundation has listed 30 states and localities with their own open data policies (with numerous others pending) -- see our interactive open data map for more details.
A key lobbying tactic for the Data Transparency Coalition (DTC) is to sell the value of open data and downplay the terminology. Hollister says it’s usually easier to build support for open data among policy-makers and average citizens by telling them what it does. Jobs, transparency, open government, citizen engagement, data-driven decisions, an informed public. These terms are other ways to express open data without saying it directly.
“We’re trying to persuade policy makers to replace disconnected documents with open data and for that purpose we just over-simplify it. We over-simplify it in two steps: number one, standardize [data]; and number two, publish it,” Hollister said. “Even small changes are going to make a big difference.”
The Sunlight Foundation's Shaw echoed Hollister’s stance.
“I’m not sure it’s necessary that the term itself becomes a huge rallying point," she said, "but I think what it enables does have broad public resonance."
Creating open data isn’t without its complexities. Several things need to happen before an open data project ever begins. A full endorsement from leadership is paramount. Integrating the project into the existing workflow is another prerequisite. And, as with any government project, fears and misunderstandings will need to be allayed.
Once those basics are in place, the handbook prescribes four steps: choosing a set of data, attaching an open license, making it available through a proper format and ensuring the data is discoverable.
1. Choose a Data Set
Choosing a data set can appear daunting, but it doesn’t have to be. Shaw said ample resources are available from the foundation and others on how to get started with this — see our list of open data resources for more information. In the case of selecting a data set, or sets, she referred to the foundation’s recently updated guidelines that urge identifying data sets based on goals and the demand from citizen feedback.
2. Attach an Open License
Open licenses dispel ambiguity and encourage use. However, licensing needs to be proactive: users should not be forced to request the information before they can use it, a common symptom of data obtained through the Freedom of Information Act. Tips for reference can be found at Opendefinition.org, a site that has a list of examples and links to open licenses that meet the definition of open use.
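One way to make a license explicit is to publish it in a small metadata record alongside the data. The sketch below is an assumption about how that might look: the `describe_dataset` helper and the record layout are hypothetical, and the two Creative Commons entries are only an illustrative subset, so the list at Opendefinition.org should be treated as the authority.

```python
import json

# Illustrative subset of license identifiers and their public URLs.
# Consult Opendefinition.org for the authoritative list of conformant licenses.
OPEN_LICENSES = {
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
}


def describe_dataset(title, license_id):
    """Build a metadata record that states the license up front."""
    if license_id not in OPEN_LICENSES:
        raise ValueError(f"{license_id!r} is not in the list of open licenses")
    return {
        "title": title,
        "license": {"id": license_id, "url": OPEN_LICENSES[license_id]},
    }


# Hypothetical data set title, used only to show the resulting record.
metadata = describe_dataset("Building permits (hypothetical)", "CC0-1.0")
metadata_json = json.dumps(metadata, indent=2)
```

Publishing this record next to every download means a user never has to ask what they are allowed to do with the data.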
3. Format the Data to Your Audience
As previously stated, Shaw recommends tailoring the format of data to the audience, with the ideal being that data is packaged in formats that can be digested by all users: developers, civic hackers, department staff, researchers and citizens. This could mean offering it through APIs, spreadsheets, text and zip files, FTP servers and torrent networks (a way to download a file in pieces from multiple sources). The file type and the download system both depend on the audience.
“Part of learning about what formats government should offer data in is to engage with the prospective users," Shaw said.
4. Make it Discoverable
If open data is strewn across multiple download links and wedged into various nooks and crannies of a website, it probably won't be found. Shaw recommends a centralized hub that acts as a one-stop shop for all open data downloads. In many jurisdictions, these Web pages and websites have been called “portals;” they are the online repositories for a jurisdiction’s open data publishing.
“It is important for thinking about how people can become aware of what their governments hold. If the government doesn’t make it easy for people to know what kinds of data is publicly available on the website, it doesn’t matter what format it’s in,” Shaw said. She pointed to public participation, a recurring theme in open data development, as something to build into the process to improve accessibility.
Examples of portals can be found in numerous cities across the U.S., such as San Francisco, New York, Los Angeles, Chicago and Sacramento, Calif.
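At its simplest, a portal is a single searchable catalog of what a jurisdiction publishes. The sketch below shows that idea with a hypothetical in-memory catalog and a made-up `search` helper. Real portals run on dedicated platforms, so this is only a sketch of the concept.

```python
# Hypothetical catalog entries; titles, keywords and formats are invented.
CATALOG = [
    {"title": "Building permits", "keywords": ["housing", "permits"], "formats": ["csv", "json"]},
    {"title": "Street tree inventory", "keywords": ["environment", "trees"], "formats": ["csv"]},
    {"title": "Restaurant inspections", "keywords": ["health", "permits"], "formats": ["csv", "api"]},
]


def search(catalog, term):
    """Return titles whose title text or keyword list matches the term."""
    term = term.lower()
    return [
        entry["title"]
        for entry in catalog
        if term in entry["title"].lower()
        or term in (keyword.lower() for keyword in entry["keywords"])
    ]


# One lookup covers every data set, wherever the files themselves are hosted.
permit_results = search(CATALOG, "permits")
tree_results = search(CATALOG, "trees")
```

Because every data set is listed in one place with consistent keywords, a visitor can find it without knowing which department holds it.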