The U.S. Census Bureau worked with data.world of Austin to integrate and streamline its data in the Amazon Web Services cloud, making it easier for states, counties and cities to use.
The nation’s next full population count is still more than two years away, but U.S. Census Bureau officials are not only preparing to move that process online; they’re also working with private industry to devise better ways of sharing the data produced each year by the American Community Survey (ACS).
Conducted every decade since 1790, the Census is the forefather of national surveys. But as officials are quick to point out, it focuses on population. That’s why the Census Bureau launched the ACS in 2005, an annual survey that collects socio-economic, housing and demographic data from roughly 3.5 million recipients.
Gerson Vasquez, a data dissemination specialist for the U.S. Census Bureau, said officials intend to collect 2020 Census responses online as well as through more traditional means: telephone, paper forms and in-person interviews.
For roughly the past year, they’ve also partnered with data.world, an Austin-based public benefit corporation that is building "the most meaningful, collaborative and abundant data resource in the world" to leverage data's "societal problem-solving utility."
The goal has been to better integrate and streamline the various components of ACS data, making them easier for states, counties and cities to use.
ACS data contains rich layers of characteristics, Vasquez said, but until fairly recently its formats, including PDFs and larger data files offered as CSV and FTP downloads, weren’t easy for public agencies or residents to use.
The collaboration between data.world and the U.S. Census Bureau, announced in March at the annual South by Southwest (SXSW) gathering, set out to change that. It began when the company connected with the Census Bureau through a mutual contact and applied for a National Science Foundation grant to fund work on a semantic model for the Census.
That work, carried out by then-student Jonathan Ortiz, proved so intensive that the company hired him full-time to finish it, along with several derivative projects.
Ortiz, now a data scientist and knowledge engineer, will discuss his work during a breakout session on Friday, May 12, at this year’s ACS Data Users Group conference in Washington, D.C., in a segment titled “Leveraging Linked Data: Semantic Web Technologies Applied to ACS Data.”
“A lot of the work that we do is, as people are doing their data work, finding the connection between data sets. We looked at the Census data as really a super-connector of a data set. Many data analysis projects can use Census data, so many can relate to [that],” data.world CTO and co-founder Bryon Jacob told Government Technology.
He explained that Ortiz and others set out to curate Census data and build an ontology around it, “describing the process in the Census in a machine-readable way.”
Previous releases of Census data relied on data dictionaries: sources of metadata that explained what the data contained but were kept separate from it. Data.world combined and automated these sources, building a process that began with the 2014 ACS and was adapted for subsequent editions, doing the hard work for users.
“They created a kind of way to ingest, not just the output of the (ACS), which you can think of as a database of rows and columns, but also ingest the metadata that describes what all the individual cell values are,” said Jeff Meisel, chief marketing officer for the U.S. Census Bureau.
The super-connector concept, he explained, means data scientists can “more quickly get into the art of working on the research problems and creating data models without this massive learning curve of just ‘what’s in the data?’”
Amazon, Meisel told Government Technology, learned of the research and volunteered to host the complete data set, which now includes individual strands covering everything from income and poverty to foreign birth and health insurance.
“A data scientist can now go to the cloud and they can spin up their own [Amazon] Elastic Compute Cloud instance and begin building replicas to work on their research in the cloud. There’s instructions and there’s kind of a methodology for how you might do that,” he added.
The collaboration is so new, Census officials said, that it hasn’t yet yielded any use cases, though they can already see its potential.
Last year, Vasquez said, the New Orleans Fire Department and the New Orleans Office of Performance and Accountability used block group-level data from ACS five-year estimates to distribute more than 10,000 smoke alarms to homes identified as in need.
“It wasn’t necessarily driven by data.world or this kind of connection — this is still relatively new. But why couldn’t it be? Why couldn’t it be replicated across the nation in a quicker format, because the infrastructure is there. Because the more people play with our data, the more great stories we have,” Vasquez said.