State and local governments and allied community and civic groups made an extraordinary effort to ensure the highest level of participation in the 2000 U.S. census. The stakes were high: The results are used to determine congressional district boundaries and the allocation of hundreds of billions of federal program dollars.
An added but somewhat underutilized benefit is that the results are available for governments and the public to use however they wish. In fact, census summary results should prove as useful to state and local constituencies as they do to the federal government, informing all sorts of programming and planning decisions. So what data are available, and where can they be found?
The first census results, the populations of the states, were released in December 2000. They have been followed by other population and housing statistics based on a survey of 100 percent of American households. Congressional redistricting data sets mandated by federal law were released in March 2001, followed by Summary File 1, a series of 286 detailed summary-data tables. Summary File 2, a set of 47 tables produced for 250 iterations of race, ancestry and ethnicity, is slated for release this fall.
Data set releases are complemented by statistical briefs, which analyze particular topics and geographic areas, and by demographic profiles, which provide a concise summary of key statistics. Complete information on data products -- data sets, briefs and profiles -- and the release schedule are available
Census 2000 summary data sets are released on CD, on DVD and over the Internet. They are available for download and for interactive search, query and mapping via the Census Bureau's American FactFinder (AFF) Web site. The AFF site, which was built by IBM Global Services under contract to the Census Bureau, is an excellent tool for exploratory analysis, offering statistical tables, thematic maps and interfaces suitable for a range of users, from schoolchildren to subject-matter experts.
AFF launched in early 1999, offering data from the Census 2000 Dress Rehearsal conducted in April 1998, the 1997 economic census and the early phases of the Census Bureau's new American Community Survey. In light of early experiences, the bureau and IBM have enhanced the site's appeal, designing a cleaner interface without frames or cookies, refocusing on geographic areas rather than on particular surveys or data sets, and adding convenient features such as an address locator that maps street addresses to census geographic areas. The site was also rebuilt to support thousands of simultaneous users. AFF is coded with Java servlets running in the IBM WebSphere application server and accessing an Oracle 8i data warehouse. It runs on two IBM RS/6000 SP clusters, one serving internal Census Bureau users and the other serving the public.
Production and Analysis
Census 2000 summary data set production poses difficult technical problems, compounded by stringent accuracy, accountability and security requirements and a strict release schedule. Although data set users will be able to work fruitfully with the data using common desktop software, the steps the production team went through may be instructive for power users.
The bureau's analysis system runs on an eight-processor IBM RS/6000 M80 with 16GB of memory and a four-terabyte disk storage system. The SuperSTAR analytical software suite from Space-Time Research of Melbourne, Australia, forms the heart of the system, providing a fast tabulation engine and a graphical user interface that Census Bureau users employ to compose tables. Although SuperSTAR is similar to many online analytical processing tools, it may be unique in its combination of ease of use and suitability for both ad hoc and large-scale analysis of microdata classified according to large, hierarchical dimensions. For instance, census data are summarized according to geographic hierarchies with