Digital Preservation

Washington state utilizes digital archiving to immortalize a legacy.

March 9, 2005
On Jan. 12, 2005, Washington Gov. Gary Locke left office after serving two terms, but the virtual face of his administration, stored in the Digital Archives, will remain an accessible part of history.

The Digital Archives, touted as the first of its kind in the nation, is maintained by the Washington State Archives Division, part of the Secretary of State's Office.

"Salvaging Gov. Locke's Web site is an important step in the right direction," said Secretary of State Sam Reed. "We learn from those who have gone before us, and it is our responsibility to preserve our records for future generations."

Locke's Web site has been preserved in its entirety. Its 1,235 Web pages, which offer valuable insight into his administration and include 1,605 press releases, 536 speeches and 162 media events, are now available through the Digital Archives.

Historical Interest
Previously, records-management systems were designed to accommodate paper records. With the rapid growth in electronic records production and no means to archive those records effectively, vital information was being lost.

In addition, the electronic records that were preserved were stored on media that modern technology can no longer read, casualties of constant technological change. Left behind by obsolete legacy technology, digital records can become frustratingly unavailable.

Because Washington was the first state to attempt a statewide digital archiving system, Reed and his staff faced the challenge of devising a strategy for retaining and preserving electronic records. After extensive research, they proposed a content management system that would provide efficient access to digital records through an easy-to-use Web interface, along with an effective method for searching those records.

On Oct. 4, 2004, Reed's proposal came to fruition at the 48,000-square-foot Digital Archives facility, located in the Belle Reeves Building on the campus of Eastern Washington University in Cheney.

Filling the Archives
The Digital Archives facility provides a standardized, central location for state archives, as well as a uniform means to store and access pertinent state records. Making this possible is a highly redundant storage area network (SAN) with a current capacity of 5 terabytes (approximately 20 billion sheets of paper), designed to keep pace with technological advances.

The SAN is a high-speed, redundant hardware and software solution from Cisco Systems and EMC. The front end is a Web content application system that uses hardware and software from Hewlett-Packard and Microsoft to make indexed, searchable data accessible via the Web.

Data stored on the network will not only be preserved on tape; electronic records will also be converted to open file formats, such as XML (Extensible Markup Language), and automatically migrated to media compatible with the latest presentation technologies.
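Converting a record to an open format such as XML can be as simple as writing each field out as a tagged element. The Python sketch below illustrates the idea using the standard `xml.etree.ElementTree` module; the element names (`record`, `field`), the `type` attribute and the sample values are hypothetical and do not represent the State Archives' actual schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(fields: dict, record_type: str) -> str:
    """Serialize a flat record (field name -> text value) as a small XML document.

    Hypothetical schema: a <record> element with a "type" attribute and one
    <field name="..."> child per field. Not the State Archives' real format.
    """
    root = ET.Element("record", attrib={"type": record_type})
    for name, value in fields.items():
        child = ET.SubElement(root, "field", attrib={"name": name})
        child.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record.
xml_text = record_to_xml({"county": "Spokane", "year": "1910"}, "marriage")
print(xml_text)
# <record type="marriage"><field name="county">Spokane</field><field name="year">1910</field></record>
```

Because XML is plain text governed by a published specification, a file like this remains readable even after the software that produced the original record is gone.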

Electronic records transferred to the facility's SAN include e-mail folders, directories, databases, documents and Web pages, said Adam Jansen, digital archivist for Washington's State Archives Division.

Remote agencies transfer files via File Transfer Protocol (FTP) in an automated process that uses Microsoft BizTalk Server 2004. Files are routed to a specific location on a server based on parameters such as who is sending them, which office and agency they come from, and what type of records they are, Jansen said. BizTalk Server also adds metadata to the files when it receives them.
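The routing Jansen describes can be approximated as a function of each transfer's parameters. The following Python sketch is hypothetical: it is not BizTalk Server's API, and the folder layout, the `Transfer` fields, the metadata keys and the sample values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import PurePosixPath


@dataclass
class Transfer:
    """One incoming file plus the parameters used to route it (hypothetical fields)."""
    sender: str
    agency: str
    office: str
    record_type: str
    filename: str
    metadata: dict = field(default_factory=dict)


def route(transfer: Transfer, root: str = "/ingest") -> PurePosixPath:
    """Choose a destination folder from the transfer's parameters and add metadata.

    Layout assumed here: <root>/<agency>/<office>/<record_type>/<file name>.
    """
    name = PurePosixPath(transfer.filename).name  # drop any directory part the sender included
    dest = PurePosixPath(root, transfer.agency, transfer.office, transfer.record_type, name)
    transfer.metadata.update({
        "sender": transfer.sender,
        "received_utc": datetime.now(timezone.utc).isoformat(),
        "destination": str(dest),
    })
    return dest


t = Transfer("jdoe", "health", "olympia", "vital-records", "uploads/births-1950.tif")
print(route(t))  # /ingest/health/olympia/vital-records/births-1950.tif
```

Keeping the routing rule in one function makes it easy to see, and audit, why a given file landed where it did.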

"If it's a TIFF image of a photograph, we also create a more Web-friendly version, such as PDF or DjVu by LizardTech, so that we can present the information in a more universal format," said Jansen.

Converting files created with proprietary software, such as Word and Excel, to PDF means users can open them without having Word or Excel installed on their computers, he said.
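A process that creates Web-friendly access copies while leaving the original untouched might begin by fingerprinting the original and listing the derivatives to produce. In this hypothetical Python sketch, the extension-to-format mapping follows the examples Jansen gives (TIFF to PDF or DjVu, Word and Excel to PDF); the SHA-256 checksum is an assumption, since the article does not say how the archive verifies originals, and the conversion step itself is omitted.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical mapping from an original's extension to Web-friendly access
# formats, following the examples in the article.
ACCESS_FORMATS = {
    ".tif": ["pdf", "djvu"],
    ".tiff": ["pdf", "djvu"],
    ".doc": ["pdf"],
    ".xls": ["pdf"],
}


def plan_access_copies(filename: str, original: bytes) -> dict:
    """List the access copies to create for one file; the original bytes are only read.

    The SHA-256 digest is an assumed fixity check: recomputing it later shows
    whether the stored original has changed. Conversion itself is not shown.
    """
    path = PurePosixPath(filename)
    formats = ACCESS_FORMATS.get(path.suffix.lower(), [])
    return {
        "original": filename,
        "sha256": hashlib.sha256(original).hexdigest(),
        "access_copies": [f"{path.stem}.{fmt}" for fmt in formats],
    }


plan = plan_access_copies("Ferry-1952.TIF", b"abc")
print(plan["access_copies"])  # ['Ferry-1952.pdf', 'Ferry-1952.djvu']
```

Files with no mapped format simply get no access copies, so the original is always kept as sent.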

"We do not alter any of the original information sent," he said. "We create what we call a Web-readable open standard version of that file so that we can carry it forward years from now."

Capturing Snapshots
In addition to records transferred by remote agencies, the Digital Archives captures agencies' Web sites for its database.

"We're saving them as blobs -- binary images -- in a single server database," said Jansen. "We're maintaining all of the original scripting, but we're doing it into the database itself so that it can be pulled out and restructured or reconstituted as needed."

To archive Web sites, the Digital Archives uses a custom-built Web-spidering utility that captures streams of binary Web content and saves them to the database, Jansen said. A Web spider begins with a single Web page and then branches out to other pages by following the links that connect them.
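Storing captured pages as binary objects ("blobs") in a database, with scripts and markup left intact, can be sketched with Python's built-in `sqlite3` module. The article does not identify the database product, so SQLite, the `capture` table layout and the sample URL here are stand-ins chosen for illustration.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with one row per captured resource."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS capture (
               url          TEXT NOT NULL,
               captured_at  TEXT NOT NULL,  -- ISO 8601 timestamp of the capture
               content_type TEXT,
               body         BLOB NOT NULL,  -- raw bytes, scripts and markup unchanged
               PRIMARY KEY (url, captured_at)
           )"""
    )
    return conn


def save_capture(conn: sqlite3.Connection, url: str, captured_at: str,
                 content_type: str, body: bytes) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO capture (url, captured_at, content_type, body) VALUES (?, ?, ?, ?)",
            (url, captured_at, content_type, body),
        )


def load_capture(conn: sqlite3.Connection, url: str, captured_at: str) -> Optional[bytes]:
    row = conn.execute(
        "SELECT body FROM capture WHERE url = ? AND captured_at = ?", (url, captured_at)
    ).fetchone()
    return row[0] if row else None


store = open_store()
page = b"<html><script>var x = 1;</script></html>"
save_capture(store, "http://agency.example/", "2004-12-01T00:00:00Z", "text/html", page)
```

Because the body is stored byte for byte, a later tool can pull a page back out and reconstitute it exactly as captured.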

The facility's Web spiders automatically capture participating state agencies' Web sites at specified intervals and can be configured with parameters such as how deep a capture should go and how to handle links to external Web sites.
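A spider with a depth limit and a switch for external links can be sketched as a breadth-first traversal. The Python below is a hypothetical illustration, not the archive's custom utility; to keep it self-contained it takes a `get_links` function instead of downloading pages, so a real crawler would supply one that fetches and parses HTML. The sample site is invented.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin, urlsplit


def crawl(start: str, get_links: Callable[[str], Iterable[str]],
          max_depth: int = 2, follow_external: bool = False) -> List[str]:
    """Return pages in the order a breadth-first spider would capture them.

    max_depth: number of link hops to follow from the start page.
    follow_external: whether links to other hosts are captured at all.
    get_links: returns the (possibly relative) links found on a page.
    """
    home = urlsplit(start).netloc
    seen = {start}
    captured = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        captured.append(url)  # a real spider would fetch and store the page here
        if depth >= max_depth:
            continue
        for link in get_links(url):
            target = urldefrag(urljoin(url, link)).url  # absolute URL, no #fragment
            if target in seen:
                continue
            if urlsplit(target).netloc != home and not follow_external:
                continue
            seen.add(target)
            queue.append((target, depth + 1))
    return captured


# Hypothetical site: each page maps to the links it contains.
SITE = {
    "http://agency.example/": ["/news", "/speeches", "http://other.example/"],
    "http://agency.example/news": ["/news/2004", "/"],
    "http://agency.example/news/2004": ["/news/2004/jan"],
}
pages = crawl("http://agency.example/", lambda u: SITE.get(u, []), max_depth=1)
print(pages)
```

With `max_depth=1` and external links disabled, the sketch captures only the home page and the same-host pages it links to directly; the `seen` set keeps pages that link back to each other from being captured twice.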

Before the Digital Archives' efforts, he said, there was no way to view a site as it appeared at a given moment in the past; small, incremental changes gradually altered each site and irrevocably replaced what had been there before.

"There were no snapshots in time," Jansen said.

Now sites are captured at successive points in time, providing a historically accurate glimpse of government activity.
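Once captures are kept with their dates, viewing a site as it appeared on a given day amounts to finding the latest capture taken on or before that day. The Python sketch below assumes a hypothetical, date-sorted list of `(capture_date, content)` pairs and uses the standard `bisect` module for the lookup.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple


def snapshot_as_of(captures: List[Tuple[date, str]], when: date) -> Optional[str]:
    """Return the latest capture taken on or before `when`, or None if none exists.

    `captures` must be sorted by date, oldest first.
    """
    dates = [captured_on for captured_on, _ in captures]
    i = bisect_right(dates, when)  # number of captures dated <= when
    return captures[i - 1][1] if i else None


# Hypothetical capture history for one page.
history = [
    (date(2003, 3, 1), "home page, version 1"),
    (date(2004, 6, 1), "home page, version 2"),
    (date(2005, 1, 10), "home page, version 3"),
]
print(snapshot_as_of(history, date(2004, 12, 31)))  # home page, version 2
```

Using `bisect_right` means a capture made on the requested day itself counts as the view for that day.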

"Increasingly the Web is becoming the public interface for government," he said. "That's how we're disseminating information to citizens, which is why it is becoming more and more important to capture those Web pages because they are the face of the government that people see."

Storing Public Policy
The need for a standardized archival program stems from state legislation establishing an open-government policy, which mandates that the public have full access to all government documentation and records.

"We needed to come up with a solution to ensure transparent government by preserving electronic records and making them available to the public years from now," Jansen said. Through this effort, the state hopes to strengthen public confidence in state government and reassure citizens that their interests are being served.

Records the state requires to be archived include land records; court records; maps; vital records (such as birth, death and marriage certificates); retirement documents; census records; codes, ordinances and statutes; government correspondence and documentation; and any other records with legal or historical significance.

"We don't store records just to store them. We store records which are important -- that need to be kept forever, which really allows us to focus on the records we take in," said Jansen.

Moving Forward
The Digital Archives' first project migrated historical census information and marriage records for three pilot counties (Spokane, Chelan and Snohomish) to its centralized database. Now, as it gradually acquires electronic information such as Locke's Web site, the Digital Archives is expanding and fine-tuning the system for long-term stability.

"Within two years, we hope to have fully evolved the system and developed the policies and procedures for both accessing and ingesting data to the point where we can really open our doors to the entire state," Jansen said.

The Digital Archives plans to double storage capacity annually, growing from 5 terabytes to 30 terabytes within the next four years.
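Taken literally, the two figures in that plan describe different growth curves. The short Python calculation below compares them: doubling every year from 5 terabytes gives 80 terabytes after four years, while reaching 30 terabytes in four years corresponds to an annual growth factor of about 1.57, or roughly 57 percent a year.

```python
def capacity_by_year(start_tb: float, annual_factor: float, years: int) -> list:
    """Projected capacity now and at the end of each year, compounding annually."""
    return [start_tb * annual_factor ** n for n in range(years + 1)]


doubling = capacity_by_year(5, 2, 4)
print(doubling)  # [5, 10, 20, 40, 80]

# Annual growth factor that takes 5 TB to 30 TB in exactly four years.
implied_factor = (30 / 5) ** (1 / 4)
print(round(implied_factor, 3))  # 1.565
```

At a doubling rate, capacity would pass 30 terabytes between the second and third years.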

By combining records management and technology, the Digital Archives gives users access to information anytime, from anywhere, he said. Bringing these two worlds together, he noted, allows the state to "blend the traditional archival science of preservation that successfully provides access to the public with the best practices of IT storing and migrating data."

"Our goal is to continue to grow and evolve the system to prove that its fundamental conception is correct and that our execution is right; that the retrieval is smooth and effortless and gives a very robust user experience while still preserving the information for long term," said Jansen.

Through digital preservation, history will advance alongside technology, remaining available to future generations as an untarnished link to the past.

To access the Digital Archives, visit the Web site.
Sherry Watkins, Contributing Writer