September 23, 2008 By Chandler Harris
Data storage is a term CIOs usually don't want to hear, since it often entails either more expenditures or compliance problems. Yet CIOs find more than ever they must talk about it, since data streams for all organizations have been increasing exponentially, requiring more hardware, space and energy.
Currently there's an insatiable demand for data storage capability, with the need for storage capacity increasing 65 percent to 70 percent a year, according to research firm Gartner. A 2007 study by another firm, Nemertes Research, found average storage space needs for businesses are expanding anywhere from 20 percent to 150 percent each year.
The growth in storage demand is driven primarily by the increasing amount of information being collected and compliance requirements for retaining information.
"The annual growth rate for data storage is tremendous," said Stan Zaffos, research vice president of Gartner. "Yet users' ability to manage data is not increasing at the same rate, and therefore, there's an inherent problem."
Though the price of new storage devices has been dropping about 34 percent annually, capacity and service demands have been growing about 60 percent annually, according to Gartner. The trend has directly affected IT budgets across the board, energy costs included: server facilities accounted for 1.5 percent of U.S. electricity consumption in 2006, according to the U.S. Environmental Protection Agency. More servers dedicated to data storage also directly raise maintenance and new equipment costs, putting pressure on government IT leaders to find more budget allocations for data storage upgrades.
Yet a reticence by government to make data storage upgrades can be even costlier, according to Gary Shoemaker, business development manager of EMC's state and local government division. He has seen many governments delay upgrading.
"If organizations replace old storage with better-performing storage, it will save money compared with consistent maintenance costs," Shoemaker said. "When you start paying maintenance costs after three or four years, it adds up to what it would cost for a new storage system. The cost of not doing something is sometimes more expensive than people realize."
As little as five years ago, an organization's storage analysts managed an average of one to two terabytes of data. Some organizations now manage 100 terabytes. The question has been, in effect, how do organizations - especially governments - manage explosive data storage growth if they lack enough resources?
A further issue for data storage is the increasing trend toward "green" computing, such as energy conservation. With more servers holding vast amounts of data, requirements have increased for cooling and square footage. In September 2007, the U.S. Department of Energy's Energy Efficiency and Renewable Energy Division signed a memorandum of understanding with The Green Grid, a consortium of IT companies seeking to lower power consumption in data centers worldwide.
While tape-based data storage systems traditionally have been efficient and are still used effectively, the dropping cost of disk backup has created a faster, affordable alternative for many organizations. A disk backup system, also called a virtual tape library, at times offers a lower-cost solution with higher storage capacity, since it emulates tape functions on disk.
That was the case in Fulton County, Ga., which averaged tens of thousands of dollars in expenditures for data tapes every four months before 2003. As part of its plan to upgrade data storage, the county purchased a tier 1 disk-based data storage system that provided more capacity and eliminated the need for additional tapes.
The county used a cost-effective means that many organizations are using to maximize data storage: tiering. Organizations use tiering to allocate data by priority to multiple data storage units. Tiering can help organizations save money by reserving expensive tier 1 systems for high-priority data that is accessed frequently or is vitally important, while using cheaper storage systems for lower-priority data. Tiering can also reduce the amount of stored data, and therefore costs, by deleting or not backing up unneeded data. It reduces network traffic as well by moving rarely used data offline.
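The core of a tiering policy is a rule that maps how recently data was accessed to a storage tier. The sketch below illustrates the idea in Python; the 30-day and one-year thresholds and the tier names are assumptions for illustration, not figures from Fulton County or any vendor.

```python
from datetime import datetime, timedelta

# Illustrative thresholds (assumptions, not from the article):
# tier 1 for data touched within 30 days, tier 2 within a year,
# tier 3 (archive) for everything older.
TIER_RULES = [
    (timedelta(days=30), "tier1"),   # high-speed disk for active data
    (timedelta(days=365), "tier2"),  # network-attached storage
]
ARCHIVE_TIER = "tier3"               # content-addressable archive


def assign_tier(last_accessed: datetime, now: datetime) -> str:
    """Map a file's last-access time to a storage tier."""
    age = now - last_accessed
    for max_age, tier in TIER_RULES:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER
```

In practice an information life cycle management product applies rules like these automatically, migrating data downward as it ages.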
The new tiered storage system was a success in Fulton County, but local officials quickly discovered it would be short-lived. Capacity would eventually run out.
"Once we got [the tier 1 disk-based storage system] implemented and working in 2003, we stepped back and said, 'That's great.' But as you know, data grows and grows and grows, and the explosion of information growth is one of the leading things people in our business have to deal with," said Jay Terrell, chief technology officer and IT deputy director of Fulton County. "We began to archive things like e-mails and 911 recordings, while not wasting the high-speed storage of the tier 1 network and meeting legal requirements."
Fulton County developed an information life cycle management plan to prioritize data and purchased tier 2 and tier 3 data storage systems for network-attached storage and content addressable storage for archival purposes, respectively.
"Information life cycle management is having the data on the right piece of storage at the right time - at the right cost," EMC's Shoemaker said. "When you create new data, often it is accessed quite a bit in the beginning, but as it gets older it's not accessed and the value of the data decreases."
Deduped by Technology
With organizations storing data at record rates, data duplication has also increased, filling servers with excess information and forcing organizations to buy more data storage space. Yet a fairly new technology called "data deduplication" has emerged as an important tool for reducing data storage space and cost.
Data deduplication is a form of data compression that maximizes storage space by deleting or not saving redundant data. Deduplication allows organizations to reduce growing storage demands and hardware investments while maintaining backup capabilities.
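The underlying mechanism is content addressing: data is split into chunks, each chunk is identified by a cryptographic hash, and a chunk already in the store is never written twice. This toy Python sketch shows the concept; the class, chunk size and in-memory dictionaries are illustrative assumptions, not any vendor's implementation.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once;
    files are recorded as lists of chunk hashes (a sketch of the
    concept, not a production design)."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # sha256 hex digest -> chunk bytes
        self.files = {}   # filename -> ordered list of digests

    def put(self, name: str, data: bytes) -> None:
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if new
            digests.append(digest)
        self.files[name] = digests

    def get(self, name: str) -> bytes:
        """Reassemble a file from its chunk references."""
        return b"".join(self.chunks[d] for d in self.files[name])
```

Storing the same attachment under ten different names consumes the chunk space only once; each extra copy costs just a list of hash references.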
While data deduplication isn't yet a widely used tool in the data storage market, adoption rates are skyrocketing - within five years an estimated 75 percent of all businesses are expected to incorporate deduplication technology, according to Gartner. Deduplication primarily can be found as a feature of hardware- or software-based storage management platforms and not as a stand-alone product.
"Data deduplication translates to a huge savings and a big impact from a green perspective as well," said Gartner's Zaffos. "When you look at data compression ratios, the typical compression ratios range from 1-to-2, to 20- to 30-to-1 and in some cases 200- to 300-to-1."
For Seattle, data deduplication has been an important tool for reducing stored data. After tiering its data across multiple levels of servers, the city implemented data deduplication software to stamp out redundancies.
"What happened was if I sent a [Microsoft] Word document to people on multiple servers, a copy was saved on every e-mail server; so I've got up to 30 copies," said Bill Schrier, chief technology officer of Seattle. "A copy of that document would sit on every post office server, but deduplication software helps create only a single copy of an attachment to an e-mail that sits in one place."
Seattle also takes advantage of another trend: storage virtualization. Generally speaking, storage virtualization creates abstractions of actual hardware storage devices, providing a way for users or applications to access storage without having to know where or how the storage is located and managed. The benefits of storage virtualization include the ability to share physical storage across multiple application servers, which dramatically simplifies data migrations and decreases the need for new equipment.
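The essence of storage virtualization is an indirection layer: applications address logical volumes by name, and a catalog decides which physical array actually holds the bytes, so data can be migrated between arrays without the application noticing. The sketch below illustrates this in Python; the class names, placement policy and migration API are hypothetical, invented for illustration only.

```python
class PhysicalStore:
    """Stand-in for one physical storage array (illustrative only)."""

    def __init__(self, name: str):
        self.name = name
        self.blocks = {}  # key -> bytes held on this array


class VirtualizationLayer:
    """Maps logical volume names to (backend, key) pairs so that
    applications never address hardware directly -- a sketch of the
    concept, not a real product's interface."""

    def __init__(self, backends):
        self.backends = backends
        self.catalog = {}  # logical volume name -> (backend index, key)

    def write(self, volume: str, data: bytes) -> None:
        idx = hash(volume) % len(self.backends)  # trivial placement policy
        self.backends[idx].blocks[volume] = data
        self.catalog[volume] = (idx, volume)

    def read(self, volume: str) -> bytes:
        idx, key = self.catalog[volume]
        return self.backends[idx].blocks[key]

    def migrate(self, volume: str, dest_idx: int) -> None:
        """Move a volume to another array; readers see no change."""
        data = self.read(volume)
        src_idx, key = self.catalog[volume]
        del self.backends[src_idx].blocks[key]
        self.backends[dest_idx].blocks[key] = data
        self.catalog[volume] = (dest_idx, key)
```

Because callers only ever go through `read` and `write`, the layer can rebalance or retire hardware behind the scenes, which is what makes migrations so much simpler.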
The Seattle.gov portal has 25 servers performing different functions, but underlying them are five physical servers, which helps reduce power consumption and the city's carbon footprint, Schrier said. With 500 servers in the city's data center, Schrier will pursue server virtualization for them over the next year. Overall, virtualization helps Seattle reduce power consumption, reduces the demand for new hardware and helps manage storage tiers.
"The big piece to server virtualization is saving energy costs, since power is getting very expensive," said Jeff White, storage specialist for CDW-G. "Another piece is straight server count. If you replace 10 servers, but only buy one server, you're saving in hardware costs. You also save money since managing one server versus 10 is a whole lot easier for IT staff."
Another notable development in data storage is the Internet Small Computer System Interface (iSCSI), pronounced "eye-scuzzy" by industry insiders. It's an IP-based storage networking standard for linking data storage facilities that lets organizations deploy storage area networks (SANs) over existing IP networks, improving the reach and performance of storage data transmission. Previously SANs relied primarily on Fibre Channel, which was often too complex and costly for small and medium-sized organizations.
With exponential data growth, data center space challenges, soaring energy prices and falling IT budgets, it's important for CIOs and storage experts to look at the assortment of new technologies from storage vendors. Data storage vendor Copan estimates data is growing 40 percent to 120 percent annually and doubling every 18 months - a "data center crisis," according to the vendor. Consequently some analysts say CIOs and data storage managers must have a more simplified storage architecture to reduce total costs.
"As soon as you say the percentage spent on storage is increasing faster than budget, it puts tremendous pressures on other areas," Zaffos said.