Sept 95
By Joe Plasky, Conner Storage Systems

With a growing number of state and local government agencies dependent upon PC LAN systems for managing day-to-day activities, fault tolerance and disaster prevention planning have emerged as important issues. Government managers recognize that losing important data can devastate an organization, halting the delivery of vital services or impairing its ability to make policy judgments. It is critical that government technology managers understand the issues involved in disaster prevention planning, and take steps to create and implement their own plan.

FAULT TOLERANCE
Installing fault-tolerant systems to prevent data loss should be a first step in any government agency disaster prevention plan. For example, RAID (Redundant Arrays of Independent Disks) systems offer data redundancy and non-stop performance by tightly integrating a series of hard disk drives in one array. They provide reliable, cost-effective solutions for protecting real-time data in the event of a server's hard disk failure. With a RAID system, data can still be accessed while a failed disk is being replaced or repaired.
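The article does not say which RAID level is in use. As one illustration, assuming a parity-based level such as RAID 5, the sketch below shows how an array can rebuild a failed drive's contents from the surviving drives. The block contents and the `xor_blocks` helper are hypothetical, chosen only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 4-byte block of a stripe.
data_disks = [b"GIS1", b"IMG2", b"MAP3"]

# The parity block for the stripe is stored on a fourth disk.
parity = xor_blocks(data_disks)

# Simulate losing the second disk, then rebuild its block by XOR-ing
# the surviving data blocks with the parity block.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_disks[1])
```

Because the lost block can be recomputed on demand, reads can continue while the failed drive is being replaced.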

RAID systems offer high-capacity data storage, making them particularly well-suited for data-intensive applications, such as GIS, imaging and multimedia. Most RAID systems allow users to increase capacity by either upgrading the disk drives or by adding modules to the RAID system. In selecting a RAID product, look for a system that is Novell-certified for NetWare 3.1x and 4.0x, the major LAN operating systems in state and local government.

RAID systems provide many data storage benefits. For example, RAID systems are designed for reliability. With multiple disk drives, and redundant power supplies and cooling fans, the mean time between data loss for a RAID system can be in excess of five million hours. This reliability is crucial when an end user's data must be available at all times. Also, a RAID system should provide hot-swapping capability so that failed drives can be replaced online without system interruption or loss of data. This means that a drive failure no longer results in the loss of data or even the loss of access to data.
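To put the five-million-hour figure in perspective, the short calculation below converts it to years and estimates the chance of a data-loss event in any single year. The yearly estimate assumes a constant failure rate (an exponential model), which is an assumption made here and not something the figure itself states. A mean is a statistical average, not a guarantee for any one system.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

# Mean time between data loss quoted above, in hours.
mtbdl_hours = 5_000_000

# Convert the mean to years.
mtbdl_years = mtbdl_hours / HOURS_PER_YEAR

# Assumed exponential model: probability of at least one data-loss
# event during one year of continuous operation.
annual_loss_probability = 1 - math.exp(-HOURS_PER_YEAR / mtbdl_hours)

print(f"{mtbdl_years:.0f} years")  # about 571 years
print(f"{annual_loss_probability:.4%} chance per year")
```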

DISASTERS: LARGE AND SMALL
An effective disaster recovery plan addresses two types of disasters. The first is a local incident that affects the operation of the PC LAN system but still allows administrators to work on the problem on-site. The second is a natural disaster, such as an earthquake or fire. When a natural disaster strikes, an entire region can be affected. This often means the system and its support structure are inaccessible for days, and data may even be permanently lost.

A basic disaster prevention and recovery plan has four essential parts:
* Risk analysis and site audit
* Hardware protection
* Data protection and recovery
* Contingency plans

RISK ANALYSIS AND SITE AUDIT
Risk analysis is the initial audit that should identify the types, locations and amounts of hardware, software and data. Hardware and software audits can be simplified by using software that automatically inventories your departmental network. You will need this information to plan how much hardware and software would have to be replaced in a major disaster, who needs it, how much it will cost and where it is located.
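As a minimal sketch of what such an inventory might record, the example below stores each item's type, location, owner and replacement cost, then totals the replacement cost per location. The `Asset` fields, the sample entries and the dollar figures are all hypothetical; a real inventory tool will have its own format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    kind: str                # "hardware" or "software"
    location: str            # where it is located
    owner: str               # who needs it
    replacement_cost: float  # how much it will cost to replace


# Hypothetical inventory entries, for illustration only.
inventory = [
    Asset("File server", "hardware", "Room 101", "Records", 12000.0),
    Asset("RAID array", "hardware", "Room 101", "Records", 8000.0),
    Asset("NetWare license", "software", "Room 101", "Records", 3000.0),
    Asset("Workstation", "hardware", "Room 204", "Planning", 2500.0),
]


def replacement_cost_by_location(assets):
    """Total the replacement cost of all assets at each location."""
    totals = defaultdict(float)
    for asset in assets:
        totals[asset.location] += asset.replacement_cost
    return dict(totals)


totals = replacement_cost_by_location(inventory)
for location, cost in totals.items():
    print(f"{location}: ${cost:,.0f}")
```

Grouping by location shows at a glance what the loss of a single room or building would cost to replace.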

Data audits are often more difficult. Conducting a data audit will identify the most critical computing needs of a government office, who uses the data, and how long the agency can survive without the information and the computers needed to access it.
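One way to use the results of a data audit is to rank each data set by how long the agency can function without it, so that recovery effort goes to the most time-critical data first. The sketch below assumes a simple record with a hypothetical `max_outage_hours` field; the data set names and hour values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    users: str             # who uses the data
    max_outage_hours: int  # how long the agency can survive without it


# Hypothetical audit results, for illustration only.
audit = [
    DataSet("Archived meeting minutes", "Clerk's office", 720),
    DataSet("Emergency dispatch records", "Public safety", 1),
    DataSet("Payroll", "Finance", 72),
]


def recovery_order(datasets):
    """Return data sets sorted so the least tolerable outage comes first."""
    return sorted(datasets, key=lambda d: d.max_outage_hours)


for item in recovery_order(audit):
    print(f"{item.name}: restore within {item.max_outage_hours} hours")
```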

HARDWARE PROTECTION
Once a risk analysis has been conducted, it is wise to perform a site audit of all LAN-related hardware. With hardware, preventive measures are key to avoiding localized disasters. These include, but are not restricted to:

Security: Have you put measures in place so equipment does not "grow feet"? Be sure to perform regular physical inventories and
