Mock Disaster Recovery Plan

The following is an example of a Hardware & Software Recovery + Work Area Recovery plan. This was created as an academic exercise, based on the needs of a mocked-up mid-sized health care provider.


HARDWARE & SOFTWARE RECOVERY

 

DEPARTMENTS AFFECTED:

•  Customer Relations

•  Claim Payment Activities

•  Customer Phone Contact

•  Utilization Management

 

DEPARTMENT HARDWARE/SOFTWARE AND RTOS AND RPOS:

A Recovery Time Objective (RTO) is the maximum allowed amount of time between system failure and system restoration. A Recovery Point Objective (RPO) is the maximum acceptable amount of data loss upon system failure, expressed as a span of time. An RTO of 4 hours and an RPO of 0 hours were determined to be the baseline for all systems within the organization.
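
As a rough illustration of how these two objectives can be checked against an incident, the sketch below compares a hypothetical outage with the 4-hour RTO and 0-hour RPO baseline. The function name and the timestamps are assumptions made for illustration and are not part of the plan.

```python
from datetime import datetime, timedelta

# Baseline objectives stated in the plan.
RTO = timedelta(hours=4)
RPO = timedelta(hours=0)


def meets_objectives(failed_at, restored_at, last_replicated_at):
    """Return (rto_met, rpo_met) for a single incident.

    Downtime is the time from failure to restoration. Data loss is
    measured as the time between the last replicated write and the failure.
    """
    downtime = restored_at - failed_at
    data_loss_window = failed_at - last_replicated_at
    return downtime <= RTO, data_loss_window <= RPO


# Hypothetical incident: restored after 3 hours, last replica 15 minutes old.
failure = datetime(2024, 1, 1, 9, 0)
result = meets_objectives(
    failed_at=failure,
    restored_at=failure + timedelta(hours=3),
    last_replicated_at=failure - timedelta(minutes=15),
)
print(result)  # (True, False): RTO met, RPO missed
```

Even a 15-minute replication lag fails a 0-hour RPO, which is why the objective drives the strategy choice so strongly.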

   

 

RECOMMENDED STRATEGY:

 

In order to satisfy a 4-hour RTO and a 0-hour RPO, a continuous availability strategy should be implemented for all databases and applications providing services, as it is the only strategy capable of guaranteeing a 0 RPO.

 

STRATEGY IMPLEMENTATION:

 

How Strategy Will Work:

 

Continuous Availability is an implementation of a hot site that has absolutely zero downtime. A hot site is an alternative location that has all the needed hardware and software infrastructure ready to go. In order to provide zero downtime, these facilities are redundant and running at all times, with automatic failover in case of disaster. All data is instantly replicated between databases. An internal strategy should be implemented to create at least one redundant hot site at the San Antonio office location. The Mesa office is also a contender, as all three offices share many work responsibilities and use similar hardware. San Antonio is the better choice as it is more centrally located within the geographic footprint of the company, and has a population approximately triple that of Mesa. An internal hot site strategy does have a drawback of increased operational overhead costs, but significantly mitigates lost revenue in the event of a disaster, as customers should see no interruption in service.
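
The automatic failover described above can be sketched as a simple priority-ordered health check. This is a minimal sketch under stated assumptions: the `Site` class, the `select_active_site` function, and the choice of Jacksonville as the primary site are hypothetical and only illustrate the idea. A real implementation would rely on the failover features of the chosen database and network platform.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool


def select_active_site(sites):
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        if site.healthy:
            return site
    return None


# Hypothetical topology: Jacksonville as primary, San Antonio as the redundant hot site.
sites = [Site("Jacksonville", healthy=False), Site("San Antonio", healthy=True)]
active = select_active_site(sites)
print(active.name if active else "no site available")
```

Because both sites are running and fully replicated at all times, switching the active site requires no restoration step, which is what keeps the RTO at zero.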


Justification Over Vendor Strategy:

 

Even though it could potentially be cheaper than using an internal strategy, a vendor strategy would carry too much risk given the strict data privacy protections around healthcare data mandated by HIPAA. There is no official HIPAA certification,[1] so there is no way to ensure a vendor would be in compliance at all times.

 

STRATEGIES NOT RECOMMENDED:

 

Replication/High Availability

 

Replication/High Availability is an implementation of a hot site whereby data is replicated on a set interval, such as time or volume of data. As a hot site, it has the ability to host the services; however, manual intervention is required to switch the system over. RPO is determined by the interval of data replication set by the business, and RTO is determined by the time it takes to manually switch over the systems. While the RPO for this strategy can be very low, it is not zero, so it cannot meet our RPO goals.
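
To make the RPO argument concrete: with interval-based replication, the worst case is a failure just before the next replication run, so the data-loss window can approach one full interval. A minimal sketch, assuming purely time-based intervals (the function name and the interval values are illustrative and not taken from the plan):

```python
from datetime import timedelta

RPO_TARGET = timedelta(0)  # the plan's 0-hour RPO


def worst_case_data_loss(replication_interval):
    """Worst case: the failure occurs just before the next replication run,
    so up to one full interval of writes is lost."""
    return replication_interval


# Illustrative intervals only; any non-zero interval misses a 0-hour RPO.
for interval in (timedelta(minutes=5), timedelta(hours=1), timedelta(hours=24)):
    loss = worst_case_data_loss(interval)
    print(interval, "meets RPO:", loss <= RPO_TARGET)
```

However short the interval is made, the worst-case loss stays above zero, which is the reason this strategy is ruled out.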

 

Remote Journaling

 

Remote Journaling is an implementation of a hot site where only data is replicated after error checking. This remote system can only provide data backup in the case of disaster. Like Replication/High Availability, RPO is determined by the interval of data replication, usually at least 24 hours. RTO would be determined by how long it would take to restore the hardware needed to use this data backup. While the RPO for this strategy can be low, it is not zero, so it will not meet our RPO goals.

 

Electronic Vaulting

 

Electronic vaulting is the practice of sending all data to a secure off-site location at a set interval, usually once a day. In addition to the fact that the 24-hour RPO is much too high for our needs, putting the data into commission also takes time, which would likely not fit our 4-hour RTO.

 

Warm Site

 

A warm site is an alternative facility that has all hardware infrastructure needed by the department on location and ready to set up. All software would need to be loaded onto the hardware during the restoration process. Because installing the software on the number of machines used would likely take longer than 4 hours, this strategy would not meet our RTO goals. Even if it could, there are no data recovery strategies that can be combined with a warm site that would meet our 0-hour RPO.

 

Cold Site

 

A cold site is an alternative facility equipped with only the bare-bones infrastructure needed to set up a new site, such as HVAC and an internet connection. This strategy would not meet our RTO given the amount of time needed to source, ship, and install the required hardware. No data recovery strategy can be combined with it to provide a 0-hour RPO.

 

Quick Ship

 

A quick ship strategy entails finding and purchasing all needed equipment, as well as a facility to use it in, at the time of disaster. This strategy cannot provide a predictable RTO, given the number of variables at play in emergency situations. Finding reliable legacy equipment and a commercial property to use it in would take multiple weeks of work. While this is the cheapest recovery strategy to plan for, it could be the costliest if executed.

 

Reciprocal Agreement

 

A reciprocal agreement is an agreement with a business that is not a disaster recovery vendor to share disaster recovery infrastructure. This strategy is not recommended due to the multiple risks involved, including those outside the scope of meeting RTOs and RPOs, such as the general cyber security concerns of leaving your data in the hands of a potential competitor. This is especially true of data that requires HIPAA-level protection.

 

Hardware and Software Recovery Conclusion

 

Even though continuous availability is the costliest strategy to implement, it is widely regarded as the industry standard and best practice for disaster recovery. Barring a cataclysmic event affecting both systems simultaneously, the system will never go completely offline, which makes this the only strategy that can guarantee a 0 RPO. It also provides all satellite offices that use these services with an RPO and RTO of 0. The only factor remaining is providing a work area and personal technology for the employees of the affected facility within the allotted 4 hours after a disaster.

WORK AREA RECOVERY

 

RTO FOR WORK AREA RECOVERY: 

Multiple business functions within these departments require an RTO of 4 hours.  

 

SUGGESTED WORK AREA RECOVERY STRATEGY:

 

The Jacksonville Region Operations should relocate to a vendor-provided site if its facilities become unavailable.

 

STRATEGY IMPLEMENTATION:

How Strategy Will Work:

 

A vendor should be contracted to supply the work area and technology infrastructure needed for the operations center. After it is confirmed that the primary facility is unavailable, this vendor will be contacted, and they will then provide this work area and required technology within the time specified in the SLA made with the vendor. The vendor would need to provide only the technology needed for individual workstations such as phones and laptops, as all the mainframe and midrange systems running the software would be available via the internal continuous availability implementation. 

 

Justification over Internal Strategy

 

If the department had a slightly higher RTO, it might have been possible to implement an internal strategy by spreading the employees across the company, primarily to the San Antonio or Mesa offices. However, flights from Jacksonville to San Antonio take longer than 4 hours. The flight to Rockville, MD is only two hours, so that office could conceivably be used for employees who, for whatever reason, must work out of an office. Given that the Rockville office is approximately half the size of Jacksonville, it is unreasonable to expect all 311 workers to relocate there.

 

STRATEGIES NOT RECOMMENDED:

Alternate Facility Owned by Company

Using an alternate facility owned by the company is not recommended for the entirety of the workforce as there are no facilities within 4 hours capable of handling an influx of 311 more employees. To be a true disaster recovery location, redundant infrastructure such as computers, tables, chairs, and phones must be kept in storage at the alternate location. Maintaining a large HQ that can hold every department from satellite offices which are larger than itself is not cost effective. This strategy also requires the business to provide travel and housing to displaced employees, which isn’t needed in every disaster scenario.

Find External Site At Time of Disaster (ATOD)

Finding an external site ATOD is not recommended because it cannot guarantee an RTO of 4 hours. Corporate real estate transactions take longer than a week, and expediting the process would incur premium fees. It is highly likely that the business would be forced to sacrifice location and affordability, which would impact the company for years. All hardware used to connect to the business would also need to be purchased and set up ATOD.

Work From Home

A work-from-home strategy does not work well as a long-term disaster recovery strategy for the healthcare industry. Employees may not have access to hardware with the appropriate level of security controls needed to access data protected by HIPAA.

Reciprocal Agreement

 

A Reciprocal Agreement is a contractual agreement between two independent businesses to share work area resources if either business were to experience a disaster scenario. This is not a good option, primarily from a cyber security standpoint. This would leave patient data vulnerable to theft or manipulation. Reciprocal agreements also do not give the business an opportunity to test their disaster recovery strategy, something that should be done one to two times a year.

Worksite Recovery Conclusion

 

Using a vendor to provide an alternate work site is the best way to ensure the entire Jacksonville Operations center is up and running within one week of a disaster in the most cost-effective way. The only other potential option is using Rockville, MD as an alternative, but this would likely be costlier in the long run because the extra space and technology infrastructure would have to be kept on hand and ready to go. The cost of housing these employees in Rockville can also potentially be avoided if the disaster is localized to the building itself, such as a fire. Vendors will likely have recovery sites available without too much of a difference in commute time for employees.

 

CONCLUSION:

 

Combining a continuous availability strategy for our primary business applications with a vendor-supplied work area is the surest way to guarantee no data is lost and all business operations can resume in under 4 hours. Because the most likely disaster to strike this area is a hurricane, it may be wise for the default, tested recovery plan to use a vendor-supplied office near Rockville. Having a jet large enough to carry all employees ready to be chartered as part of the SLA would also be necessary, as this would be cheaper and faster than finding 311 commercial flight seats on a moment's notice.



[1] https://www.hipaajournal.com/what-is-hipaa-certification/