Why the CrowdStrike outage should spark a reexamination of business continuity and recovery plans
Is your organization prepared to handle unexpected technology failures?
August 22, 2024
The recent CrowdStrike incident has reminded business leaders of the massive potential for disruption when a critical technology component malfunctions or becomes unavailable. The impact isn’t limited to security tools like CrowdStrike: Imagine the impact to a business if a credit card payment processor goes down, or if a critical data center has an outage.
Regardless of the system or root cause, organizations need to be prepared to respond to and recover quickly from a disruption. In the recent CrowdStrike example, the existence of a documented, well-understood, and actionable plan was likely the difference between getting back to business quickly and being dragged in the news for service interruptions.
Best Practices for Business Continuity: Setting Recovery Goals & Documenting Plans
When it comes to business continuity planning, organizations should keep the following best practices in mind:
- Understand the impact of different outages: What does it mean for a system or resource to be unavailable? Conduct a business impact analysis to understand the critical processes in the organization, how long these processes can be unavailable before there is a critical impact, and the technologies that support these processes.
- Agree on recovery objectives: How long can a system be down, and how much data loss is acceptable? During the business impact analysis, business and IT leaders should work to agree on recovery time (how long a system can be down) and recovery point (how much data loss is acceptable) objectives. Don’t wait until an outage to discover that the business expects no more than 5 minutes of data loss, but IT is only performing daily backups.
- Document business continuity plans: How will operations continue without critical systems or resources? Business leaders should document the process for maintaining critical processes during a disruption, or the approach for mitigating the impact to the business. Business continuity plans typically include approaches for using manual workarounds, leveraging alternative systems, and communicating with customers, employees, and other key parties.
- Rehearse a business continuity exercise: Will the documented business continuity plans be effective? Testing business continuity plans is essential to identifying gaps in the plans and ensuring employees are prepared to respond effectively. Unless the plans are pressure-tested before an incident, gaps that would make them ineffective may go undiscovered.
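The mismatch described above, where the business expects minutes of data loss while IT backs up daily, can be surfaced with a simple comparison. The following is a minimal sketch, not a definitive tool: the `System` record, the `rpo_gaps` helper, and the two example systems are hypothetical, and it assumes the worst-case data loss for a periodically backed-up system is roughly the interval between backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class System:
    """Hypothetical record pairing a business expectation with IT practice."""
    name: str
    rpo: timedelta              # recovery point objective: data loss the business accepts
    backup_interval: timedelta  # how often IT actually takes a backup


def rpo_gaps(systems):
    """Return the systems whose backup interval exceeds the agreed RPO.

    Assumption: worst-case data loss is approximately one backup interval.
    """
    return [s for s in systems if s.backup_interval > s.rpo]


# Illustrative data only; these systems and values are made up.
systems = [
    System("order-database", rpo=timedelta(minutes=5), backup_interval=timedelta(days=1)),
    System("hr-portal", rpo=timedelta(days=1), backup_interval=timedelta(hours=12)),
]

for s in rpo_gaps(systems):
    print(f"{s.name}: backups every {s.backup_interval} cannot meet an RPO of {s.rpo}")
```

Running a check like this against the output of a business impact analysis is one way to turn the recovery point conversation into a concrete list of systems to revisit.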
Best Practices for Recovery: Defining Responsibilities & Rehearsing the Plan
Once a business continuity framework is in place, those details can inform a disaster recovery plan, which should:
- Define responsibilities: Who does what during a disaster? Document who is responsible (and who is their backup!) for declaring a disaster, for invoking the plan, for performing various recovery actions, and for communicating with both management and employees at large, so that expectations are clear.
- Establish priority: What should be recovered first? Some outages may impact a single system, in which case the recovery order is simple, but in a case like CrowdStrike, multiple dependent systems were often impacted simultaneously. Based on the business impact analysis, document the recovery order of systems and applications to focus the team’s effort on the most critical tasks.
- Document recovery processes: But what do people actually do to recover? Many disaster recovery plans lack the actionable detail to guide recovery efforts, especially when primary resources are unavailable during recovery operations. Document the processes for running data restore jobs, for reconfiguring applications, and for failing back from a DR site after an outage. Reference existing documentation for key processes where it exists, but make sure it’s all available in an easy-to-access location during a disaster.
- Rehearse disaster recovery exercises: Does everyone understand their role? Even if not required by regulatory frameworks, practicing the plan regularly is a good idea. Creating relevant, realistic disaster recovery scenarios can refresh the team’s understanding of critical tasks and processes, identify areas where the plan can be improved, or find gaps between the real-world technical recovery capabilities and business expectations. Conduct post-mortem analyses of these exercises and update the plan accordingly to fully capture what’s been learned.
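When multiple dependent systems are down at once, the documented recovery order needs to respect those dependencies. The following is a minimal sketch of one way to derive such an order, using a topological sort from Python's standard library. The system names and the `depends_on` map are hypothetical, and a real plan would also weigh the business priorities from the business impact analysis, which this sketch does not model.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each system lists what must be running before it.
depends_on = {
    "identity-service": set(),
    "database": {"identity-service"},
    "payment-api": {"database", "identity-service"},
    "web-storefront": {"payment-api"},
}


def recovery_order(depends_on):
    """Return a recovery sequence in which every system follows its dependencies.

    Raises graphlib.CycleError if the documented dependencies are circular,
    which is itself a useful finding for the plan.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = recovery_order(depends_on)
for step, system in enumerate(order, start=1):
    print(f"{step}. recover {system}")
```

Keeping the dependency map alongside the plan means the recovery order can be regenerated whenever a system is added or retired, rather than maintained by hand.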
With defined plans, clearly understood expectations, and documented recovery processes, organizations are in a much better position to swiftly respond to an incident, maintain business operations, and recover as quickly as possible. West Monroe has seen firsthand that companies that invest in resilient operations are able to quickly mitigate the impact of a disruption, while those without pre-defined plans struggle to determine the actions that should be taken.
If your organization needs assistance with preparing business continuity or disaster recovery programs and plans, West Monroe can help. From conducting rapid business impact analyses and documenting plans to developing and testing business continuity and disaster recovery scenarios, West Monroe’s team of resilience and industry experts can help you quickly enhance your business’s operational resilience and ultimately reduce risk.