West Monroe's take on the CrowdStrike-Microsoft outage
July 19, 2024
A faulty CrowdStrike update caused widespread disruption, particularly to systems running on Microsoft Windows. The incident affected point-of-sale systems, authentication services, and communication technologies. Organizations whose systems depended on Microsoft and lacked appropriate redundancy faced the most significant challenges.
Scale & Impact
- The disruption spanned many industries that rely on Microsoft systems, hitting critical services such as point-of-sale devices, authentication systems, and core communications. Systems that depended on Microsoft infrastructure without proper redundancy were the most vulnerable.
- The issue originated from a faulty content update to CrowdStrike's Falcon sensor, which caused Windows systems to crash and, in many cases, fail to boot.
- Azure's Central US region was specifically reported as impacted; organizations with systems deployed in other Azure regions were less affected.
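Spreading workloads across regions helps only if clients can actually move traffic away from a failing region. The following is a minimal sketch of client-side failover under stated assumptions. It tries regional endpoints in priority order and uses the first one that passes a health check. The `RegionalEndpoint` type, the `pick_endpoint` function, the URLs, and the region labels are hypothetical and serve only as illustration. In practice a managed global traffic-routing service would usually handle this instead of hand-written client logic.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class RegionalEndpoint:
    region: str  # hypothetical region label, used only for illustration
    url: str     # hypothetical service URL deployed in that region


def pick_endpoint(
    endpoints: Sequence[RegionalEndpoint],
    is_healthy: Callable[[RegionalEndpoint], bool],
) -> Optional[RegionalEndpoint]:
    """Return the first endpoint, in priority order, that passes the health check.

    Returns None when every region is unhealthy, so the caller can switch to a
    degraded mode instead of failing silently.
    """
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A probe that raises (timeout, DNS failure, ...) counts as unhealthy.
            continue
    return None


# Usage: simulate an outage in the primary region.
endpoints = [
    RegionalEndpoint("centralus", "https://app-centralus.example.com"),
    RegionalEndpoint("eastus2", "https://app-eastus2.example.com"),
]
down = {"centralus"}
chosen = pick_endpoint(endpoints, lambda e: e.region not in down)
print(chosen.region if chosen else "no healthy region")
```

The health check is passed in as a callable so the selection logic stays independent of how health is measured, whether by HTTP probe, synthetic transaction, or provider status feed.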
Response & Mitigation
- Larger organizations with robust IT teams and automated tools were able to address the issue more swiftly than smaller organizations or those without sufficient IT resources, which are facing prolonged disruptions.
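- CrowdStrike published a workaround: boot each affected machine into Safe Mode or the Windows Recovery Environment, delete the faulty channel file (matching C-00000291*.sys in the C:\Windows\System32\drivers\CrowdStrike directory), and restart. Larger organizations could implement this guidance quickly.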
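CrowdStrike's published workaround deletes the files matching C-00000291*.sys in C:\Windows\System32\drivers\CrowdStrike. The following sketch expresses that one step in Python to show the logic: locate the files by pattern, run a dry run first, and then delete them. It assumes a Python runtime is available on the host. That is generally not true inside Safe Mode or a recovery environment, where the deletion is normally done by hand or from the command prompt. The function name and its `dry_run` parameter are hypothetical. The directory is a parameter so the logic can be tried safely against a scratch folder.

```python
from pathlib import Path
from typing import List

# Directory and file pattern taken from CrowdStrike's published workaround.
CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
FAULTY_CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(
    directory: Path = CROWDSTRIKE_DIR,
    pattern: str = FAULTY_CHANNEL_FILE_PATTERN,
    dry_run: bool = True,
) -> List[Path]:
    """Find and, unless dry_run is set, delete files in `directory` matching `pattern`.

    Returns the matching paths so the caller can log what was, or would have
    been, removed. A missing directory yields an empty list.
    """
    if not directory.is_dir():
        return []
    matches = sorted(p for p in directory.glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


# Dry run: list what would be deleted without touching anything.
for path in remove_faulty_channel_files(dry_run=True):
    print("would delete:", path)
```

Defaulting to `dry_run=True` means an accidental invocation only reports matches. Deletion requires an explicit opt-in.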
West Monroe’s Long-Term Recommendations
The incident underscores the vital importance of robust software quality testing and meticulous release processes. Striking a balance between fast releases and thorough testing is essential to preventing significant disruptions.
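One common way to strike that balance is a staged, or ring-based, rollout. An update goes to a small canary group first, a health signal is checked, and the rollout widens only while that signal stays under a threshold. The following sketch illustrates the gating logic and nothing more. The ring names, the `deploy` callback, the crash-rate metric, and the 1% default threshold are all assumptions, not any vendor's actual release process. A real pipeline would also wait for a bake period before reading the metric.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Ring:
    name: str         # hypothetical ring label, e.g. "canary"
    hosts: List[str]  # hosts that receive the update in this ring


@dataclass
class RolloutResult:
    completed_rings: List[str]
    halted_at: Optional[str]  # name of the ring that failed the gate, if any


def staged_rollout(
    rings: Sequence[Ring],
    deploy: Callable[[str], None],
    crash_rate: Callable[[Ring], float],
    max_crash_rate: float = 0.01,
) -> RolloutResult:
    """Deploy ring by ring, halting as soon as a ring's crash rate exceeds the threshold."""
    completed: List[str] = []
    for ring in rings:
        for host in ring.hosts:
            deploy(host)
        if crash_rate(ring) > max_crash_rate:
            # Stop here: later, larger rings never receive the bad update.
            return RolloutResult(completed, halted_at=ring.name)
        completed.append(ring.name)
    return RolloutResult(completed, halted_at=None)


# Usage: simulate an update that crashes every host it lands on.
rings = [
    Ring("canary", ["host-01"]),
    Ring("early", ["host-02", "host-03"]),
    Ring("broad", [f"host-{i:02d}" for i in range(4, 20)]),
]
deployed: List[str] = []
result = staged_rollout(rings, deployed.append, crash_rate=lambda ring: 1.0)
print(result)
print(len(deployed))  # only the canary host received the faulty update
```

The deploy and health callbacks are injected, so the same gate can wrap whatever deployment tooling and telemetry an organization already uses.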
- Enhance Redundancy. Ensure systems have appropriate redundancy and are not solely dependent on a single provider or technology.
- Strengthen IT Infrastructure. Invest in robust IT infrastructure and teams capable of responding to widespread issues swiftly. Utilize automated tools for managing and deploying updates across systems.
- Prioritize Quality Assurance. Implement comprehensive software quality testing and meticulous release processes. Focus on customer satisfaction and reliability rather than solely on cost-saving measures.
- Plan for Incident Response. Develop and maintain an incident response plan that includes steps for quickly addressing and mitigating disruptions. Ensure all endpoints and distributed systems can be managed efficiently during such events.
- Collaborate and Communicate. Maintain clear communication channels with technology providers like CrowdStrike to receive timely updates and guidance. Foster collaboration between IT teams and service providers to address issues effectively.
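A practical first step toward the Enhance Redundancy recommendation is to inventory critical services and flag any dependency that has only one provider or technology behind it. The following sketch assumes a hypothetical inventory format that maps each service to its dependencies, and each dependency to a list of interchangeable providers. All service and provider names are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory format: service -> dependency -> interchangeable providers.
Inventory = Dict[str, Dict[str, List[str]]]


def single_points_of_failure(inventory: Inventory) -> List[Tuple[str, str, str]]:
    """Return (service, dependency, provider) for each dependency with fewer than two providers.

    Provider names are de-duplicated, so listing the same vendor twice does not
    count as redundancy. A dependency with no provider at all is reported as "<none>".
    """
    findings: List[Tuple[str, str, str]] = []
    for service, dependencies in sorted(inventory.items()):
        for dependency, providers in sorted(dependencies.items()):
            unique = set(providers)
            if len(unique) <= 1:
                findings.append((service, dependency, next(iter(unique), "<none>")))
    return findings


# Usage with illustrative data.
inventory: Inventory = {
    "point-of-sale": {
        "endpoint security": ["vendor-a"],
        "payment gateway": ["gateway-x", "gateway-y"],
    },
    "authentication": {
        "identity provider": ["idp-1", "idp-1"],  # same vendor listed twice
        "hosting region": ["region-east", "region-west"],
    },
}
for finding in single_points_of_failure(inventory):
    print(finding)
```

Sorting the output keeps reports stable from run to run, which makes it easier to track whether single points of failure are being closed over time.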