CrowdStrike Outage: Short-Term Actions and Strategic Priorities for the Future
As most in the industry are aware, a defective content update to CrowdStrike’s Falcon Sensor for Windows led to a global cascade of system outages affecting critical industry sectors such as transportation, banking, healthcare, and public safety.
Many enterprises and government agencies around the world are still actively managing their response to this incident. These response activities should, of course, remain everyone’s primary focus in the coming days, as IT teams work to restore critical systems and services globally. Like many technology companies, we’ve mobilized to support our customers in these efforts in every way possible.
In parallel, we’re also closely analyzing the cause of, and response to, this global incident to better understand how organizations can:
- Ensure that less obvious third- and fourth-party impacts of this incident are understood and mitigated.
- Turn critical learnings from this incident into longer-term strategies for organizational resilience.
Short-term response priorities
CrowdStrike has published specific guidance on how to identify and restore systems that have been directly affected by the faulty update. Following this guidance to restore services, and revisiting it regularly for updates, is of course the highest-priority action that organizations can take.
But as critical services come back online and business operations are restored, it’s important to consider additional short-term measures that can be taken to minimize business impact and accelerate your organization’s recovery.
A key activity that should be part of any response is to assess any third- and fourth-party exposure your organization may have to this incident. Even if all endpoints, servers, and cloud workloads that your organization uses directly are unaffected or remediated, there are likely external parties your business depends on that remain affected. Understanding these relationships is critical.
While third and fourth parties may be able to successfully recover their systems and resume primary operations, it’s important to understand whether your supply chain may be operating without key security controls in place as a result of this issue. Consider that some businesses may simply disable CrowdStrike and, in turn, operate without critical security controls.
Bitsight can help organizations assess which of their suppliers are using CrowdStrike, and it also allows customers to easily reach out to those suppliers to confirm the state of their security controls and their associated response plans.
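Mapping these third- and fourth-party relationships is essentially a graph problem. The sketch below is a hypothetical illustration, not a description of how Bitsight or any other product works. It assumes you already keep an inventory of which vendors each supplier depends on (the `DEPENDENCIES` data, the helper names, and every organization name are made up), and it uses a breadth-first search to find which suppliers rely directly on an affected vendor, labeled by tier: tier 1 for a third party (a direct supplier) and tier 2 for a fourth party (a supplier’s supplier).

```python
from collections import deque

# Hypothetical supplier inventory: each organization maps to the vendors it
# depends on. Every name here is an illustrative placeholder, not real data.
DEPENDENCIES: dict[str, set[str]] = {
    "our-org": {"payroll-provider", "logistics-partner", "crm-saas"},
    "payroll-provider": {"affected-vendor"},
    "logistics-partner": {"regional-carrier"},
    "regional-carrier": {"affected-vendor"},
    "crm-saas": set(),
}


def supplier_tiers(graph: dict[str, set[str]], root: str) -> dict[str, int]:
    """Breadth-first search from `root`. Tier 1 = third party (direct
    supplier), tier 2 = fourth party (a supplier's supplier), and so on."""
    tiers: dict[str, int] = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for dep in graph.get(node, set()):
            if dep not in seen:
                seen.add(dep)
                tiers[dep] = depth + 1  # BFS guarantees the shortest tier
                queue.append((dep, depth + 1))
    return tiers


def exposed_suppliers(graph: dict[str, set[str]], root: str, affected: str) -> dict[str, int]:
    """Return suppliers reachable from `root` that depend directly on
    `affected`, mapped to their tier and sorted by (tier, name)."""
    tiers = supplier_tiers(graph, root)
    hits = [(name, tier) for name, tier in tiers.items()
            if affected in graph.get(name, set())]
    return dict(sorted(hits, key=lambda item: (item[1], item[0])))


if __name__ == "__main__":
    print(exposed_suppliers(DEPENDENCIES, "our-org", "affected-vendor"))
    # -> {'payroll-provider': 1, 'regional-carrier': 2}
```

In this made-up inventory, `payroll-provider` is a third party that uses the affected vendor directly, while `regional-carrier` is a fourth party reachable only through `logistics-partner`. That second kind of exposure is the one that is easy to miss without an explicit map of supplier dependencies.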
In parallel, it’s also important to consider other critical non-technical response steps, such as:
- Communicating clearly and frequently with both internal and external stakeholders to ensure a coordinated response and maintain or restore confidence in your organization.
- Activating insurance policies that cover dependent business interruption, which can provide access to critical resources that support your ongoing response and recovery.
Developing a longer-term strategy for third-party risk mitigation
While this incident seemingly came out of nowhere, the reality is that business-impacting incidents originating with third-party vendors or partners are now a common occurrence. What sets this one apart is its scale. Discussions of this kind of risk often focus on the failure of a public cloud provider, but this incident gives everyone an opportunity to consider broader scenarios that may affect operational resilience. Unfortunately, this incident is not the only recent one to highlight the urgent need to strengthen the operational resilience of the supply chain. Notable examples include:
- A cybersecurity incident at the automotive technology and data firm CDK Global, which caused sustained business interruptions at thousands of auto dealerships and led to an industry-wide reduction in sales.
- A similar security incident at UnitedHealth Group subsidiary Change Healthcare that disrupted pharmacy and insurance services for millions of individuals and caused financial damages measured in billions of dollars.
While these incidents, and others like them, had devastating real-world impacts, they were concentrated on specific industries due to the specialized nature of the affected third-party technologies.
So while the CrowdStrike outage is a wake-up call about the potential impact of third-party risks, the threat was very real long before this incident.
For this reason, it’s critical for organizations to formalize their efforts to understand third- and fourth-party risks and to have a plan for proactively gaining visibility across the supply chain. Depending on the size of the organization, this could mean dozens, hundreds, or even thousands of suppliers. Understanding each supplier’s state of recovery, and whether its critical security controls are functioning after that recovery, is key.
Essential elements of these types of programs include:
- Scalable methods of directly assessing the risk posture of the third parties your organization interacts with.
- Gradual expansion of this activity to encompass any fourth-party technologies and service providers your business partners and vendors use that could negatively impact your business in the case of an operational disruption or security incident.
- Stronger contractual language to enforce minimum operational and security standards for third and fourth parties.
- Regular reviews of insurance policies to ensure adequate coverage for large-scale operational disruptions or security breaches.
- Proactive outreach to suppliers following initial operational recovery to confirm the state of security defenses.
Preparing for greater regulatory scrutiny
Many government and industry regulators are already focusing attention on third-party risks, and these activities are likely to intensify in response to the CrowdStrike incident. These efforts will likely include:
- Developing an aggregated view of technology dependencies across critical infrastructure sectors and industries (including reliance on technology service providers and software products) in order to identify systemic cyber risks, supply chain risks, sector-wide dependencies, and/or vulnerabilities.
- Evaluating whether market presence or critical infrastructure technology dependencies should create new reliability and security obligations.
Proactively developing your third- and fourth-party risk management program will prepare you for this increased scrutiny, in addition to its more immediate benefits for your risk posture.
Increasing focus on software development best practices
Whether you are an enterprise developing custom software for internal use or a technology vendor that develops software for commercial products, the CrowdStrike incident reinforces the need for continuous improvement of software development lifecycle practices.
For example:
- Functional testing needs to be performed before release and code reviews must be undertaken to ensure all code operates as it should before deployment.
- Strong policies and processes around change management, incident management, release management and regression testing are essential.
- A well-documented software bill of materials (SBOM) will make it easier to identify, assess, and mitigate risks within your software, including third-party and open source components.
- Proactive steps should be taken to identify and eliminate technical debt before it leads to business-disrupting software bugs or security vulnerabilities.
- A plan should be in place for disintermediating vendors that are deeply embedded in the technology stack, in case they experience a failure that disrupts your own operations or compromises your security controls.
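The SBOM point can be made concrete with a small lookup. This is a minimal sketch that assumes a JSON SBOM in the CycloneDX format, where components are listed under a top-level `components` array and may nest further `components`. The sample document, its versions, and the `find_component` and `iter_components` helpers are hypothetical and trimmed for illustration; real SBOMs are normally generated by build tooling and carry many more fields.

```python
import json

# Hypothetical, trimmed CycloneDX-style SBOM for illustration only.
SAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "log4j-core", "version": "2.14.1",
     "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"},
    {"type": "application", "name": "billing-service", "version": "3.2.0",
     "components": [
       {"type": "library", "name": "jackson-databind", "version": "2.15.2"}
     ]}
  ]
}
"""


def iter_components(components: list[dict]):
    """Yield every component, descending into nested `components` lists."""
    for comp in components:
        yield comp
        yield from iter_components(comp.get("components", []))


def find_component(bom: dict, name: str) -> list[tuple[str, str]]:
    """Return (name, version) for every SBOM entry whose name matches `name`."""
    return [
        (comp["name"], comp.get("version", "unknown"))
        for comp in iter_components(bom.get("components", []))
        if comp.get("name") == name
    ]


if __name__ == "__main__":
    bom = json.loads(SAMPLE_SBOM)
    print(find_component(bom, "log4j-core"))  # [('log4j-core', '2.14.1')]
```

Matching on the bare component name keeps the sketch short; in practice, matching on the package URL (`purl`) is more precise, because different ecosystems can reuse the same component name.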
Looking ahead
There will undoubtedly be more that we learn about the CrowdStrike incident in the days and weeks ahead. So it’s important to view all information shared with a critical eye and continue to incorporate learnings into longer-term strategies once the immediate crisis subsides.
Just as there was a SolarWinds and a Log4j before there was a CrowdStrike outage, this will certainly not be the last globally impacting technology incident. And each of these incidents, however painful in the moment, was a powerful learning opportunity that brought positive change to IT operations and security practices globally.
We believe this incident will likely do the same for the important issue of third- and fourth-party risk management.