Balancing Speed with Safety

By Peter Hesse

Peter Hesse is a partner and EVP at 10Pearls, a global digital transformation company helping businesses innovate, digitalize, and scale.

Lessons from CrowdStrike: Balancing Speed with Safety in Mission-Critical Systems

Cybersecurity bug disrupts travel industry

For the past couple of weeks, the news has been dominated by a CrowdStrike bug that took down many Microsoft Windows-based systems early in the morning of July 19, 2024. The impact was especially wide-ranging in the travel industry, where Delta Air Lines faced days of delays and cancellations.

CrowdStrike has now released its preliminary post-incident report, which describes how the error occurred, how it was missed, and what steps the company could take to reduce or eliminate similar issues in the future.

What caused the crash

It may be hard to believe, but CrowdStrike's report acknowledges a lack of end-to-end testing of its solution. Specifically, CrowdStrike deployed an update that had passed an initial check by its content validator. The content validator itself had a bug, which allowed through data that caused Windows-based computers to crash.

This is not uncommon. Software developers can be lulled into a false sense of security when a long run of changes or updates ships without incident. The changes worked before, so why would the next one be any different?

By relying only on the unit-level check performed by the content validator, and never testing the solution end-to-end to ensure that the update didn’t harm end systems after deployment, they failed themselves and their customer base.
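To make the gap concrete, here is a minimal Python sketch of how an update can pass a validator yet still break the system that consumes it. Everything in it is hypothetical and purely illustrative: the update format and the `validate`, `apply_update`, and `end_to_end_check` functions are assumptions made for this example and are not CrowdStrike's code or file format.

```python
# Hypothetical toy example: the update format and all names are illustrative.

def validate(update: dict) -> bool:
    """A validator with a gap: it checks that 'rules' is a list,
    but never inspects the individual rules."""
    return isinstance(update.get("rules"), list)


def apply_update(update: dict) -> int:
    """Stand-in for the end system that consumes the update.
    Returns the number of rules loaded; raises on a malformed rule."""
    loaded = 0
    for rule in update["rules"]:
        pattern = rule["pattern"]  # KeyError if the rule has no pattern
        if not pattern:
            raise ValueError("empty pattern")
        loaded += 1
    return loaded


def end_to_end_check(update: dict) -> bool:
    """Validate, then actually apply the update to a disposable test target."""
    if not validate(update):
        return False
    try:
        apply_update(update)
    except Exception:
        return False
    return True


good = {"rules": [{"pattern": "evil.exe"}]}
bad = {"rules": [{}]}  # malformed rule that the validator does not catch

print(validate(bad))          # the validator alone lets it through
print(end_to_end_check(bad))  # applying it to a test target exposes the failure
```

The point of the sketch is only that the second check exercises the same code path the customer's machine would run, which is what the validator on its own cannot do.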

The need for rapid response

On the flip side, one could argue that CrowdStrike must release updates as quickly as possible. Consider the scenario of a new “zero-day” strain of ransomware spreading rapidly across the internet, encrypting systems and demanding ransom. As seen in the Change Healthcare and Ascension Health attacks earlier this year, such incidents can be catastrophic, not only affecting organizations but also the patients, citizens, members, and customers who depend on them. 

Unlike traditional antivirus solutions that typically receive daily updates, CrowdStrike clients benefit from continuous updates on indicators of compromise. This rapid distribution and kernel-level access enable their solution to respond almost instantaneously to spreading threats.

Those who advocate more extensive testing before deployment may appreciate the urgency once ransomware has devastated their entire corporate infrastructure.

Striking the balance between speed and safety

It’s possible to balance the speed of quick updates with the safety of comprehensive testing by embracing automated testing. CrowdStrike, along with other providers of mission-critical systems, must invest in automated testing across the entire deployment pipeline.

Advantages of test automation

Test automation can save time, reduce human error, and broaden test coverage. It can also handle batch operations and run many tests in parallel. This approach should be part of a holistic test strategy encompassing test planning, test case development and execution, goal setting, and reporting.

An automated testing infrastructure that thoroughly tests every part of the process end-to-end builds confidence in the resilience of mission-critical systems. CrowdStrike could implement a checkpoint in their update process where updates are first applied to a series of representative systems and rigorously tested. Properly automated, this system could perform end-to-end testing in minutes. Updates would only be deployed once the process confirms no system failures, unresponsiveness, or performance issues. 
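The checkpoint described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a description of any vendor's pipeline: `HealthReport`, `gate_update`, the `apply_and_probe` callback, and the 5% CPU overhead threshold are all hypothetical names and values chosen for the example. In practice, `apply_and_probe` would install the update on a representative test machine and report back on its health.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class HealthReport:
    """Health of one representative test system after the update (hypothetical)."""
    host: str
    booted: bool
    responsive: bool
    cpu_overhead_pct: float


def gate_update(
    update_id: str,
    hosts: Iterable[str],
    apply_and_probe: Callable[[str, str], HealthReport],
    max_cpu_overhead_pct: float = 5.0,  # assumed threshold, for illustration only
) -> Tuple[bool, List[str]]:
    """Apply the update to every representative host and approve it only if
    none of them fails, becomes unresponsive, or shows a performance issue."""
    host_list = list(hosts)
    if not host_list:
        # Nothing was tested, so nothing can be approved.
        return False, ["no representative hosts configured"]

    failures: List[str] = []
    for host in host_list:
        report = apply_and_probe(host, update_id)
        if not report.booted:
            failures.append(f"{host}: failed to boot")
        elif not report.responsive:
            failures.append(f"{host}: unresponsive")
        elif report.cpu_overhead_pct > max_cpu_overhead_pct:
            failures.append(f"{host}: CPU overhead {report.cpu_overhead_pct}%")
    return not failures, failures


def fake_probe(host: str, update_id: str) -> HealthReport:
    """Stand-in for real test infrastructure: 'bad-update' crashes Windows hosts."""
    crashed = update_id == "bad-update" and host.startswith("win")
    return HealthReport(host, booted=not crashed, responsive=not crashed,
                        cpu_overhead_pct=1.0)


approved, problems = gate_update("bad-update", ["win11-a", "linux-c"], fake_probe)
print(approved, problems)
```

Because each host is probed independently, a real implementation could run the probes in parallel, which is what would keep a gate like this within the minutes-long window the paragraph above describes.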

How 10Pearls can help

If you’re interested in adding security robustness and automated testing to your mission-critical infrastructure, contact us.

We support DevSecOps and automated testing in mission-critical systems across key organizations like energy & industrials, healthcare, and finance. Let us know if we can help you with an automated testing strategy to build resilience and safety without sacrificing speed.
