When a faulty software update pushed by a web security company caused business and institutional computer systems to crash around the world last week, Metropolitan State University of Denver’s Information Technology Services team pulled an all-nighter to fix the problem by 6 a.m. Friday.
“This was one of those right-people, right-place, right-time situations, where a couple of folks were up and looking at their computers,” said Nick Pistentis, the University’s deputy chief information officer. “They saw some alerts and were able to rally the troops and do so pretty rapidly. We were well into diagnosing and troubleshooting before we started seeing any discussion online about this being a global impact.”
The flawed update from CrowdStrike, a leading cybersecurity company, affected machines running Windows software and caused major disruptions throughout metro Denver starting shortly before 11 p.m. Thursday. Regional Transportation District light-rail train service was suspended, and the Colorado Division of Motor Vehicles and the Colorado Department of Revenue were also hit with outages.
Flights out of Denver International Airport and across the country were also grounded, stranding thousands of travelers.
MSU Denver has been a CrowdStrike customer since 2018, using the platform as an advanced malware-detection tool for Windows servers and key IT workstations.
“We have about 400 devices running the software,” Pistentis said. “We immediately had a pretty high risk there — our file servers, our phone systems and a lot of utilities that maybe are transparent to the campus but critical to the University’s operation.”
Cybersecurity firms often push new updates to ensure that their software can detect and block new malware threats, he said. “As malware evolves, day by day, hour by hour, the vendor wants to provide the most recent definition tools and detection mechanisms. If they see a new behavior in the wild, they want you to have that locally, so the tool is finding it.”
The University operates about 5,000 computers, and around 40,000 people use its technology regularly, Pistentis said. “Ten percent of our total owned footprint is running CrowdStrike. They are high-impact systems, so it has an outsize impact.”
As IT team members figured out the scope and cause of the problem in the wee hours of Friday, he said, they had to restart servers in safe mode, manually remove the faulty software and reboot the devices, mostly using a workaround published by Microsoft and CrowdStrike. About 10% of the devices required additional troubleshooting.
Kevin Taylor, the University’s chief information officer and associate vice president for ITS, said the ITS team trains for major outages such as this one. On-call managers are authorized to activate a rapid response when computer systems crash.
“We do tabletop exercises, and we rehearse this and make sure we have things in place, but it’s also great to see that when we had an incident like this, that incident plan went into effect and basically was run by a playbook that we’ve already predefined,” he said. “Our team is amazing, but it’s really great to see them in action in moments like this.”
Special thanks to the following IT team members who responded to the outage:
- Anna-Liisa Breit
- Khanosak Chan
- Lee Crawford
- Darius Jack
- Ed Jacobs
- Jeff Keil
- Ethan Lane-Ngyuen
- Ryan McKenna
- Jesse Nguyen
- Corey Oxenbury
- Steve Patterson
- Nick Pistentis
- Reto Schultess
- Sylvia Valdez
- Jason Yee