Facebook Blames Engineering Error of ‘Our Own Making’ for Global Outage
A simple technical mistake caused a global outage Monday that left more than 2.9 billion internet users unable to access Facebook, Instagram, WhatsApp and other tools.
The outage was “caused not by malicious activity, but an error of our own making,” Santosh Janardhan, Facebook’s vice president of infrastructure, wrote in a blog post Tuesday.
The disruption, which lasted close to six hours, began around 11:40 a.m. Eastern Time on Monday when Facebook engineers were trying to do routine maintenance on one of the company’s data centers, the blog post said.
Seeking to assess Facebook’s network capacity, engineers issued a command that inadvertently cut all of Facebook’s data centers off from the company’s network. That set off a cascade of failures that pulled Facebook’s properties off the internet.
When they detected that the data centers were offline, the servers that run Facebook’s Domain Name System, or DNS, which directs internet traffic, pulled themselves off the internet. DNS is what browsers and mobile phones use to find Facebook’s services online, and without it, it was “impossible for the rest of the internet to find our servers,” Mr. Janardhan said.
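To make the failure mode described above concrete, here is a minimal Python sketch under simplified, hypothetical assumptions: each DNS server runs a health check and stops advertising itself whenever it cannot reach any data center. The names `DnsServer`, `health_check` and `resolvable` are illustrative only and are not Facebook’s code. The sketch shows how a safeguard meant to take one unhealthy server out of service takes every server out at once when the fault is network-wide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DnsServer:
    """Hypothetical model of a DNS server that only advertises itself
    to the internet while it can reach at least one data center."""
    name: str
    advertising: bool = True

    def health_check(self, reachable_data_centers: set[str]) -> None:
        # If no data center is reachable, treat this server as unhealthy
        # and stop advertising it, so outside traffic is no longer sent here.
        self.advertising = bool(reachable_data_centers)


def resolvable(servers: list[DnsServer]) -> bool:
    # The rest of the internet can find the service only if at least
    # one DNS server is still advertising itself.
    return any(s.advertising for s in servers)


servers = [DnsServer("edge-1"), DnsServer("edge-2")]
print(resolvable(servers))  # True

# The faulty command cuts every data center off from the network at once,
# so every server's health check fails simultaneously.
for s in servers:
    s.health_check(reachable_data_centers=set())
print(resolvable(servers))  # False
```

In this simplified model, a check that behaves sensibly when a single server loses its connection becomes the cause of a total outage when every data center is unreachable at the same time, which mirrors the cascade Mr. Janardhan described.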
The DNS changes also disabled internal tools that would have allowed Facebook’s engineers to restore service remotely, forcing Facebook’s engineering staff to drive to data centers and restart systems there.
That took more time. “They’re hard to get into, and once you’re inside, the hardware and routers are designed to be difficult to modify even when you have physical access to them,” Mr. Janardhan said. “So it took extra time to activate the secure access protocols needed to get people onsite and able to work on the servers.”
Write to Robert McMillan.
Copyright ©2021 Dow Jones & Company, Inc. All Rights Reserved.