Dive Brief:
- Facebook experienced one of the largest service outages in its history Wednesday due to "a server configuration change," according to a company tweet Thursday morning.
- Facebook-owned apps, including Instagram, WhatsApp and Messenger, were also affected by the outage, though Instagram was restored Wednesday night. Facebook, which was down for about 24 hours, said it resolved the issue Thursday morning.
- On Wednesday, Facebook acknowledged that users were having trouble accessing the "Facebook family of apps." About an hour later, it confirmed the issue was unrelated to a DDoS attack.
Dive Insight:
Every minute a company experiences an outage, it loses money. Facebook, whose revenue is built on advertising and user data, could feel the impact for longer than the day it was down. Facebook and Google together hold about 84% of international digital ad revenue.
That revenue depends on a steady stream of data about users' online activity and interests, and an outage halts the generation of that data.
However, "if you're an IT professional in charge of fixing the situation, you couldn't care less about how the damage happened, your main focus is doing what you can to get your organization fully operational again — especially when your company is as massive and used by millions of people," said Steve Blow, tech evangelist at Zerto, in an emailed statement to CIO Dive.
Facebook is a large entity with consolidated control over its operations, and it runs its infrastructure in-house rather than relying on a public cloud provider.
The social media network launched the Open Compute Project in 2011, a community of technologists "focused on redesigning hardware technology to efficiently support the growing demands on compute infrastructure," according to the company website. The approach is often likened to open source, applied to hardware.
Because Facebook manages its own infrastructure, resolving the outage fell entirely to the company.
"It's difficult for even the best teams to catch every potential cascading failure," Kolton Andrus, co-founder and CEO of Gremlin, told CIO Dive in an email. Though it remains speculation for now, Andrus said the configuration change was likely deployed across several of Facebook's systems.
"The difference between being down for hours or days versus minutes or seconds is the difference between a solid disaster recovery plan and one that is outdated, barely tested or even non-existent," said Blow.
Cloud-based advances give companies the tools they need for effective recovery. "... There seems to be no excuse" for a serious outage at the wealthiest companies, like Facebook, said Blow.