It should come as no surprise that websites thrive on traffic, so driving traffic to your site is a strong motivation for any company looking to grow its web presence. Ironically, that traffic can also be a double-edged sword if your infrastructure is not prepared to handle the load: popularity itself can become the cause of an outage.
Yesterday, popular Internet forum and message board Reddit discovered this firsthand. In an interesting campaign move, President Barack Obama graced the site with his presence by doing an “Ask Me Anything” (AMA) thread, a message thread in which commenters submit questions and the original poster responds. Word of this rare opportunity to send the President of the United States a direct message spread across social media like wildfire, leading to a massive spike in traffic that brought down Reddit just a few minutes into the life of the thread. Current figures show that the site's unique connections and pageviews both more than tripled compared to typical traffic. Eventually the site came back online and the AMA progressed as usual.
This sort of outage has been documented several times in the past. It is ultimately much like a Distributed Denial of Service (DDoS) attack, only with legitimate site traffic as opposed to a 50,000-computer botnet. One of the first instances of this kind of outage came from MetLife during the fledgling days of the Internet. Looking to drive traffic to their site, they bought several commercial spots during the Super Bowl and ran an incredibly cryptic advertisement that simply asked “What is M?” As a result of this marketing campaign, their site traffic spiked and ultimately crashed their servers.
Another pertinent example of this phenomenon comes from retail giant Target. In 2011, when the company launched its highly publicized Missoni line of clothing, so many shoppers connected to the site to look at the new Italian fashion items that it, too, crashed under the load. This resulted in a surge of angry tweets, which only drove more negative attention towards the company. In this case, Target suffered reputation damage from something that should have only driven its popularity and revenue through the roof.
This type of outage is becoming more and more common (see Apple, Coca-Cola, the state of California, and many others): an enthusiastic response to an advertisement, event, or new product brings the online presence of a large organization crashing down. Most companies don’t address these types of events in their disaster recovery plans, but is it time for this to change? Forrester advocates that infrastructure & operations professionals move beyond treating disaster recovery as a response to large catastrophic events only and instead look to prevent any type of failure. Ultimately, your customers don’t care whether they can’t access a service because the website is too popular or because the data center that hosts it is a pile of rubble; they care about accessing the service.
Is it time to include unexpected spikes in demand as a risk in your disaster recovery plans? If you consider your web presence a critical business service, I would argue, YES! Begin by defining your response plan for unpredicted or larger-than-predicted traffic spikes: will you redirect traffic to an alternate site, or can you burst into additional capacity, potentially at a cloud provider site? Include these scenarios in your testing as well, and consider including simulations of traffic spikes in DR exercises. It’s also critical to have a communication plan prepared for your (potentially) upset customers. More often than not, if you are experiencing increased traffic, it’s a highly critical time for the business in terms of revenue and reputation, so everything must be in place to bring your services back up as quickly as possible.
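To make the planning questions above concrete, here is a minimal sketch of how a response plan might be written down and exercised in a tabletop drill. Everything in it is an assumption for illustration: the names `CapacityPlan`, `choose_response`, and `run_spike_drill` are hypothetical, the capacity figures are invented, and real numbers would have to come from your own load tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityPlan:
    """Hypothetical capacity figures; replace with results from real load tests."""
    baseline_rps: int  # requests/second the primary site can serve on its own
    burst_rps: int     # extra requests/second available from a cloud provider


def choose_response(demand_rps: int, plan: CapacityPlan) -> str:
    """Pick the planned response for a given level of demand."""
    if demand_rps <= plan.baseline_rps:
        return "serve"      # normal operation, no action needed
    if demand_rps <= plan.baseline_rps + plan.burst_rps:
        return "burst"      # add capacity at the cloud provider
    return "redirect"       # demand exceeds everything: send users to an alternate site


def run_spike_drill(typical_rps: int, multipliers, plan: CapacityPlan):
    """Simulate traffic at several multiples of typical load, as in a DR exercise."""
    return [(m, choose_response(int(typical_rps * m), plan)) for m in multipliers]


if __name__ == "__main__":
    # Assumed numbers: 1,000 requests/second is "typical" traffic for this example.
    plan = CapacityPlan(baseline_rps=1500, burst_rps=2000)
    for multiplier, action in run_spike_drill(1000, [1, 2, 3, 4], plan):
        print(f"{multiplier}x typical traffic -> {action}")
```

With these assumed figures, a tripling of traffic like the one Reddit saw would still fit within burst capacity, while a fourfold spike would trigger the redirect. The value of writing the plan down this way is less the code than the conversation it forces about where each threshold sits.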
I’m curious about what you think — will you include unexpected spikes in demand in your DR plans? Is this a probable risk that you face?
[With contributions from Eric Chi]