Why Recovery Time Objectives (RTOs) Can Be Misleading
“Are you on the business side or the IT side?” was a question I received maybe a half dozen times last month while I was attending the Disaster Recovery Journal Fall World in San Diego. This question really got me thinking. Everyone at the conference worked in business continuity (BC) and/or disaster recovery (DR), but there was a definite divide between those who reported into IT departments and those who reported into the business. For the most part, those who reported into IT had a DR focus, and those who reported into the business (or perhaps into security and risk) had a BC focus. Attending the different breakout sessions across both domains, I noted the good news: both groups speak the same language (RTO, RPO, availability, downtime, resilience, etc.). The bad news is that I’m not sure we’re all using the same dictionary.
Two of the business-focused sessions I attended pointed out a troubling difference in the way IT and the business interpret one of the simplest of BC/DR terms: RTO. What is RTO? Simply put, it is the time to recover a service after an outage. This seems straightforward enough, but let’s break out how a business professional and an IT professional might each understand RTO:
- Business: The maximum amount of time that my service can be unavailable.
- IT: The amount of time it takes to recover that service.
The difference between these two interpretations is that IT does not normally account for the time it takes to declare a disaster. As soon as a service goes down, do you immediately declare a disaster? No, you would most likely spend some time troubleshooting before deciding that a disaster needs to be declared. Even then, IT most often does not have the power to declare the disaster; the decision may be up to an executive, which adds more time before a declaration occurs and IT can actually start the work of recovery.
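To make the gap concrete, here is a minimal Python sketch of one outage measured both ways. Everything in it is an assumption for illustration: the `OutageTimeline` class, its field names, the timestamps, and the four-hour RTO are invented, not taken from either session or from any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OutageTimeline:
    """Hypothetical timestamps for a single outage (illustrative names only)."""
    service_down: datetime       # service becomes unavailable
    disaster_declared: datetime  # someone with authority declares a disaster
    service_restored: datetime   # service is available again

    @property
    def time_to_declaration(self) -> timedelta:
        # Troubleshooting plus waiting for the decision to declare.
        return self.disaster_declared - self.service_down

    @property
    def recovery_time(self) -> timedelta:
        # What IT typically measures: declaration to restoration.
        return self.service_restored - self.disaster_declared

    @property
    def total_downtime(self) -> timedelta:
        # What the business experiences: outage start to restoration.
        return self.service_restored - self.service_down


def meets_rto(timeline: OutageTimeline, rto: timedelta) -> tuple[bool, bool]:
    """Return (IT view, business view) of whether the RTO was met."""
    return (timeline.recovery_time <= rto, timeline.total_downtime <= rto)


# Made-up example: down at 09:00, declared at 10:30, restored at 14:00.
outage = OutageTimeline(
    service_down=datetime(2024, 1, 1, 9, 0),
    disaster_declared=datetime(2024, 1, 1, 10, 30),
    service_restored=datetime(2024, 1, 1, 14, 0),
)
rto = timedelta(hours=4)  # assumed objective

it_view, business_view = meets_rto(outage, rto)
print(f"Time to declaration: {outage.time_to_declaration}")
print(f"Recovery time:       {outage.recovery_time}")
print(f"Total downtime:      {outage.total_downtime}")
print(f"RTO met (IT view): {it_view}, (business view): {business_view}")
```

With these invented numbers, IT recovers the service in 3.5 hours and believes it met a 4-hour RTO, while the business was down for 5 hours and believes the RTO was missed. The 1.5 hours before declaration is the part neither side agreed to count.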
The enlightened track presenters both suggested that BC/DR professionals separate time to declaration from recovery time to avoid miscommunication, missed SLAs, and bad blood between IT and the business. I think this is an interesting idea, so I’m curious to know: are any of you out there measuring RTO in a segmented way like that?