I often talk with IT leaders about using public cloud technologies such as infrastructure as a service, platform as a service, or hosted multi-tenant applications. Generally, I hear one of two different reactions:
If my business users’ requirements can be supported by public cloud based technologies, I am happy to get out of the data center business (i.e., if I can push those headaches to someone else, sign me up!)
The public cloud isn’t ready for my sensitive data or critical business applications and I don’t trust the cloud providers with my critical infrastructure (i.e., my data center and people are better than any outsourcer!)
Typically, when I hear the latter, I have to restrain myself from smiling, because it is often the IT leaders whose data centers lack process discipline, full redundancy, and deep staff expertise who hold onto the belief that "if I can hug my server in my data center, then all will be OK." The quote from Dr. Claire Lewicki (Nicole Kidman) in Days of Thunder comes to mind: “Control is an illusion, you infantile egomaniac.”
Recently, I had the opportunity to visit a data center run by one of the world’s largest cloud providers. That tour certainly validated my bias that cloud providers run better data centers than nearly all enterprises. The facility, which covers more than 500,000 square feet and easily required hundreds of millions of dollars in investment, was impressive not only for its size but also for the details its engineers implemented to ensure reliable services. I would challenge nearly any IT leader who claims their data center runs with the level of discipline and maturity that this one does.
For example, this data center:
Receives 30 megawatts of power from its supplier on 138 kV circuits. Because it receives power at this high voltage, the provider is responsible for running its own substations to step down the power and distribute it within the data center. This arrangement essentially eliminates the utility’s substation as a point of failure for the power supply. Only the most dramatic of utility power disruptions would impact the data center’s supply.
Has a dedicated control center, staffed 24x7, to manage power and environmental systems.
Has security certifications that include ISO/IEC 27007, SSAE16, HIPAA/HITECH, PCI, FISMA, and State, Federal, and International Privacy laws (including the EU Data Protection Directive)
Applies multiple physical security measures, including perimeter gates and cameras, cameras inside the data center that retain at least 3 months of footage, man-traps and biometric readers, and an actively enforced security system that requires employees to badge in and out of every room.
Has dedicated, marked cabinets to hold business critical or sensitive data. Moreover, the data center requires engineers to check out keys and security guards to confirm that the cabinet has been locked properly once an engineer has completed his/her work.
Utilizes 2N Core Networking with multiple, on-site spares – including components that approach $1m price tags sitting in the spares room.
Has the cleanest, most organized cable plant and cable management I have seen in any of the hundreds of data centers I have toured in my career.
Is made up of both traditional raised-floor space and container-based data centers, which allow for massive increases in capacity with simple connections into the power, network, and cooling systems.
One major difference between this data center and a traditional one is how it is designed for reliability. Enterprise data center managers typically build in redundancy at every component: redundant power supplies, disk drives, network connections, and so on. Since cloud providers build their own software, they focus on resilient software rather than resilient hardware. Every server has a single power connection and a single network connection, and redundancy is provided in software at the rack and data center level. As a result, the facility uses roughly half as many cables, which helps with cooling and sustainability. This approach to resiliency did validate one concern of the server-hugging IT manager: any individual server is more likely to fail at this data center than at an enterprise data center. But, at the end of the day, I would rather focus on service and application availability. If the software can handle a failure, even the unlikely loss of an entire data center, then failed individual components become unimportant. To illustrate, I saw more failed hard drive lights in this facility than in any other, but since no service was impacted, replacement could be delayed until a certain threshold of failures had been reached. Humans in the data center are the most likely cause of an outage, so designing for failure in a way that limits human presence in the data center improves reliability.
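To put a rough number on why software redundancy can make single-server failures unimportant, here is a minimal sketch. It assumes that replicas fail independently and that the service stays up as long as at least one replica is healthy; the function name and the 99% per-server figure are hypothetical illustrations, not numbers from the provider.

```python
def service_availability(server_availability: float, replicas: int) -> float:
    """Probability that at least one replica is up.

    Assumes replicas fail independently, which real deployments only
    approximate (shared racks, power, and software bugs correlate failures).
    """
    if not 0.0 <= server_availability <= 1.0:
        raise ValueError("server_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at the same time.
    all_down = (1.0 - server_availability) ** replicas
    return 1.0 - all_down


if __name__ == "__main__":
    # Hypothetical figure: each single-corded server is up 99% of the time.
    for n in (1, 2, 3):
        print(f"{n} replica(s): {service_availability(0.99, n):.6%}")
```

Under these assumptions, even a fairly unreliable server becomes part of a very reliable service once the software spreads the work across a few copies, which is the trade the provider is making.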
The data center managers told me that, without a single incident last year, calculating their downtime was easy. Having built the data center for five nines of availability, this provider is very comfortable committing to a money-backed 99.9% SLA for its cloud customers, which means the “push my headache to someone else” IT leader isn’t completely off the hook when hosting in the cloud. If your business users require a higher level of availability, then a public cloud based infrastructure may not be appropriate, but you also have to ask whether your own data center and associated processes can consistently deliver 99.9% availability.
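It helps to translate those percentages into time. The sketch below converts an availability percentage into the downtime it permits per year; the function name is my own, and it assumes a 365-day year with downtime measured over the full calendar, which a real SLA may define differently.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for label, pct in [("99.9% (SLA)", 99.9), ("99.999% (five nines)", 99.999)]:
        minutes = allowed_downtime_minutes(pct)
        print(f"{label}: {minutes:.1f} minutes per year ({minutes / 60:.2f} hours)")
```

A 99.9% commitment leaves room for roughly 8.8 hours of downtime a year, while five nines allows only about 5.3 minutes. That gap is the margin between what the facility is built for and what the provider is willing to back with money.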
Control is an illusion – by hosting in the public cloud IT leadership loses direct control of its infrastructure – but gains levels of reliability unavailable to all but the privileged few data center managers.