Before jumping into the case study, I wanted to highlight an announcement that was overshadowed by the press surrounding the story: Verizon Compute Cloud and Verizon Storage Cloud. Verizon made a significant announcement regarding its new public cloud solution, which veers away from its original "enterprise cloud" messaging and toward a commodity-based approach. With this approach, Verizon looks to compete more directly with the likes of Amazon Web Services (AWS) by providing the same low cost for baseline products but with higher levels of performance. Rackspace recently announced its Rackspace Cloud Servers product with this same goal, although this was likely motivated by CloudSpectator's report published earlier in 2013. Rackspace used this opportunity to step up to the plate. Performance is a growing and complex issue that makes "Let's just move it to the cloud" more than just an oversimplification. With that said, here's the overview from what I've seen thus far:

On March 23, 2010, the Patient Protection and Affordable Care Act (PPACA, commonly known as Obamacare) was signed into law, which sparked the creation of a government-run website to provide healthcare information to the public from the Department of Health and Human Services (HHS) and the Centers for Medicare and Medicaid Services (CMS). Early on, the site was used to provide Medicare and Medicaid information to the public. No compliance-subject information was included within this initial roll-out. During its latest roll-out, the website's purpose evolved to also include Obamacare qualification information, the ability to sign up, and health provider information and options. Sign-up and account creation would require personally identifying information (PII). In April 2011, CMS ultimately selected Terremark Enterprise Cloud (along with its managed services) to host this website; Terremark was acquired by Verizon soon after. In addition to Verizon, CMS hired a number of contractors to create and test this website, including: CGI Federal Inc. (FFE IT and HealthCare.gov); Quality Software Services Inc. (QSSI) (data hub); Booz Allen Hamilton (enrollment and eligibility planning and state grant technical assistance); National Government Services Inc. (NGS) (consumer call center and Small Business Health Options Program (SHOP) premium aggregations); The Mitre Corporation (project management and information technology security); Logistics Management Institute (health plan management, rate analysis, and benefit package review); DEDE Inc. DBA Genova Technology (IT); Terremark Federal Group (cloud computing services); IDL Solutions (enterprise data and design support); and Navigant Consulting Inc. (outreach and collection). On October 1, 2013, the site went live after many assurances of a successful launch. It was not to be. Issues included:

Process flaws. Process flaws are at the forefront of the issue. Both McKinsey & Co. and the General Services Administration (GSA)'s Martha Johnson cover this issue well. McKinsey was brought in by CMS to complete an assessment of the project, which was released in March and then leaked to the press on November 18, 2013. This assessment highlights a number of process issues that would likely hinder the smooth roll-out of the website. Notable items from this list include: no single empowered decision-making authority, lack of a shared definition of success, limited end-to-end testing, a Data Hub with a single point of failure, and design changes with less than 180 days to roll-out. Despite these warnings, few of these recommendations sparked change before the roll-out. The GSA started creating government sites aimed at connecting the public with critical information back in 2009. Martha Johnson spoke out and published a blog post regarding the true process flaws. She states quite accurately: "What we are missing is strategy. IT needs to be a derivative of strategy, not of whiplash politics. Good, high performing IT requires investing, designing, building, and nurturing. It needs to fit into larger goals and be positioned to create synergies (i.e. additional energy and capability) from which the next round of effort can benefit. If [HealthCare.gov] were derived from strategy, we would be in a very different place. The project would have balanced better the problem of schedule vs. quality, for example. It would have included stakeholders more naturally and they would have been invested in, rather than surprised by, the roll-out. It would have emerged from a longer history of innovation and transparency, affording it more creative approaches for meshing databases."

Code quality. A number of sites have brought up issues with code quality. A small CGI competitor highlighted some concerns, as did Reddit users, a series of independent analyses, and an overview of documented application performance issues. A number of others have commented on the source code, which has since been unpublished from GitHub. The New York Times also asserts that the site has over 500 million lines of code, likely including a significant amount of legacy code.

Design flaws. At the front of the website is a required login. In fact, the public can't get any additional information on providers, or on whether they qualify, without creating an account. Other websites with this format grew in popularity gradually, which allowed them to bring on users far more slowly than the nationwide Oct. 1 sign-up. The required login created a funnel that immediately affected the quality of the user experience. This was the major design flaw, but others have also been covered.

Flawed testing. Prior to Oct. 1, the site was tested against 50,000 to 60,000 simultaneous users. However, stress tests from days before the launch indicated slowed performance with only 1,100 users. The original estimate of 50,000 to 60,000 concurrent users also proved far too low: over 8.1 million people visited the website between October 1 and 4, with roughly 250,000 simultaneous users.
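To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. The numbers are the ones quoted above; the variable and function names are my own and purely illustrative, and the comparison assumes the reported counts are directly comparable, which they may not be.

```python
# Figures quoted in the paragraph above; all names are illustrative only.
TESTED_LOW = 50_000       # lower bound of simultaneous users the site was tested against
TESTED_HIGH = 60_000      # upper bound of simultaneous users the site was tested against
SLOWDOWN_AT = 1_100       # users at which pre-launch stress tests reportedly slowed
OBSERVED_PEAK = 250_000   # rough simultaneous users reported for October 1 to 4


def load_ratio(observed: int, planned: int) -> float:
    """Return how many times larger the observed load was than the planned load."""
    return observed / planned


ratio_vs_low = load_ratio(OBSERVED_PEAK, TESTED_LOW)
ratio_vs_high = load_ratio(OBSERVED_PEAK, TESTED_HIGH)
ratio_vs_slowdown = load_ratio(OBSERVED_PEAK, SLOWDOWN_AT)

print(f"Peak vs. tested range: {ratio_vs_high:.1f}x to {ratio_vs_low:.1f}x")
print(f"Peak vs. level where slowdown appeared: about {ratio_vs_slowdown:.0f}x")
```

In other words, the reported peak was several times the planning estimate, and a couple of hundred times the load at which the stress tests had already shown slowing.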

Two data center outages. On October 29, 2013 and October 31, 2013, Verizon experienced outages that affected the site, along with the state healthcare websites that connect into the same data hub. Ironically, this was published right before these outages.

No disaster recovery. The site runs in a single data center, and CMS elected not to purchase DR resources (which could quickly escalate prices further).

Pirated code. The Weekly Standard covered the story on the site's use of pirated code. Apparently some of the code used for the website (DataTables) did not credit its creator (SpryMedia), which is the sole requirement for its use.

Security concerns. MITRE Corp. completed security tests prior to launch. During these tests, the high-risk items noted were corrected, and the site received a full go-ahead from MITRE before launch. Even so, there were several significant security concerns: 1) Compliance-subject data. During the initial launch of the website, the site contained no compliance-subject information. However, in the Oct. 1 release, account creation required personally identifying information (PII). It did not include data subject to HIPAA compliance (patient information), since healthcare providers cannot deny coverage based on preexisting conditions. As such, the site does not ask for or include any of this information. 2) Site errors. Reports say that if an "@" was included at the end of a login, account information for other users was revealed. This error has since been corrected. 3) Exposure to hackers. David Kennedy, a self-proclaimed hacker, says that the website is easy pickings for hackers. 4) Wi-Fi breaches. Mark Lanterman from Computer Forensic Services noted that the site is subject to simple Wi-Fi breaches that can expose user names and passwords.

On November 19, 2013, a congressional hearing was held to officially cover the site's security concerns, but a large portion of the session also covered the performance challenges and miscommunications within CMS that may have led up to Oct. 1. The session put significant focus on the McKinsey report and on the limited amount of testing. The hearing primarily questioned Henry Chao, the deputy CIO of CMS, but it also included expert interviews.


Other top stories to follow within this breakdown: 

Enrollment numbers pick up steam in December. On November 13, the BBC reported that 27,000 Americans had enrolled. By November 30, 137,000 had enrolled, then 365,000 by December 11, and today the estimate is over 1 million enrolled.

HP taking over contract. HP outbid other providers earlier this summer to replace Verizon. The site will not be using HP Cloud; it will be using HP Managed Services. It has also been noted that HP won a contract in September to run a disaster recovery site for the website, which up to this point has been a key missing component.

Retired Microsoft executive hired to take over site. Kurt DelBene was hired to run the website. Kathleen Sebelius said that she and President Obama felt strongly about having a single person responsible for the site. Sebelius stated that DelBene will serve in this role at least until June.

Google, Red Hat, and Oracle developers engaged to solve issues. Google has been brought in for strategic consulting, especially around networking architecture, whereas Red Hat and Oracle have been tapped to optimize scalability and reliability.

Verizon throws more servers at the problem. After experiencing the outages, Verizon added servers to the environment to expand capacity (for $9.4 million).