Think A Data Lake Is THE Answer? Think Again. Here Comes Elastic Analytics
Enterprise architects, are you mired in a tangled web of data marts while your business pursues customer engagement without you? If you think a Hadoop-centric architecture is going to save the day, you may need to rethink. Your customers expect win-win engagement in real time, and delivering it requires systems of insight. I’m seeing a new class of digital predators leverage the cloud to do just this. For example, Netflix designs cover graphics for its series based on subscriber viewing habits. They know their customers that well.
I call their technology approach an Elastic Analytics Platform in my recently published report. I formally define it as:
“A combination of data storage and middleware technology that allows the creation and dissolution of analytics components on demand, while provisioning these with data from one, or a few, distributed, virtualized data sources.”
That’s a mouthful. So here’s the rough idea:
Firms like Netflix, Stitch Fix (who? read the linked KDnuggets blog post), and LinkedIn are sourcing all their data, and I mean everything, into a few data stores in the cloud. Next, they are exploiting the cloud to create analytic workloads on demand. This gives them elasticity in two ways: first, they get scale-out storage; second, they get on-demand analytics components. For example, Netflix can spin up Hadoop, Spark, or Kafka clusters as it needs them and provision these from Kafka or S3. It also runs Teradata on Amazon. This gives it enormous flexibility to create as much of what it needs, when it needs it.
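To make the "creation and dissolution of analytics components on demand" part of the definition concrete, here is a toy Python sketch: one shared store holds the data, and analytics components exist only for as long as a job needs them. Everything in it is hypothetical (`SharedStore`, `ElasticPlatform`, `component`, and the sample viewing data are illustrative names, not the API of Genie, EMR, or any other tool mentioned in this post), and a real platform would call a cloud provider to launch clusters rather than create Python objects.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical stand-in for the few shared cloud data stores (think S3 or Kafka)."""
    datasets: dict = field(default_factory=dict)

    def read(self, name):
        # Each component is provisioned with its own copy of the data it needs.
        return list(self.datasets[name])


@dataclass(eq=False)  # compare by identity so remove() drops exactly this component
class AnalyticsComponent:
    """Hypothetical ephemeral analytics cluster (think a Spark or Hadoop cluster)."""
    engine: str
    nodes: int
    data: list


class ElasticPlatform:
    """Toy orchestrator: creates components on demand and dissolves them when done."""

    def __init__(self, store):
        self.store = store
        self.active = []  # components currently running

    @contextmanager
    def component(self, engine, nodes, dataset):
        # Create: provision a new component with data from the shared store.
        comp = AnalyticsComponent(engine, nodes, self.store.read(dataset))
        self.active.append(comp)
        try:
            yield comp
        finally:
            # Dissolve: release the component; the data stays in the shared store.
            self.active.remove(comp)


# Usage with made-up viewing data: (user, minutes watched).
store = SharedStore({"viewing_events": [("alice", 30), ("bob", 45), ("alice", 20)]})
platform = ElasticPlatform(store)

with platform.component("spark", nodes=4, dataset="viewing_events") as spark:
    minutes_by_user = {}
    for user, minutes in spark.data:
        minutes_by_user[user] = minutes_by_user.get(user, 0) + minutes
    active_during = len(platform.active)  # one component alive while the job runs

active_after = len(platform.active)  # zero once the job is finished
```

The design point the sketch tries to capture is that data lives in the shared store, never inside a cluster, so tearing a component down loses nothing and the next job can spin up a differently sized one against the same data.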
Where’s the beef? It’s in the middleware they are developing with open source tools like Genie and Kragle. Their architecture features a data pipeline into S3 and Kafka; the middleware then creates EMR, Hadoop, Spark, or vendor-specific analytics workloads on demand. While I was at Strata, I listened to Kurt Brown from Netflix talk about all the things this approach lets them do. It’s mind-blowing. Read this article if you really want something to make you go, “Seriously?”
Don’t think this is something only digital upstarts can do, either. You don’t get off the hook that easily. HP is working with its hardware division to let enterprises do similar things soon. Plenty of innovative young vendors are jumping on this as well. For example, Snowflake is using this architecture to create “virtual data warehouses” on demand in the cloud.
What it means for enterprise architects is this: big data predictive analytics architectures are moving beyond data lakes. The Elastic Analytics Platform will revolutionize data science and predictive analytics. Why? Because it will let you use all your data, streaming or batch, while keeping things both affordable and flexible. The sticky problem right now is the middleware needed to glue the storage and analytics components together. That has a long way to go, but it’s the subject of efforts like Project Myriad. Expect a lot of progress over the next few years.