Apache Spark’s Marriage To Hadoop Will Be Bigger Than Kim And Kanye

Mike Gualtieri, VP, Principal Analyst

Feb 15 2015

Apache Spark is an open source cluster computing platform designed to process big data as efficiently as possible. Sound familiar? That's what Hadoop is designed to do. However, these are distinctly different, but complementary, platforms. Hadoop is designed to process large volumes of data that live in the Hadoop Distributed File System (HDFS). Spark is also designed to process large volumes of data, but much more efficiently than MapReduce, Hadoop's processing engine, in part by caching data in memory. But to say that Spark is just an in-memory data processing platform is a gross oversimplification and a common misconception. It also has a unique development framework that simplifies the development of data processing jobs and makes them more efficient. You'll often hear Hadoop and Spark mentioned in the same breath. That's because, although they are independent platforms in their own right, they have an evolving, symbiotic relationship. Application development and delivery (AD&D) professionals must understand the key differences and synergies between this next-generation cluster-computing power couple to make informed decisions about their big data strategy and investments. Forrester clients can read the full report explaining the differences and synergies here: Apache Spark Is Powerful And Promising.
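To make the in-memory caching point concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API; the DiskDataset class, its keep_in_memory flag, and the demo function are hypothetical names invented for this illustration. It only shows why several passes over the same data get cheaper when the data is kept in memory after the first read instead of being reloaded from storage each time.

```python
import os
import tempfile


class DiskDataset:
    """Hypothetical stand-in for a dataset stored on disk, e.g. a file in HDFS."""

    def __init__(self, path, keep_in_memory=False):
        self.path = path
        self.keep_in_memory = keep_in_memory
        self.disk_reads = 0   # number of times the file was read from disk
        self._cached = None   # in-memory copy, filled on first read if enabled

    def records(self):
        # Serve from memory when a cached copy exists.
        if self._cached is not None:
            return self._cached
        self.disk_reads += 1
        with open(self.path) as f:
            data = [int(line) for line in f]
        if self.keep_in_memory:
            self._cached = data
        return data


def run_three_passes(dataset):
    """Run three independent computations over the same data."""
    total = sum(dataset.records())
    largest = max(dataset.records())
    count = len(dataset.records())
    return total, largest, count


def demo():
    # Write a small sample file to stand in for data at rest.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(str(n) for n in range(1, 101)))
        path = f.name
    try:
        uncached = DiskDataset(path)
        cached = DiskDataset(path, keep_in_memory=True)
        results_uncached = run_three_passes(uncached)
        results_cached = run_three_passes(cached)
        return (results_uncached, results_cached,
                uncached.disk_reads, cached.disk_reads)
    finally:
        os.remove(path)
```

Calling demo() returns the same results for both datasets, but the uncached one reads the file once per pass while the cached one reads it only once. Real Spark jobs are distributed and far more involved; the sketch captures only the reuse idea.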

Spark And Hadoop – A Marriage Of Celebrities

Don't believe the technology charlatans who tell you Spark will replace Hadoop. That's poppycock, for now. Today, Spark and Hadoop are meant to be together. The last thing an enterprise needs is yet another cluster to manage. Hadoop and Spark coexist on the same cluster to provide that killer big data combination of volume and speed for data processing. Like many marriages, some habits will have to be negotiated. Many data processing jobs that were originally written in Hadoop MapReduce will be rewritten for Spark. In the near future, Spark will become the primary API against which data stored in HDFS will be processed.

Will It Last?

Many marriages don’t, especially celebrity ones. There are two possibilities that could cause an irretrievable breakdown between Hadoop and Spark:

The Spark community builds its own Hadoop-less ecosystem. Remember that Spark does not require Hadoop to run. The Spark community, led by Databricks, the commercial company formed by the founders of Spark, could develop and push its own file system and other technologies that make it an independent ecosystem.
The Hadoop community creates its own Spark-like features. There is no reason that the Hadoop open source community and formidable commercial vendors such as Hortonworks, Cloudera, and MapR could not develop technology that competes with the Spark benefits described in this report.

Judging from the lofty amounts of venture capital that have been invested in big data technologies, the profit motive certainly exists for both of these possibilities to happen. Stay tuned. In the meantime, enterprises can’t wait. It will be full steam ahead for both Hadoop and Spark.
