Big data is an ecosystem in which open source approaches have the greatest momentum: the most widespread adoption and the most feverish innovation. Open source platforms are expanding their footprint in advanced analytics.

As the enterprise Hadoop market continues to mature and many companies deploy their clusters for the most demanding analytical challenges, data scientists will begin to migrate toward this new, open source-centric platform. At the same time, enterprise adoption of the open source R language will grow in 2012 and beyond, and we’ll see greater industry convergence between Hadoop and R, especially as analytics tool vendors integrate both technologies tightly into their offerings. We will also see increasing adoption of open source data integration tools, such as those commercialized by Talend and others, and of open source BI tools from Pentaho, Jaspersoft, and others.

This is happening for the following reasons:

  • Open source initiatives are transforming all platforms and tools. Open source infrastructure, platforms, tools, and applications — such as Linux, Apache, Eclipse, Python, Mozilla, and Android — have gained widespread adoption in many sectors of the IT world, due to advantages such as no-cost licensing, extensibility, and vibrant communities.
  • Open source communities are where the fresh action is. Open source communities have fostered innovative new approaches and ecosystems, increasingly getting a jump on the incumbent providers of proprietary, closed source — albeit feature-rich and robust — offerings in advanced analytics, data warehousing, and integration tools.
  • Open source solutions and providers are maturing rapidly. A new generation of IT professionals realizes it can now obtain open source data and analytics products from a growing range of vendors, both startups and incumbents, who offer out-of-the-box integration with customers’ legacy IT and also provide strong support and consulting services. Open source data and analytics products are no longer the risky bet they were just a year or two ago.

Recognizing this trend, and seeing the speed at which incumbent vendors are incorporating open source technologies into their solutions, Forrester regards Hadoop, for example, as the nucleus of the next-generation enterprise data warehouse (EDW) in the cloud, and R as a key codebase in the coming wave of integrated big data development tools. We also expect various open source NoSQL databases and tools to coalesce into rich alternatives to closed source content analytics offerings.

As the footprint of closed source software shrinks in many data/analytics environments, many incumbent vendors will evolve their business models toward open source approaches and ramp up their professional services and systems integration to assist customers moving toward open source, cloud-oriented analytics, much of it focused on Hadoop and R. Furthermore, we’ll see a fair number of open source data/analytics tool, platform, and application vendors join forces through mergers and acquisitions.

Just as importantly, we expect a growing range of next-generation big data development tools to plug into extensible open source platforms geared to boosting the collective productivity of teams of data scientists and subject-matter experts. It’s with this last trend in mind that we laud EMC Greenplum’s recent announcement that it is open-sourcing its new Chorus “social” framework for big data development.

As the platforms and tools open up, so will big data’s development ecosystem. Big data will leverage the most open arena of all: “crowdsourcing” cloud approaches such as Kaggle, which pool the world’s expertise (or at least that of all the smart people in your company and/or value chain) for wide-ranging development, investigation, and exploration of analytics- and data-infused business problems from all conceivable angles.