The business has an insatiable appetite for data and insights.  Even in the age of big data, the number one issue for business stakeholders and analysts is getting access to the data.  Once access is achieved, the next step is “wrangling” the data into a usable data set for analysis.  The term “wrangling” itself creates a nervous twitch, unless you enjoy the rodeo.  But the goal of the business isn’t to be an adrenaline junkie.  The goal is to get insight that helps it smartly navigate increasingly complex business landscapes and customer interactions.  Those who get this have introduced a softer term, “blending.”  It is another term dreamed up by data vendor marketers to avoid the dreaded conversation about data integration and data governance.

The reality is that you can’t market-message your way out of the fundamental problem: big data is creating data swamps even in the best-intentioned efforts. (This is the reality of big data’s first principle of schema-less data.)  Data governance for big data is primarily relegated to cataloging data and its lineage, which serves the data management team but creates a new kind of nightmare for analysts and data scientists: working with a card catalog that will rival the Library of Congress.  Dropping in a self-service business intelligence tool or advanced analytics solution doesn’t solve the problem of familiarizing the analyst with the data.  Analysts will still spend up to 80% of their time just trying to create the data set from which to draw insights.

Companies like Paxata saw this problem and set out to eliminate it, not with a back-end data integration and data management approach, but with a front-office data preparation tool that connects subject matter experts intimately with their data.  The point of data preparation tools is threefold:

  1. Embracing schema on read defined by the business, not IT.  Big data creates big exploration and makes enterprise data models obsolete.  IT can’t anticipate, define, and build data models that keep pace with the infinite queries, analytic iterations, and business changes that affect the creation of data sets for analysis.  Schemas have to be created by the business and analysts and connected to what they want to achieve with the data.  Data preparation tools enable this by using machine learning and artificial intelligence to define schemas across aggregated data sources and by providing a spreadsheet-like environment where data professionals can quickly and easily refine the data for its intended use.
  2. Data stewardship becomes part of doing business.  The idea that data scientists would have to succumb to data governance activities was a big data killer out of the gate.  In fact, when you look at data governance for Hadoop today, the most mature aspect is security, not the quality and consistency of the data.  However, that isn’t to say that quality and consistency don’t matter.  Analysts, and data scientists in particular, work out data bugs as they prepare data sets.  Data preparation tools recognized the data citizenship already occurring and delivered a better platform that further empowers data stewardship actions and aligns with how analysts think about and interact with the data.  This keeps data aligned with the semantics of business language and nomenclature, not data systems.
  3. Transparency and collaboration catapult big data operational systems.  If data sets are built to create real-time fraud detection systems, next best action for customer engagement, or optimization of manufacturing processes on plant floors, data preparation can’t happen in a vacuum.  The old way of migrating from analytics to operations was a lengthy process in which business subject matter experts and analysts sat through long interview sessions with IT business analysts and enterprise architects to define requirements.  Data preparation tools cut this process down by capturing data preparation steps that IT can take and translate into a production environment.  Even as analysts and business stakeholders optimize analytic models and include additional data, IT still has access to these changes and can more easily adapt systems to keep pace.
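
To make the first and third points a little more concrete, here is a minimal Python sketch of the two ideas: inferring a schema at read time from raw records, and capturing preparation steps as a named list that someone else could review and replay.  It is only an illustration under simplifying assumptions, not how Paxata or any other vendor works: real tools use far richer inference than this try-to-cast heuristic, and every name here (`infer_schema`, `Recipe`, the sample records) is hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]  # raw records arrive as untyped strings (hypothetical format)


def infer_type(values: List[str]) -> str:
    """Guess a column type at read time by trying progressively looser casts."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "string"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def infer_schema(rows: List[Row]) -> Dict[str, str]:
    """Schema on read: derive column names and types from the data itself."""
    columns = sorted({key for row in rows for key in row})
    return {col: infer_type([row.get(col, "") for row in rows]) for col in columns}


class Recipe:
    """An ordered, human-readable log of preparation steps that can be replayed."""

    def __init__(self) -> None:
        self.steps: List[tuple] = []  # (description, row-level function)

    def add(self, description: str, fn: Callable[[Row], Row]) -> "Recipe":
        self.steps.append((description, fn))
        return self

    def run(self, rows: List[Row]) -> List[Row]:
        for _, fn in self.steps:
            # Copy each row so the raw input is never modified.
            rows = [fn(dict(row)) for row in rows]
        return rows

    def describe(self) -> List[str]:
        return [description for description, _ in self.steps]


# Hypothetical sample data, invented for this sketch.
raw = [
    {"customer": " Acme ", "amount": "120.50", "orders": "3"},
    {"customer": "globex", "amount": "80", "orders": "1"},
]

schema = infer_schema(raw)

recipe = Recipe().add(
    "trim and title-case customer names",
    lambda r: {**r, "customer": r["customer"].strip().title()},
)
prepared = recipe.run(raw)

print(schema)
print(recipe.describe())
print(prepared)
```

The step descriptions are the part that matters for collaboration: an analyst builds the recipe interactively, and IT can read `recipe.describe()` to see what was done before translating it into a production pipeline.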

It may be sexier to think of data preparation tools as big data analytic solutions.  Yet that would miss the full value and relevance these tools have in the bigger picture: gaining control of, and competency with, big data for more than data science activities and limited operational implementations.  Data preparation tools are the catalyst for bringing trust, speed, and actionable insight to all data where traditional data governance and management tools have hit the wall.

Check out Forrester’s report on data preparation tools and find out how three data professional roles will be transformed by data preparation tools.