One reason why only a portion of enterprise and external data (about a third of structured data and a quarter of unstructured data) is available for insights is the restrictive architecture of SQL databases. In SQL databases, data and metadata (data models, aka schemas) are tightly bound and inseparable (aka early binding, or schema on write). Changing the model requires, at best, rebuilding an index or an aggregate; at worst, reloading entire columns and tables. Therefore many analysts start their work from data sets based on these tightly bound models, into which DBAs and data architects have already built business requirements that may be outdated or incomplete. Thus the data delivered to end users already contains inherent biases, which are opaque to the user and can strongly influence their analysis. As part of the natural evolution of Business Intelligence (BI) platforms, data exploration now addresses this challenge. How? BI pros can now take advantage of ALL the raw data available in their enterprises by:
- Supplementing structured querying and analysis with on-demand data exploration. Data exploration supports on-demand data models, or loose coupling of data and metadata (aka late binding, or schema on read). An analyst can build the model from a raw data set and analyze it at the same time, with no need to wait for a DBA to create that model.
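To make the late-binding idea concrete, here is a minimal schema-on-read sketch in plain Python (stdlib only; the column names and values are invented for illustration). The raw data is loaded with no predefined model, and the analyst decides only at query time which columns are numeric measures:

```python
import csv
import io

# Raw data arrives with no predefined schema (schema on read):
# the model is built at query time, not load time.
raw = io.StringIO(
    "region,product,units,revenue\n"
    "East,Widget,10,250.0\n"
    "West,Widget,4,100.0\n"
    "East,Gadget,7,420.0\n"
)

rows = list(csv.DictReader(raw))

# Late binding: only now do we decide that `units` and `revenue` are
# numeric measures and `region`/`product` are attributes.
for row in rows:
    row["units"] = int(row["units"])
    row["revenue"] = float(row["revenue"])

# Analyze immediately against the on-demand model.
east_revenue = sum(r["revenue"] for r in rows if r["region"] == "East")
print(east_revenue)  # 670.0
```

In a schema-on-write database, the type decisions in the middle step would have been fixed by a DBA before the data was ever loaded.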
- Taking advantage of in-memory data exploration. BI tools based on an in-memory data architecture, such as Microsoft PowerPivot/Power View, Qlik (QlikView and Qlik Sense), Tableau Software, and TIBCO Spotfire, are basically spreadsheets on steroids. As long as you have a data set with rows and columns, you can just load it into memory and start exploring. One exploration use case is simply searching for data (not something that SQL databases easily support). When you find what you are looking for, you tag data columns as measures or attributes and create virtual on-demand cubes or pivot tables. Then you can go into analysis mode: slicing and dicing. This is one of the main reasons behind Tableau's acquisition of HyPer. The other vendors in the space, specifically Qlik and TIBCO Spotfire, had an advantage over Tableau, having invented and perfected in-memory architectures years earlier. Yes, a couple of years ago Tableau did release an in-memory data blending feature, but it was mostly based on flat in-memory tables and was not as efficient as the competitors' offerings. Now that gap is indeed closed.
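The "tag columns as measures or attributes, then slice and dice" workflow can be sketched in a few lines of Python (a toy stand-in for what the in-memory BI engines above do at scale; the data and the `pivot` helper are invented for illustration):

```python
from collections import defaultdict

# An in-memory data set: rows and columns loaded straight into RAM.
rows = [
    {"region": "East", "product": "Widget", "revenue": 250.0},
    {"region": "West", "product": "Widget", "revenue": 100.0},
    {"region": "East", "product": "Gadget", "revenue": 420.0},
    {"region": "West", "product": "Gadget", "revenue": 180.0},
]

def pivot(rows, attribute, measure):
    """Tag one column as an attribute and one as a measure,
    then aggregate on demand: a virtual pivot table."""
    table = defaultdict(float)
    for r in rows:
        table[r[attribute]] += r[measure]
    return dict(table)

# Slice and dice: the same rows, grouped two different ways,
# with no pre-built cube or aggregate.
print(pivot(rows, "region", "revenue"))   # {'East': 670.0, 'West': 280.0}
print(pivot(rows, "product", "revenue"))  # {'Widget': 350.0, 'Gadget': 600.0}
```

The point is that nothing about the grouping was decided at load time; the same in-memory rows answer both questions.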
- Graduating to big data NoSQL data exploration. But take note! The above approach works fine on relatively small data sets (usually under a terabyte). For data exploration on large data sets that will not fit into memory (multi-terabyte and more), BI pros will need to look elsewhere by graduating to big data stores like HDFS and/or NoSQL databases. The good news is that they also have plenty of choices here. There are two popular options. One is to explore data staged in HDFS directly, using native Hadoop BI platforms like Datameer, Platfora, Treasure Data, and Zoomdata. The other, which works best when you need to explore data staged across your entire enterprise (not just in HDFS but also in all NoSQL and SQL databases), is to index the data. Leading vendors in this segment, such as Attivio, IBM Watson Explorer, and Oracle's Big Data Discovery, make data exploration even easier by implementing a search-like UI that allows users to quickly discover high-level patterns and then drill down into them using more traditional SQL-like queries.