By Boris Evelson

I just came back from an exciting week in Orlando, FL, shuttling between the SAP SAPPHIRE and IBM Cognos Forum conferences. Thank you, my friends at SAP and IBM, for putting the two conferences right next to each other (time- and location-wise), and for saving me an extra trip!

Both conferences showed new and exciting products, and both vendors are making great progress towards my vision of “next generation BI”: automated, pervasive, unified and limitless. I track about 20 different trends under these four categories, but one in particular is catching my attention these days. It stayed largely under the radar at both conferences, and I was struggling with how to verbalize it until my good friend and peer, Mark Albala, put it in excellent terms for me in an email earlier today: it’s all about “pre-discovery” vs. “post-discovery” of data.

We can debate endlessly the pros and cons of traditional row-oriented RDBMS vs. newer DBMS architectures specifically designed for BI and OLAP, such as columnar (Sybase IQ, Vertica, Paraccel), inverted index (Microsoft FAST Search, Endeca, Attivio), tokenized (illuminate), and in-memory analytical DBMS (TIBCO Spotfire, QlikTech). I had lots of fun doing exactly that on the DM Radio show last Thursday! One thing, however, remains undisputed: traditional RDBMS and OLAP architectures require pre-discovery of data, aka data integration and data modeling. No matter how much flexibility and richness we think we built into our relational or multidimensional data models, they are still only as good as our initial design. If we did not anticipate the types of questions that would be asked of our application in the future, no fixed relational or multidimensional data model will be able to help us.
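To make the pre-discovery limitation concrete, here is a minimal Python sketch of a pre-aggregated model. It is an illustration only, not how any of the products named above work: the `SALES` rows, the field names, and the `build_cube` and `query_cube` helpers are all hypothetical. The point it shows is that a dimension left out at design time (here, `channel`) simply cannot be queried later.

```python
from collections import defaultdict

# Hypothetical source rows; the field names are illustrative assumptions.
SALES = [
    {"region": "East", "product": "Loans", "channel": "Web", "amount": 100},
    {"region": "East", "product": "Cards", "channel": "Branch", "amount": 50},
    {"region": "West", "product": "Loans", "channel": "Web", "amount": 70},
]


def build_cube(rows, dimensions):
    """Pre-aggregate amounts by the dimensions chosen at design time."""
    cube = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row["amount"]
    return dict(cube)


def query_cube(cube, dimensions, **filters):
    """Sum the cells matching the filters; fail if a filter was never modeled."""
    unknown = set(filters) - set(dimensions)
    if unknown:
        raise KeyError(f"not in the model: {sorted(unknown)}")
    total = 0
    for key, amount in cube.items():
        cell = dict(zip(dimensions, key))
        if all(cell[d] == v for d, v in filters.items()):
            total += amount
    return total


# The "initial design": only region and product were anticipated.
DIMENSIONS = ("region", "product")
CUBE = build_cube(SALES, DIMENSIONS)

print(query_cube(CUBE, DIMENSIONS, region="East"))  # 150

try:
    query_cube(CUBE, DIMENSIONS, channel="Web")
except KeyError as err:
    # The channel detail was aggregated away when the cube was built.
    print("cannot answer:", err)
```

Answering the `channel` question means going back to the source rows and rebuilding the model, which is exactly the redesign cycle described above.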

But the world moves way too fast on us. For example, the methodology behind economic capital calculation in the financial services industry, according to Basel II requirements, may change weekly, sometimes even daily, due to regulatory and competitive pressures. No traditional data model or BI tool can keep up with requirements that change this quickly. It is no surprise, then, that one of our recent surveys found that more than half of the respondents did not have most of the information they were looking for in their BI applications, and close to two thirds relied on IT for new BI requests.

What’s the answer? There are many, but one partial answer is post-discovery, rather than pre-discovery, of data. For example, an inverted index DBMS from Attivio, a tokenized data store from illuminate, or in-memory models from TIBCO Spotfire and QlikTech just need you to index or load data "as is", without really requiring any modeling up front. And because all of these technologies can indeed cross-reference every attribute with every other attribute (it's an index!), a virtual data model is created on the fly simply by virtue of asking a question. Gone are the days of having to analyze your requirements, document your requests, and work with IT to make it happen, a process that often takes weeks or months.
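The idea of cross-referencing every attribute can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not the actual design of Attivio, illuminate, Spotfire or QlikTech: `RECORDS`, `build_index` and `ask` are hypothetical names, and the records are deliberately loaded "as is", with no shared schema.

```python
from collections import defaultdict

# Hypothetical records loaded "as is"; note that they do not share a schema.
RECORDS = [
    {"customer": "Acme", "region": "East", "product": "Loans"},
    {"customer": "Acme", "channel": "Web", "product": "Cards"},
    {"customer": "Zenith", "region": "East", "channel": "Web"},
]


def build_index(records):
    """Map every (attribute, value) pair to the ids of the records containing it."""
    index = defaultdict(set)
    for rec_id, record in enumerate(records):
        for attribute, value in record.items():
            index[(attribute, value)].add(rec_id)
    return index


def ask(index, records, **criteria):
    """Answer an ad hoc question by intersecting the matching id sets."""
    postings = [index.get(pair, set()) for pair in criteria.items()]
    if not postings:
        return list(records)
    ids = set.intersection(*postings)
    return [records[i] for i in sorted(ids)]


INDEX = build_index(RECORDS)

# Neither combination of attributes was planned for when the data was loaded.
print(ask(INDEX, RECORDS, customer="Acme", product="Cards"))
print(ask(INDEX, RECORDS, region="East", channel="Web"))
```

Because every attribute is indexed the same way, any combination of attributes can be queried without a prior modeling step; the "model" is effectively whatever the question implies. The sketch also hints at the cost: the index here is built in one pass over all of the data.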

Sounds good? It does, but this is obviously not a BI panacea. Yet. I do not think we will see a mass conversion to these analytical engines that allow for post-discovery of information and data models in the short term. Why?

  • Many challenges still remain with these technologies, such as limited support for operational BI (you typically need to reload the entire model or rebuild the entire index to make updates), difficult administration (partitioning and modular backup and restore are not easy tasks with index DBMS today), and the lack of other mission-critical production features of large enterprise DBMS.
  • All these new technologies still rely on someone else doing all of the upfront legwork to integrate, reconcile and cleanse the data. There is no magical workaround for the hard work of planning, designing and implementing data quality and master data management processes and applications.

How’s all this related to SAP and IBM, you ask? Simple.

  • SAP is heavily promoting its guided search engine, Explorer, previously known as Polestar. It uses an index similar to those of Endeca and Microsoft FAST Search, allowing BI users to discover answers to previously unplanned questions. Unlike Attivio, though, Explorer still requires an underlying data model, either in the form of a Business Objects Universe or SAP BW. But because it uses the SAP TRex index engine under the covers, the right technology is there, and it’s a huge step in the right direction for SAP.
  • IBM is also leading the market here. While its Cognos GoSearch product allows for some guided exploration too, it’s the acquisition of Exeros a couple of weeks ago that gives IBM unique capabilities (with some competition from Composite Software and Sypherlink) to post-discover data and data relationships, and to continuously update data models and data content based on the newly discovered information. I can't wait till IBM comes up with a fully automated way to update the data model, Cognos Framework Manager, and Cognos reports and dashboards based on a source system change!

This is yet more proof (and I’ll never get tired of saying this) that the BI market is as vibrant, exciting and far from commoditization as ever!