When ROLAP Is Not A ROLAP
I get many inquiries on the differences and pros and cons of MOLAP versus ROLAP architectures for analytics and BI. In the old days, the differences between MOLAP, DOLAP, HOLAP, and ROLAP were pretty clear. Today, given the modern scalability requirements, DOLAP has all but disappeared, and the lines between MOLAP, ROLAP, and HOLAP are getting murkier and murkier. Here are some of the reasons:
- Some RDBMSes (Oracle, DB2, Microsoft) offer built-in OLAP engines, often eliminating a need to have a separate OLAP engine in BI tools.
- Some of the DW-optimized DBMSes like Teradata, SybaseIQ, and Netezza partially eliminate the need for an OLAP engine with aggregate indexes, columnar architecture, or brute force table scans.
- MOLAP engines like Microsoft SSAS and Oracle Essbase can do drill-throughs to detailed transactions.
- Semantic layers like SAP BusinessObjects Universe have some OLAP-like functionality.
For multiple reasons, outlined in OLAP and BI self-service documents including the main one — drill anywhere — I feel that ROLAP is a superior approach to MOLAP. Many vendors support that notion and are rushing to invent ROLAP architecture, or, alas, in many cases “markitechture”. So, here’s my first attempt to identify a few ROLAP features that can clearly differentiate a true ROLAP engine from a fake. True ROLAP has to:
- Support multipass SQL. This is required to answer analytical questions that cannot be answered in a single pass of SQL.
- Minimize the amount of data movement between RDBMS and client app by:
- Storing (temp tables, derived tables) intermediate query results from multipass SQL and joining them in RDBMS.
- Pushing all calculations to RDBMS.
- Pushing all filtering and grouping to RDBMS.
- Pushing all filtering and grouping on calculations to RDBMS.
- Be aggregate-aware. That means that adding or dropping aggregate tables should have no effect on reports and SQL.
- Optimize SQL based on RDBMS type.
- Support heterogeneous joins to join data from multiple RDBMSes in a single report. Again, to minimize data movement, this should be done in RDBMS. Putting in a separate EII/data federation layer/tool does not count.
- Handle normalized and denormalized data models equally efficiently.
- Automatically resolve typical start/snowflake query errors and conflicts, such as trying to join multiple fact tables.
Before I publish a document on this subject, I’d like to collect everyone’s feedback on:
- Did I get it right?
- Am I missing any other key ROLAP differentiators?
- What current OLAP engines qualify as ROLAP given these criteria? MicroStrategy and OBIEE immediately come to mind. What about Mondrian? Others?