Federation Supplements The Data Warehouse – Not Either/Or, Never Was
Few enterprise data warehousing (EDW) professionals regard the key rival approach, data federation, as a best practice. Usually, the reasons for this disdain are valid: federated environments are not optimized for heavy-duty data matching, merging, transformation, and cleansing, all of which are essential to delivering a "single version of the truth" for business intelligence (BI).
But data federation is refusing to die as an alternative to EDW–and is taking on new importance in organizations’ data management strategies. Data federation is an umbrella term for a wide range of operational BI topologies that provide decentralized, on-demand alternatives to the centralized, batch-oriented architectures characteristic of traditional EDW environments.
Nevertheless, federation and EDW are complementary approaches, each with its own pros and cons. For example, data federation is better suited to near-real-time BI requirements than the batch-oriented EDWs deployed in many organizations. In practice, data federation and EDW (aka data consolidation) are not mutually exclusive. Many real-world data federation deployments are in fact hybrid approaches that involve EDWs to varying degrees. Federation environments can coexist with, extend, virtualize, and enrich EDWs to help users pull a wide range of disparate data into their reports, queries, dashboards, and analytic applications.
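To make the hybrid idea concrete, here is a minimal sketch of a query-time join across two separate stores, with nothing copied between them beforehand. It is a toy stand-in, not a real federation product: two in-memory SQLite databases play the roles of the EDW and an operational system, and the table names (`customer_history`, `open_orders`), the schema alias `ops`, and the sample rows are all hypothetical.

```python
import sqlite3


def build_demo_connection() -> sqlite3.Connection:
    """Create two separate in-memory databases reachable from one connection."""
    # The main database stands in for the batch-loaded EDW (assumption).
    conn = sqlite3.connect(":memory:")
    # A second, independent database stands in for an operational store.
    conn.execute("ATTACH DATABASE ':memory:' AS ops")

    conn.execute(
        "CREATE TABLE main.customer_history "
        "(customer_id INTEGER, lifetime_value REAL)"
    )
    conn.execute(
        "CREATE TABLE ops.open_orders "
        "(customer_id INTEGER, order_total REAL)"
    )

    # Hypothetical sample data, for illustration only.
    conn.executemany(
        "INSERT INTO main.customer_history VALUES (?, ?)",
        [(1, 1200.0), (2, 300.0)],
    )
    conn.executemany(
        "INSERT INTO ops.open_orders VALUES (?, ?)",
        [(1, 50.0), (1, 25.0), (2, 10.0)],
    )
    return conn


def federated_report(conn: sqlite3.Connection) -> list:
    """Join historical (EDW-style) data with current operational data on demand."""
    query = """
        SELECT h.customer_id,
               h.lifetime_value,
               SUM(o.order_total) AS open_total
        FROM main.customer_history AS h
        JOIN ops.open_orders AS o
          ON o.customer_id = h.customer_id
        GROUP BY h.customer_id, h.lifetime_value
        ORDER BY h.customer_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in federated_report(build_demo_connection()):
        print(row)
```

The point of the sketch is only the shape of the interaction: the report reads consolidated history from one store and current operational rows from another at query time, which is the sense in which federation extends rather than replaces the warehouse.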
To determine whether an operational BI scenario requires a federated solution–in lieu of or supplementing an EDW-hubbed topology–Information and Knowledge Management (I&KM) professionals should determine whether their data management environment fits any or all of the following criteria:
- Multiplicity: Are there multiple distributed data stores within any or all of the principal data-persistence tiers, including online transaction processing (OLTP) systems, EDWs, operational data stores (ODSs), and online analytical processing (OLAP) data marts?
- Heterogeneity: Does this distributed data implement a wide range of incompatible formats, schemas, syntaxes, models, glossaries, and vocabularies, including myriad structured, semi-structured, and unstructured formats?
- Autonomy: Is this data under the control, administration, and governance of a wide range of autonomous organizations, business units, and ownership domains?
- Opacity: Are there security, privacy, and other sensitivity restrictions that prevent external visibility into this data and metadata, and/or restrict external domains’ ability to load, replicate, synchronize, and use the data in their EDW, BI, and application environments?
- Inflexibility: Are there constraints of a technical, administrative, or policy nature that prevent the EDW–or other relevant data consolidation points–from expanding in capacity, taking on more near-real-time workloads, and otherwise expanding their functional or deployment role in your data management environment?
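The checklist above can be captured as a simple self-assessment. The following is a minimal sketch, assuming each criterion is answered with a plain yes or no and that meeting any one of them is enough to warrant considering federation (the "any or all" wording above). The class and method names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class FederationAssessment:
    """Yes/no answers to the five criteria in the checklist."""
    multiplicity: bool    # multiple distributed data stores
    heterogeneity: bool   # incompatible formats, schemas, vocabularies
    autonomy: bool        # data governed by autonomous domains
    opacity: bool         # restrictions on visibility or replication
    inflexibility: bool   # the EDW cannot expand its capacity or role

    def criteria_met(self) -> List[str]:
        """Return the names of the criteria answered 'yes', in checklist order."""
        return [f.name for f in fields(self) if getattr(self, f.name)]

    def suggests_federation(self) -> bool:
        """True if at least one criterion applies (assumed threshold)."""
        return len(self.criteria_met()) > 0


if __name__ == "__main__":
    assessment = FederationAssessment(
        multiplicity=True,
        heterogeneity=False,
        autonomy=True,
        opacity=False,
        inflexibility=False,
    )
    print(assessment.criteria_met())
    print(assessment.suggests_federation())
```

A real assessment would weigh the criteria rather than treat them as equal flags; the sketch only records which ones apply so the discussion can start from there.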
As decentralized service-oriented architectures (SOA) gain traction in operational BI environments, enterprise requirements for data federation–with or without an EDW in the loop–will continue to grow. Also, as EDWs begin to manage petabyte-scale data sets, batch transfer of this data will prove ever more costly and cumbersome–and federated query of this "too massive to move" data will become the only viable option.