Not Your Grandfather’s Data Warehouse
As I dug into my initial research, it dawned on me that some technology trends are having an impact on information management/data warehouse (DW) architectures, and EAs should consider these when planning out their firm’s road map. My next thought: this wasn’t completely obvious when I began. The final thought? As the EA role analyst covering emerging technology and trends, this is the kind of material I need to be writing about.
Let me explain:
No. 1: Big Data expands the scope of DWs. A challenge with typical data management approaches is that they are not suited to dealing with data that is poorly structured, sparsely attributed, and high-volume. For example, today’s DW appliances boast abilities to handle up to 100 TB of volume, but the data must be transformed into a highly structured format to be useful. Big Data technology applies the power of massively parallel distributed computing to capture and sift through data gone wild: that is, data at an extreme scale of volume, velocity, and variability. Big Data technology does not deliver insight, however; insights depend on analytics that result from combining the results of things like Hadoop MapReduce jobs with manageable “small data” already in your DW.
Even the notion of a DW is changing when we start to think “Big”: Apache just graduated Hive from being part of Hadoop to its own project (Hive is a DW framework for Big Data). If you have any doubt, read James Kobielus’ “The Forrester Wave™: Enterprise Data Warehousing Platforms, Q1 2011.”
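To make the "Big plus small" point concrete, here is a minimal sketch in plain Python of a MapReduce-style count over messy raw events, joined to a small dimension table of the kind a DW already holds. Everything in it (the event format, `RAW_EVENTS`, `CUSTOMER_SEGMENT`, and the function names) is invented for illustration; it is not how Hadoop or any particular DW product does it.

```python
from collections import Counter
from functools import reduce

# Hypothetical raw clickstream lines: loosely structured, and huge in real life.
RAW_EVENTS = [
    "2011-03-01 cust=17 page=/home",
    "2011-03-01 cust=42 page=/pricing",
    "malformed line",
    "2011-03-02 cust=17 page=/pricing",
]

# Hypothetical "small data" dimension already curated in the DW.
CUSTOMER_SEGMENT = {17: "enterprise", 42: "smb"}


def map_event(line):
    """Map step: emit {customer_id: 1}, or an empty Counter for unusable lines."""
    for token in line.split():
        if token.startswith("cust="):
            value = token[len("cust="):]
            if value.isdigit():
                return Counter({int(value): 1})
    return Counter()


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def visits_by_segment(lines, segments):
    """Aggregate the raw events, then join the result to the DW dimension."""
    per_customer = reduce(reduce_counts, map(map_event, lines), Counter())
    by_segment = Counter()
    for cust_id, visits in per_customer.items():
        by_segment[segments.get(cust_id, "unknown")] += visits
    return dict(by_segment)


if __name__ == "__main__":
    print(visits_by_segment(RAW_EVENTS, CUSTOMER_SEGMENT))
```

The raw counts on their own say little; the insight (visits per customer segment) only appears once they meet the structured data.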
No. 2: Enterprise data virtualization technology can improve your DW architecture. Three things that lead me to this: 1) As firms discover this mature technology, they realize data can be integrated without physical ELT in some cases – light bulb moment; 2) leading firms are evolving data virtualization point solutions into enterprise deployments that deliver broad benefits (see my colleague Noel Yuhanna’s prescient “Information Fabric: Enterprise Data Virtualization”); and 3) vendors are all scrambling to add Big Data integrations into their virtualization tool kits. The upshot of these statements is that future enterprise information architectures are likely to include malleable structures combining virtual and physical stores and connections to Big Data sets.
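A rough way to picture integration without a physical copy: the sketch below uses Python's built-in `sqlite3` module, with two separate in-memory databases standing in for a DW and a second data store, and a view that joins them at query time. The table names, the `crm` alias, and the sample rows are all assumptions made up for this example; commercial data virtualization tools federate across far more heterogeneous sources than this toy setup.

```python
import sqlite3


def build_virtual_view():
    """Create two independent stores and a view that joins them on demand."""
    conn = sqlite3.connect(":memory:")                 # stands in for the DW
    conn.execute("ATTACH DATABASE ':memory:' AS crm")  # stands in for a second store

    conn.execute("CREATE TABLE main.orders (cust_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (cust_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )

    # A TEMP view may reference tables in more than one attached database.
    # No rows are copied; the join runs each time the view is queried.
    conn.execute(
        """
        CREATE TEMP VIEW customer_revenue AS
        SELECT c.name AS name, SUM(o.amount) AS revenue
        FROM crm.customers AS c
        JOIN main.orders AS o ON o.cust_id = c.cust_id
        GROUP BY c.name
        """
    )
    return conn


def revenue_by_customer(conn):
    """Query the virtual view as if it were a single integrated table."""
    return conn.execute(
        "SELECT name, revenue FROM customer_revenue ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    connection = build_virtual_view()
    print(revenue_by_customer(connection))
```

Because the view is evaluated on demand, a new order written to the underlying store shows up in the next query with no reload step, which is the appeal when physical integration is too slow or too costly.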
What does this mean for EAs? I think it means that “your grandfather’s DW” architecture may not work for the future. Specifically:
- You may need to put a DW architecture refresh on your work plan and start collaborating with stakeholders to sketch out a new target state to capitalize on these trends, especially if your DW is not already a brand-spanking-new appliance that incorporates virtual data stores and Big Data.
- There is hope if you can’t figure out how to economically deal with an environment that has multiple DWs or BI tools: enterprise deployments of data virtualization can overcome these challenges. In my second report, I’ll provide examples of how a large telecom and a drug manufacturer dealt with just these problems.
Please let me know if you’d like to hear more about this and some additional research Forrester is doing in this area. My first report, “Big Opportunities In Big Data,” will be out mid-month, and I’ll be talking about it at IT Forum. The second, “Data Virtualization Reaches Critical Mass,” is scheduled for June. Finally, Noel Yuhanna is updating the Information-As-A-Service Forrester Wave that covers data virtualization technology in the context of a broader strategy.