Do you think you are ready to tackle Big Data because you are pushing the limits of your data Volume, Velocity, Variety and Variability? Take a deep breath (and maybe a cold shower) before you plunge full speed ahead into the uncharted territories and murky waters of Big Data. Now that you are calm, cool and collected, ask yourself the following key questions:
- What’s the business use case? What are some of the business pain points, challenges and opportunities you are trying to address with Big Data? Are your business users coming to you with such requests, or are you in the doomed-for-failure realm of technology looking for a problem to solve?
- Are you sure it’s not just BI 101? Once you identify specific business requirements, ask whether Big Data is really the answer you are looking for. In the majority of my Big Data client inquiries, after a few probing questions I typically find out that it's really BI 101: data governance, data integration, data modeling and architecture, org structures, responsibilities, budgets, priorities, etc. Not Big Data.
- Why can’t your current environment handle it? Next comes another sanity check. If you are still thinking you are dealing with Big Data challenges, are you sure you need to do something different, technology-wise? Are you really sure your existing ETL/DW/BI/Advanced Analytics environment can't address the pain points in question? Would just adding another node, another server, more memory (if these are all within your acceptable budget ranges) do the trick?
- Are you looking for a different type of DBMS? Last, but not least: do the answers to some of your business challenges lie in different types of databases (not necessarily Big Data) because relational or multidimensional DBMS models don’t support your business requirements (that is, your entity and attribute relationships are not relational)? Are you really looking to supplement RDBMS and MOLAP DBMS with hierarchical, object, XML, RDF (triple stores), graph, inverted index or associative DBMS?
Still think you need Big Data? OK, let’s keep going. Which of the following two categories of Big Data use cases applies to you? Or is it both in your case?
- Category 1. Cost reduction, containment, avoidance. Are you trying to do what you already do in your existing ETL/DW/BI/Advanced Analytics environment, but just much cheaper (and maybe faster), using OSS technology like Hadoop? (The Hadoop OSS and commercial ecosystem is very complex. We are currently working on a landscape; if you have a POV on what it should look like, drop me a note.)
- Category 2. Solving new problems. Are you trying to do something completely new, that you could not do at all before? Remember, all traditional ETL/DW/BI require a data model. Data models come from requirements. Requirements come from understanding of data and business processes. But in the world of Big Data you don’t know what’s out there until you look at it. We call this data exploration and discovery. It’s a step BEFORE requirements in the new world of Big Data.
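To make the "explore before you model" idea in Category 2 a bit more concrete, here is a minimal sketch of what a first discovery pass might look like. It assumes the raw data arrives as one JSON object per line; the function name `profile_records` and the sample records are hypothetical, made up purely for illustration. It simply reports which fields show up, how often, and with what value types, which is the kind of thing you would want to know before writing any requirements.

```python
import json
from collections import Counter, defaultdict


def profile_records(lines):
    """Report, per field, the share of records containing it and the value types seen.

    Assumes each element of `lines` is a JSON object serialized as a string.
    """
    field_counts = Counter()
    field_types = defaultdict(Counter)
    total = 0
    for line in lines:
        record = json.loads(line)
        total += 1
        for key, value in record.items():
            field_counts[key] += 1
            field_types[key][type(value).__name__] += 1
    return {
        key: {
            "coverage": field_counts[key] / total,
            "types": dict(field_types[key]),
        }
        for key in field_counts
    }


# Hypothetical sample: no schema agreed up front, fields and types vary per record.
sample = [
    '{"user": "a1", "amount": 12.5}',
    '{"user": "b2", "amount": "n/a", "channel": "web"}',
    '{"user": "c3", "channel": "mobile"}',
]

profile = profile_records(sample)
for field, stats in profile.items():
    print(field, stats)
```

A profile like this is not a data model. It is input to the conversation about what the data model and requirements should be.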
Congratulations! Now you are really in the Big Data world. Problem solved? Not so fast. Even if you are convinced that you need to solve new types of business problems with new technology, do you really know how to:
- Manage it?
- Secure it (compliance and risk officers and auditors hate Big Data!)?
- Govern it?
- Cleanse it?
- Persist it?
- Productionalize it?
- Assign roles and responsibilities?
You may find that all of your best DW, BI, MDM practices for SDLC, PMO and Governance aren’t directly applicable to or just don’t work for Big Data. This is where the real challenge of Big Data currently lies. I personally have not seen a good example of best practices around managing and governing Big Data. If you have one, I’d love to see it!