The 10 Commandments of Business Intelligence in Big Data
- by 7wData
Organizations today don’t use previous-generation architectures to store their big data. Why would they use previous-generation BI tools for big data analysis? When looking at BI tools for your organization, there are 10 “Commandments” you should live by.
Moving Big Data is expensive: it is big, after all, so physics is against you if you need to load it up and move it. Avoid extracting data out into data marts and cubes, because “extract” means moving, and it creates big-data-sized problems in maintenance, network performance, and additional CPU, all spent on two copies that are logically the same. Pushing BI down to the lower layers to run at the data is what motivated Big Data in the first place.
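As a rough illustration of the difference between extracting and pushing work down, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a large data store. The `events` table, its columns, and the row count are assumptions made up for this example; the point is only how many rows have to leave the store in each approach.

```python
import sqlite3

# Hypothetical in-memory table standing in for a large data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, amount REAL)")
rows = [("east" if i % 2 else "west", float(i)) for i in range(10_000)]
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# Extract approach: pull every row to the client, then aggregate there.
extracted = conn.execute("SELECT region, amount FROM events").fetchall()
client_totals = {}
for region, amount in extracted:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# Pushdown approach: aggregate where the data lives, move only the result.
pushed = conn.execute(
    "SELECT region, SUM(amount) FROM events GROUP BY region"
).fetchall()
pushdown_totals = dict(pushed)

rows_moved_extract = len(extracted)  # one row per event
rows_moved_pushdown = len(pushed)    # one row per region
```

Both approaches produce the same totals, but the extract version moves every event row while the pushdown version moves one row per region. At big data scale that gap is where the network and CPU cost comes from.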
Security’s not optional. The sadly regular drumbeat of data breaches shows it’s not easy, either. Look for BI tools that can leverage the security model that’s already in place. Big Data can make this easier, with unified security systems like Ranger, Sentry and Knox; even MongoDB has an amazing security architecture now. All these models allow you to plug right in, propagate user information all the way up to the application layer, and enforce a visualization’s authorization and the data lineage associated with it along the way. Security as a service: use it.
One of the fundamental beauties of Big Data is that when done right, it can be extremely cost effective. Putting five petabytes of data into Oracle could break the bank, but you can do just that in a big data system. That said, there are certain price traps you should watch out for before you buy. Some BI applications charge users by the gigabyte, or by the gigabyte indexed. Caveat emptor! With big data, it’s common to see geometric, even exponential, growth in both data and adoption. Our customers have seen deployments grow from tens of billions of entries to hundreds of billions in a matter of months, with a user base up by 50x. That’s another beauty of big data systems: incremental scalability. Make sure you don’t get lowballed into a BI tool that penalizes your upside.
Sharing static charts and graphs? We’ve all done it: publishing PDFs, exporting to PNGs, email attachments, etc. But with big data and BI, static won’t cut it: all you have is pretty pictures. You should be able to let anyone you want interact with your data. Think of visualizations as interactive roadmaps for navigating data; why should only one person take the journey? Publishing interactive visualizations is only the first step. Look ahead to the GitHub model. Rather than “Here’s your final published product,” get “Here is a viz: clone it, fork it, see how I arrived at those insights, and see what other problem domains it applies to.” It lets others learn from your insights.
Too often, I hear people referring to big data as “unstructured.” It’s far more than that. Finance and sensors generate tons of key-value pairs. JSON (probably the trendiest data format of all) can be semi-structured, multi-structured, etc. MongoDB has made a huge bet on keeping data in this format: beyond the performance and scalability benefits, expressiveness gets lost when you convert it into rows and tables. And lots of big data is still created in tables, often with thousands of columns, and you’re going to have to do relational joins over all of it: “Select this from there when that…” Flattening can destroy critical relationships expressed in the original structure. Stay away from BI solutions that tell you “please transform your data into a pretty table because that’s the way we’ve always done it.”
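To make the flattening problem concrete, here is a minimal sketch of one way it can go wrong. The order document and its field names (`items`, `shipments`, and so on) are hypothetical, invented for illustration. The document holds two independent lists, and forcing it into a single table yields one row per combination, which silently inflates any total computed over the flat rows.

```python
import json

# Hypothetical order document; field names are illustrative only.
doc = json.loads("""
{
  "order_id": 1,
  "items": [{"sku": "A", "price": 10.0}, {"sku": "B", "price": 5.0}],
  "shipments": [{"carrier": "X"}, {"carrier": "Y"}, {"carrier": "Z"}]
}
""")

# Naive flattening: one row per (item, shipment) combination.
flat_rows = [
    {
        "order_id": doc["order_id"],
        "sku": item["sku"],
        "price": item["price"],
        "carrier": ship["carrier"],
    }
    for item in doc["items"]
    for ship in doc["shipments"]
]

# Total from the original structure: each item is counted once.
nested_total = sum(item["price"] for item in doc["items"])

# Total from the flat table: each item is repeated once per shipment.
flat_total = sum(row["price"] for row in flat_rows)
```

In the nested document the order total is 15.0. Over the six flattened rows the same sum comes out as 45.0, because the table can no longer say that items and shipments were separate lists under one order.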
In 2016 we expect things to be fast. One classic approach is OLAP cubes, essentially moving the data into a pre-computed cache, to get good performance.
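The pre-computation idea behind a cube can be sketched in a few lines. This is a toy illustration, not how any particular OLAP product works: the `facts` rows, the dimension names, and the `build_cube` helper are all assumptions made up for this example. Every combination of dimensions is aggregated once up front, so later lookups are dictionary reads rather than scans.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows; in practice these would live in the data store.
facts = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 20},
    {"region": "west", "product": "a", "sales": 5},
]
dimensions = ("region", "product")


def build_cube(facts, dimensions, measure):
    """Pre-aggregate the measure for every subset of the dimensions."""
    cube = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in facts:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    return cube


cube = build_cube(facts, dimensions, "sales")

# Lookups hit the pre-computed cache instead of scanning the facts.
east_total = cube[(("region",), ("east",))]
grand_total = cube[((), ())]
```

The lookups are fast, but the cube is a second copy of the data that has to be rebuilt whenever the facts change, and the number of cells grows quickly with the number of dimensions.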