CITO Research managing editor Deb Cameron pointed out today a data governance problem that will appear quickly once the model described in today's Forbes.com article, Box Partners With Roambi To Attack The BI Market, takes hold. One of the major downsides of spreadsheet sprawl is that it creates a severe data governance problem.
She put it this way:
"The question that comes to mind is how they ensure data governance across the spreadsheets. How do they ensure that the numbers in the spreadsheets are comparable? For example, Sales Manager East Coast labels a column total revenue, but he means something different from VP of Sales, who also labels a column as total revenue. An easy example, but you see what I mean. I worry about this in general with visualization tools, but I would hope that Roambi has a way to let someone vet the data to make sure that data is rolling up properly."
In my view, Deb is right that this is a problem, but it is already a problem today. If the Box/Roambi partnership works, it will become a bigger problem and will need a solution. As Bill Gates pointed out: "The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency."
So, in this context, if you have a data governance problem now and you get more people using ungoverned spreadsheets, you will have a bigger problem later.
But you will also put data that is evidently good enough to make many people happy into the hands of many more people. Will it be a net win? It is impossible to say in general, but I suspect that a significant benefit will accrue, even if some contradictions are found. Remember, IT and those responsible for data governance, if they exist, have a much better chance of focusing on the important spreadsheets because they can see through Box and Roambi which ones are being used the most.
This challenge brings to mind QlikView's Expressor product, which takes a bottom-up view of data governance. The idea of Expressor is that you should watch what people are doing and gradually implement changes to arrive at a comprehensive and consistent semantic model.
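To make the bottom-up idea concrete, here is a minimal Python sketch. It is not Expressor's actual method or API. It assumes a hypothetical inventory of column labels and their definitions, gathered from the spreadsheets people already use, and flags labels such as Deb's "total revenue" that carry more than one meaning. The function name, the tuple layout, and the sample definitions are all invented for illustration.

```python
from collections import defaultdict


def find_conflicting_labels(columns):
    """Return labels that are used with more than one distinct definition.

    `columns` is an iterable of (owner, label, definition) tuples.
    This layout is a hypothetical inventory format, not a real product's schema.
    """
    by_label = defaultdict(set)
    for owner, label, definition in columns:
        # Normalize case and whitespace so "Total Revenue" matches "total revenue".
        key = " ".join(label.lower().split())
        by_label[key].add((owner, definition))

    conflicts = {}
    for key, entries in by_label.items():
        distinct_definitions = {definition for _, definition in entries}
        if len(distinct_definitions) > 1:
            conflicts[key] = sorted(entries)
    return conflicts


# Illustrative data only: the definitions are assumptions made up for this sketch.
columns = [
    ("Sales Manager East Coast", "Total Revenue", "bookings"),
    ("VP of Sales", "total revenue", "bookings minus returns"),
    ("VP of Sales", "Units Sold", "count of orders"),
]

conflicts = find_conflicting_labels(columns)
for label, entries in conflicts.items():
    print(label, entries)
```

A report like this would not resolve the disagreement, but it could tell whoever is responsible for governance which labels to vet first.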
Companies like Blackline, Adaptive Insights, or Tidemark create designed and curated semantic models for various purposes in a top-down fashion, and then use those models to power processes for financial close, planning, and other thorny problems. In this approach, the model is designed and improved over time for important domains.
One of the most interesting developments to me is the idea of automated semantic modeling of the sort we see happening in ClearStory Data and DataRPM. These companies seek to use a machine learning-assisted approach to creating models so that modeling is much less of a bottleneck than it is now.
All of these approaches can work well in the right context, but each of them, as Deb points out, may create problems by amplifying one sort of inefficiency or another.
It seems that the best results will come from understanding where each approach will shine. I can imagine a world in which both spreadsheets and automated semantic modeling are used to allow access to data and the rapid creation of dashboards. The lessons of what people find important can then guide the expansion of highly curated centralized models like the ones we find in the data warehouse, or the other sorts of designed and curated models we mentioned earlier.