Problem Statement
Preparing for Big Data
Preparing to take advantage of the business value that big data can provide is going to be a multi-year process that takes place in several stages. Most of the new sources of data arriving under the banner of big data are fundamentally different from the type of data stored in most data warehouses.
A new infrastructure will have to be constructed that stores data differently and allows for different sorts of distillation of the data into a form that allows further analysis. In addition, new tools will be needed to search the data, understand it, and perform analysis on it.
This problem statement examines the nature of that transition from all relevant dimensions. We will look into the way that IT architecture and infrastructure will change, examine the relevant technology for storage and analysis, and look into the way that big data will be analyzed in real time. In addition, we will examine the organizational and change management issues that are likely to appear. (See Forbes.com: "Big Data Requires a Big, New Architecture" and "Kill Your Data Warehouse" for an overview of some of the architectural issues.)
Context and Background
Most companies have a web site or other systems that hold detailed information about consumer behavior. Web server logs, logs from networking and telecommunications equipment, data from sensors of various sorts, and data from e-commerce or other transactional systems can all tell a story about what is happening with customers, partners, or key business processes.
The question is: What good, if any, is such information?
So far, the news about the value of big data is promising. Large e-commerce web sites are able to use big data from web server logs to better understand what consumers are doing. This allows adjustments to be made to navigation and product offerings to take advantage of discoveries about consumer preferences. Telecom companies are using distilled call detail records to respond faster to fraudulent use of the network. In operational environments, big data can be used to get a much more detailed picture of the state of an environment like a factory or a refinery. This more detailed picture provides early warnings of trouble and can help optimize maintenance processes. The analysis of social media data has proven to provide detailed indications of consumer sentiment before it appears in consumer buying behavior and other forms of behavior.
It is clear that there is value in big data, but the path to finding new insights that are relevant to a specific business is less clear. This problem statement should end up creating a roadmap that a CITO can follow to understand the opportunity that big data provides and how to build the infrastructure to support it.
Research Goal
Our goal is to provide guidance for CITOs to accelerate the process of preparing for big data. We want to show how a CITO can locate sources of big data and understand their business value. We want to help CITOs improve their ability to understand which answers to which questions would help improve the design and execution of key business processes. We want to help CITOs evaluate technology to find the tools and systems that are right for them. We want to develop a maturity model and checklists for making use of big data.
Questions
How is big data different?
Most of the time, big data means not only large volumes of data but also data of a different structure. The most common form of big data comes from the logs of servers, network equipment, telecom equipment, and sensors of various types. While some of this data is structured like flat files, with rows that share consistent columns, much of it has a variable structure that must be parsed in some way. Even when the structure is consistent and delimited, not all the values may be useful.
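To make the idea of variable structure concrete, here is a minimal sketch in Python. The log lines, field names, and the key=value layout are all assumptions made up for illustration; real logs will differ. It shows two points from the paragraph above: records from the same source can carry different fields, and only some of the parsed values are worth keeping.

```python
import re

# Hypothetical log lines: same source, but the fields vary from line to line.
SAMPLE_LINES = [
    '2012-03-01T10:15:02 host=web01 status=200 path=/cart bytes=5120',
    '2012-03-01T10:15:03 host=web02 status=500 path=/checkout error="db timeout"',
    '2012-03-01T10:15:04 host=web01 heartbeat',
]

# Matches key=value pairs; a value may be quoted so that it can contain spaces.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line):
    """Return a dict holding the timestamp plus whatever key=value pairs exist."""
    timestamp, _, rest = line.partition(' ')
    record = {'timestamp': timestamp}
    for key, value in PAIR.findall(rest):
        record[key] = value.strip('"')
    return record


def keep_fields(record, wanted):
    """Drop every field that is not in the set of wanted field names."""
    return {key: value for key, value in record.items() if key in wanted}


records = [parse_line(line) for line in SAMPLE_LINES]
useful = [keep_fields(record, {'status', 'path'}) for record in records]
```

Each record ends up with a different set of keys, and the third one has no `status` or `path` at all, so downstream code has to tolerate missing fields rather than assume fixed columns.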
What is the potential business value of big data?
Big data can be used in a multitude of ways. High Resolution Management is the idea of shining light on the detailed workings of processes so that the way those processes are designed and managed can be changed to take advantage of more information. Big data can help identify events that can be early warnings of problems or indicators of opportunities to optimize a process. The improved model that big data can help create can be the foundation for better predictions. When big data is examined in real time it can form the basis for Operational Intelligence systems.
How can CITOs evaluate the potential of big data to improve execution of the most important business processes at their organizations?
Big data presents a challenge to CITOs that can be met by reversing the paradigm of User Driven Innovation, which recommends giving end-users, who know the requirements, the tools to build their own solutions. This approach avoids the transfer of "sticky knowledge" about requirements and usually sparks the creation of innovative solutions. With big data, this model must be adapted. End-users will find it difficult to use the tools to explore big data and determine its relevance. The data is too difficult for end-users to understand in its raw form, and tools for exploring big data, like Splunk or ThingWorx SQUEAL™, are too difficult for them to use. The only solution to this problem is for the CITO to lead the effort to understand the business processes. The CITO must find out what information would be valuable. Which answers to which questions could lead to better decisions or to the potential to redesign processes? Which information would increase power in negotiations or make better offers to customers possible? With this information in hand, the CITO has an agenda for evaluating the potential business value of any new data source. Once a potentially valuable piece of information is found, an offer to use that information can be made to the business.
How can big data be stored and managed?
Some big data will be best stored in files at first. But once the data is understood, how can the valuable fields be extracted for use? Should there be several stages for big data, with the data first stored in a file system and then moved to a graphical database or a database with a flexible structure like MarkLogic? Eventually, it is likely that the distilled and extracted big data will find its way into traditional data warehouses. In this sense, the processing of big data is another form of ETL.
In what different ways could a Data Lake be implemented?
For example, one method could consist of a simple file system. Graphical or XML databases could also be used. The data lake would likely need the ability to store many different types of data and to store the queries that were used to understand the structure of the data. For example, in Splunk, it is possible to store a query that is used to explore and identify fields in a dataset. In this way, each new user can build on the understanding developed so far. Once a data set is understood, the relevant fields can be extracted as new data arrives and stored in a database that has more structure.
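The idea of saving what has been learned about a data set can be sketched roughly as follows. This is not Splunk's API or query language; the `SavedExtractions` class, the field names, and the regular expressions are hypothetical stand-ins that only illustrate how stored extraction rules let each user build on the work of the last and how structured rows can then be pulled from newly arriving raw data.

```python
import re


class SavedExtractions:
    """Keeps named field-extraction patterns alongside a data set.

    Hypothetical design for illustration only, not a real product API.
    """

    def __init__(self):
        self._patterns = {}

    def save(self, field, pattern):
        """Record how a field is found; the pattern needs one capture group."""
        self._patterns[field] = re.compile(pattern)

    def extract(self, raw_line):
        """Apply every saved pattern and return the fields that were found."""
        row = {}
        for field, pattern in self._patterns.items():
            match = pattern.search(raw_line)
            if match:
                row[field] = match.group(1)
        return row


web_logs = SavedExtractions()

# A first analyst works out where the HTTP status lives and saves the rule.
web_logs.save('status', r'status=(\d{3})')

# A later analyst builds on that work by adding the request path.
web_logs.save('path', r'path=(\S+)')

# As new raw data arrives, the saved rules yield structured rows that could
# be loaded into a database with more structure.
new_line = '2012-03-01T10:20:00 host=web03 status=404 path=/missing'
row = web_logs.extract(new_line)
```

Storing the rules next to the raw data, rather than only the extracted output, means the raw files can be reprocessed later when someone discovers another useful field.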
What tools allow big data to be searched and understood? How can big data be distilled to prepare for analysis?
Splunk provides a flexible and simple way to sift through mounds of big data stored in file systems. Once that data is understood, the extracted fields could be stored in relational databases, graphical databases, or XML databases, depending on the way the data will be used. Once you know what you are looking for, high-speed sifting of massive volumes of data can be accomplished by using highly parallel programs created with technology like Pervasive's Data Rush.
How can the transition to big data occur in incremental steps? What would the stages in a big data maturity model look like?
The first stage in any maturity model would be to increase awareness of available big data sources, to understand their value, and to be on the lookout for new sources. The next would be to establish a systematic way of understanding what information would be useful to the business. With this information in hand, a process of evaluating and using big data could be put into place. Several stages of maturity for developing processes, acquiring tools, and making use of the data would likely emerge from this analysis.
What tools allow big data to be analyzed in real time?
Complex event processing systems offer the ability to monitor streams of data to identify events. Systems like Splunk, Hadoop, and Data Rush can be used for the same purpose. A variety of operational intelligence vendors like Vitria offer technology made for monitoring streams of data to find events that can then be used to discover deeper patterns.
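As a rough illustration of what monitoring a stream to identify events means, the sketch below flags a burst of errors inside a sliding time window. It is a toy written in plain Python, not how any of the products named above work; the `detect_bursts` function, the 60-second window, the threshold of three, and the sample stream are all assumptions chosen for the example.

```python
from collections import deque


def detect_bursts(events, window_seconds=60, threshold=3):
    """Yield a timestamp each time `threshold` or more errors fall in the window.

    `events` is an iterable of (timestamp_in_seconds, is_error) pairs that
    arrive in time order, as they would from a live stream.
    """
    recent = deque()  # timestamps of errors still inside the window
    for timestamp, is_error in events:
        if not is_error:
            continue
        recent.append(timestamp)
        # Discard errors that have aged out of the sliding window.
        while recent and timestamp - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            yield timestamp


# Hypothetical stream: three errors within 30 seconds, then two much later.
stream = [(0, True), (10, False), (20, True), (30, True), (200, True), (210, True)]
alerts = list(detect_bursts(stream))
```

Because the function is a generator that holds only the timestamps inside the current window, it can run over an unbounded stream without keeping the full history in memory.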
