This past week I had the opportunity to participate in a panel discussion at the MIT Sloan CIO Symposium in Cambridge, MA. The topic was “Big Data, Analytics and Insights.” The conversation revolved around the concepts that are shaping the terminology and driving business value, as well as the explosion of associated technologies.
One of the first questions focused on the core of the terminology debate: if you were to rename Big Data, what would you call it? My immediate reaction was to rename it Unwieldy Data, meaning that the database sizes exceed a traditional relational data store’s ability to ingest, process, back up and restore in a timely manner. Size and analytical complexity are separate issues: you can run extremely complex algorithms, including advanced predictive analytics and machine learning, on a small subset of data. Thus, when you talk about Big Data, you should separate the actual storage mechanisms from the algorithms and processing used to derive the value. The sheer size of the data has been the catalyst for technologies such as Apache Hadoop, Google’s Bigtable and Cassandra, which originated at Facebook and is now an Apache project. With a growing list of available data storage platforms, the industry is still years away from a dominant market leader.
One particular discussion that I found interesting centered on the evolution of the Data Scientist’s role inside organizations. A modern Data Scientist, or data science team, must understand big data technologies, the complex business requirements that yield value, statistics and probability, and machine learning. They must also be able to write code and possess a working knowledge of new visualization technologies. This is a tall order given the rate of change of technologies in these categories. It is also a very good reason to start building a diversified team as quickly as possible, since it takes time to find the right people who fit in a data science team structure.
Another topic central to any discussion of Big Data is how organizations value their data. How can the data increase competitive advantage? To promote Big Data to the position of a valuable asset, an organization’s culture must first revolve around the data. Treating data as an asset is not a new concept, as businesses have been generating competitive advantage from their data for years. What’s different is the environment: ubiquitous networked devices, the generation of massive data sets, e-commerce driven by real-time analytics, and the resulting complexities of storing, understanding, processing and displaying useful results from massive data. Because of this explosion of data, businesses are being forced to apply far more science to solving these problems than they have in the past few decades.
Out of the discussions at MIT, my overarching insight was that organizations have entered the next realm of Business Intelligence. Organizations have evolved past single database instances with complex cubes to a world driven by new technologies that support the exponentially growing volume of data being generated every day. Organizations are learning that they must treat their data as an asset. The technologies are evolving quickly, with no clear market leader. Therefore, it is imperative to establish a vision and a culture that are centered on the data. Invest in a team that understands storage options, business requirements, statistical processing engines and visualization technologies. Most importantly, stay committed to deriving value from your data for you and your customers.