Sunday, January 30, 2011

What is BigData?

At some point quantity becomes quality: add a little more of something and it can suddenly become something different. For data, we have crossed that threshold again over the last decade. Whether that point is defined in terabytes or petabytes is in itself relatively unimportant. What is important is that people have had to manage this qualitatively different amount of data, and to do so they have created new technologies, new techniques, and new ways of thinking about and analyzing the data. This, in turn, has created new opportunities. This new paradigm is known as BigData.

In practice, when people speak about it, I believe they are referring to two specific things: 1. the new set of data management and processing technologies that involve distributed or parallel computing; 2. an exploratory Business Intelligence activity that uses machine learning and statistics to extract knowledge from the data.