Data have been growing as a result of more ways to capture the external world (sensors of many types), and more data are becoming available as communications get cheaper and more pervasive and storage manages to handle petabytes. Another important aspect is that the whole machinery of our daily life, and of business, is now digital, thus thriving on data and generating more of it as a by-product of ongoing processes.
There are many studies showing that the growth in stored, transmitted, generated, consumed and inferred data is exponential, and will remain so for this decade and a few years of the next one. However, we are reaching a point where saying that more data is available is meaningless. It is no longer about more data but rather about the fabric of data that is being generated. To use an image, we have seen data growing like threads, then we have seen interconnections arising among the threads and forming nets. Now we are starting to lose sight of the nets, as the interconnections get so tight that we are starting to see fabric and objects.
Representation of a social graph
Out of these objects, as represented in the figure, we are capturing new information that relates more to the characteristics of the virtual object than to any specific data set. If you want to expand this vision, think of data as the cells of a living being. Once you have enough cells and interconnections, you can lose sight of the individual cells, and even of the organs they may form, and start to look at the whole as an organism, studying its behavior and forgetting that it is composed of individual cells.
This sort of vision is starting to be applied to cities, communities, power grids and enterprises, and we qualify this vision of the whole, and the actions that can be taken on it, as “Smart”: Smart cities, Smart communities, Smart grids, Smart enterprises...
Looking at the whole of these big data requires new approaches to data management and data mining, and this is what is being discussed these days in San Diego at the ACM Knowledge Discovery and Data Mining Conference.
As Technology Review points out, there is now a widespread interest in mining data, whereas this was once the preserve of scientific fields. Today many commercial enterprises have a worldwide market counting millions of users, and they feel they can provide much better services if they understand the data their users generate. Netflix has offered one million dollars to whoever can provide a smarter analysis of its customers, so that it can give them better suggestions on what to watch next.
Social media tools, like Facebook, are ideally positioned to harvest user behavior data and to draw inferences from it. And so are Telecom Operators.
Making inferences is a very powerful way to gain understanding, but it is also potentially invasive of privacy. This is why Operators are so careful in playing this game. There is a need for regulation and accountability, and for technology that lets people understand what is going on and opt out.
However, my feeling is that if we want to move from cities to smart cities, from communities to smart communities, from homes to smart homes and so on, we need to leverage data. And, of course, the challenge is to do it “smartly”.