Big Data: knowledge in an information overload era


Information overload is a familiar concept that keeps cropping up these days: we are bombarded by far too much information, our poor brains are swamped, and making sense of the world is becoming increasingly difficult. But even though this chorus of discontent considers only the negative side of the situation, one thing is certain: the amount of data is growing at a rate never seen before.

In 2007 alone, more information was generated than in the previous five thousand years. In 2010 the zettabyte threshold – one thousand billion gigabytes, if that gives you a better idea of its sheer size – was surpassed. And the growth continues unabated.

So where does all this data come from? First and foremost, the vast majority of it is produced by us.

Everything that once belonged to the oral domain is now preserved on the web: the enormous flow of conversations, photos, tweets, music, videos, posts and whatever else we manage to upload is stored as bits and is easily – sometimes too easily – downloadable.

Faced with such a deluge of information, a new analytical approach is required. This is where the term Big Data – maybe not the most original of labels – sums it up perfectly: a quantity of data so large and so diverse that it requires different forms of reading to derive any real value. Basically, we’re talking about cloud computing platforms and ever more powerful algorithms: hi-tech means that aren’t constrained by information overload – a psychological limit – and that can process enormous volumes of information that would otherwise be sheer chaos to us.

Big Data was one of the main topics on the agenda at this year’s World Economic Forum. And – as Steve Lohr of The New York Times noted a few days ago – it was discussed as a real and true commodity, like money or gold.

In fact, thanks to Big Data it’s possible to do a lot of great things, going way beyond just simple marketing.

A couple of examples came out of the Forum briefing: researchers can predict the spread of an epidemic or a drought hitting a high-risk area, improving the chances of containing and resolving such events. It will also become much easier to assess the impact of technology, gauge the state of literacy in a country, and intervene in an emergency.


Of course, the key challenge is of a technical nature: obtaining better algorithms and more practical tools to display and interact with data. The vast majority of Big Data is devoid of any structure; it is a sheer accumulation of heterogeneous material, and it requires very sophisticated means to make any sense of it.

It also remains to be seen whether the ever more intensive hunt for private data – the direction Google and Facebook have been heading in for some time now, obviously for business reasons – squares with all the good intentions about improving international development. In fact, the World Economic Forum report is quick to emphasize the need to preserve anonymity and privacy when collecting this type of information.

But there’s another point I’d like to bring to your attention, one of a more epistemological nature: we need to be careful not to overestimate the rhetoric of “big data”.

In 2008, Wired founder Chris Anderson published an article that made the rounds of the intellectual world: The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. In short, Anderson argued that the unrelenting increase of information would supplant the classic approach of hypotheses backed up by rigorous testing: “Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with an unprecedented accuracy. With enough data, the numbers speak for themselves.”

Four years later, the advent of Big Data seems to renew exactly the same idea and promise: one day we will be able to predict the future, because we’ll have everything we need to do it. Was Anderson right, then? It would truly be the death of the scientific method, and a sort of posthumous victory for the radical inductivists: no more conjectures and refutations, just straightforward answers drawn from the accumulation of enormous bodies of evidence.

The philosopher Karl Popper put it this way: the proposition “All crows are black” can never be verified by experience. No matter how many black crows I observe, there could always be a white one waiting to wreck my hypothesis: hence the fallible nature of science.
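Popper’s asymmetry between verification and falsification can be made concrete in a few lines of Python (a toy sketch of my own, not part of Popper’s argument; the hypothesis and the observation lists are of course invented):

```python
def falsified(hypothesis, observations):
    """A hypothesis is refuted by a single counterexample,
    but no number of confirming cases ever verifies it."""
    return any(not hypothesis(obs) for obs in observations)

def all_crows_black(crow):
    return crow == "black"

# A million black crows leave the hypothesis standing, but unproven...
print(falsified(all_crows_black, ["black"] * 1_000_000))                # False
# ...while a single white crow is enough to wreck it.
print(falsified(all_crows_black, ["black"] * 1_000_000 + ["white"]))    # True
```

The point of the sketch is the asymmetry: the first call can only ever fail to refute, never confirm, whereas the second settles the matter with one observation.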

“But what if I had all the existing data about every single crow?”, rebuts the Big Data theorist. A fascinating idea: a quantitative increase in data produces a radical qualitative change – and scientific discovery changes accordingly.

We will no longer need theorists, just analysts instead. The ultimate explanation of the universe no longer lies in the hands of a few scientists, but in the labyrinth of enormous databases: it’s just a matter of finding it.

As long as we limit ourselves to viewing Big Data as a resource that complements our heuristic abilities, there is something to be optimistic about. The overwhelming amount of data now available can mean a huge step forward in speed and accuracy for any kind of knowledge business. And its practical applications – from medicine to R&D through to the fight against crime – are truly exceptional.

But the challenge for those who see Big Data as a metaphysical landing place goes a step further: the creativity and inventiveness of human beings may one day become unnecessary. Computation will triumph over intuition. But would that triumph be sustainable and desirable? And is this a realistic scenario?

There is reason to doubt it: in business as in science, basing one’s initiatives on the mere accumulation of information – however filtered, analyzed and ordered by algorithms and models – still seems insufficient.

Either we give some substance to the idea of Artificial Intelligence (and accept, like Ray Kurzweil, that one fine day we’ll all awaken to a Technological Singularity, a post-human era), or there isn’t much to be done: whatever the set of data, it will always lack that touch of human understanding. And as Alan Turing demonstrated in 1936, there will always be an undecidable problem – a problem for which no algorithm can produce an answer in finite time.
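The core of Turing’s argument – diagonalization – can itself be sketched in Python (an illustrative toy of my own, not a formal proof): suppose some function `halts` claims to decide, for any program, whether it terminates. We can always build a “paradox” program that does the opposite of whatever `halts` predicts about it, so the claimed decider must be wrong somewhere.

```python
def refutes_oracle(halts):
    """Given any claimed halting oracle, build the classic paradox
    program that does the opposite of the oracle's prediction, and
    check that the oracle is necessarily wrong about it."""
    def paradox():
        if halts(paradox):
            while True:   # the oracle said we halt, so loop forever
                pass
        return            # the oracle said we loop, so halt at once

    prediction = halts(paradox)
    # By construction, paradox's real behaviour is the negation of
    # the prediction (we never need to run it to see this).
    actual_behaviour = not prediction
    return prediction != actual_behaviour

# Whatever answer the oracle gives, its own paradox program refutes it.
print(refutes_oracle(lambda program: True))   # True
print(refutes_oracle(lambda program: False))  # True
```

The sketch never executes `paradox`; it only reasons about its construction, which is exactly the shape of Turing’s 1936 argument.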

Chris Anderson ended his article with a question: “What can science learn from Google?” Kevin Kelly, another American technology guru, said that computers give good answers, but are unable to ask the right questions. A simple and incisive criticism.

So let’s flip Anderson’s question into a preventative one: “What can Google learn from science?” The answer: probably a bit of humility.

