Too much data, not enough information

Every second of every day, millions upon millions of bits of data are being created. Just been to the supermarket and used your loyalty card to purchase your horse meat burgers and soft-cheese snack packs? There’s some data right there. Searched for something on the Internet and clicked on a sponsored link? Data. Used your work pass to get through a car park barrier? More data. Pretty much everything we do nowadays that has some interaction with some system somewhere is storing some kind of data. There was a lot of usage of the vague ‘some’ in that last sentence, I realise, but this is just because the possibilities for what can be stored are so massive.

So, there’s an awful lot of data. A shedload, in fact, as long as you have a very big shed – according to IBM (who I’m assuming to be quite clever people and therefore knowledgeable about this kind of thing) we as a species produce some 2.5 quintillion bytes of data every day. That’s 2,500,000,000,000,000,000 bytes, as long as I’ve not made a typo and got the number of zeroes wrong. That seems quite a lot, and it’s quite worrying to imagine just how much of that comprises photos of cats.

Whilst there’s a massive amount of data being generated, however, I’m not convinced about how much information is being produced. The distinction is a subtle yet important one: data is the raw material, and information is the end result of some processing – be this automated or involving manual intervention – that forms something which conveys meaning to somebody. In and of itself, data is pretty useless, yet I can’t help thinking that those of us who work in IT are often guilty of giving users this rather than actual information, and leaving them to come up with their own interpretations. It’s a bit like a glazier providing you with a bag of sand and an oxy-acetylene kit.
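To make the data/information distinction a little more concrete, here is a minimal sketch in Python. The loyalty-card records, their layout and the function names are all hypothetical, invented purely for illustration: a list of raw transaction rows is data, and the per-card spend totals and most popular product derived from it are (a very small amount of) information.

```python
from collections import defaultdict

# Hypothetical raw loyalty-card records: (card_id, product, price_in_pence).
# On their own these rows are just data.
transactions = [
    ("card-001", "burgers", 250),
    ("card-002", "burgers", 250),
    ("card-001", "soft cheese", 180),
    ("card-001", "burgers", 250),
]

def spend_per_card(records):
    """Aggregate raw transaction rows into total spend per loyalty card."""
    totals = defaultdict(int)
    for card_id, _product, price in records:
        totals[card_id] += price
    return dict(totals)

def top_product(records):
    """Return the product that appears most often across all records."""
    counts = defaultdict(int)
    for _card_id, product, _price in records:
        counts[product] += 1
    return max(counts, key=counts.get)

print(spend_per_card(transactions))  # {'card-001': 680, 'card-002': 250}
print(top_product(transactions))     # burgers
```

The processing here is trivial, and that is rather the point: the hard part is not the code but deciding which questions (total spend? most popular product? something else entirely?) actually mean something to the person reading the result.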

The tricky thing, I guess, is that it can be very hard to understand what people actually want or need to know. I used to try and elicit system requirements from users (in part, at least) by trying to get from them an idea of what reports they would like to get out at the end. This seemed to make sense: surely people would have an inkling of the sort of information they need to see. What rather rapidly became apparent, somewhat to my surprise, was that quite often people didn’t know this; rather, they wanted us as a development team to guide them in deciding what they wanted. This is often remarkably difficult to do, and I think in part this is because as a software engineer your brain tends to work in terms of process and logic, and this isn’t what’s needed here.

The term ‘Business Intelligence’ is often misused, as well as being somewhat esoteric. In its true form it refers to the vast array of methodologies and processes that occur in the alchemic act of transforming data into information. Many people wrongly assume that this basically boils down to the generation of a few reports and maybe the odd managerial dashboard (you know, the whizzy things with the graphs and those lovely 3D pie-charts that, okay, don’t really tell you an awful lot but, boy, do they ever look great!). Of course, a lot of it is that kind of thing, but if you think of BI as being solely that then you’re doing it a great disservice. BI runs the whole gamut of things from data warehousing through strategic analysis frameworks such as balanced scorecards all the way out the other side into the murky depths of trend analysis. Making BI work is, I think, one of the key challenges in enterprise IT today, incorporating not only the technical obstacles involved in ensuring all pertinent systems and data are integrated in ways that allow for dynamic cross-measurement, but also the difficulties that arise from trying to determine the ‘what’s, ‘when’s and ‘how’s.

One of the big buzz-phrases at the moment is ‘big data’. I’m not going to delve into the details of this, because I don’t feel I yet have an adequate understanding of it and its impacts, but suffice to say, the effect that the growing concentration of research into it will have on the world of BI is potentially enormous. Most organisations, though, I suspect are still struggling with their small or mid-size data, and just adding more into the virtual pot really isn’t going to help matters.
