Big data warehouse a way of the future

17:00, Apr 12 2013

Rightly or wrongly, "privacy" seems to be the hottest topic in the public sector right now.

Word on Radio New Zealand (on Thursday morning) that the Government was considering putting "all the information it held on individuals into one big hub" evoked a somewhat Orwellian image.

The Government is not in fact considering merging agencies' operational systems into one big computer, which would be impossible.

Instead it is contemplating setting up a new data warehouse into which "anonymised" data could be dumped and then chewed over by researchers and policy bods, who would be looking for insights into how to better provide public services.

That anonymised data would be information on transactions between government agencies and individuals that had identifying information stripped out.

It wouldn't make sense to chuck every single bit of government data into the same pot. Too much of it is too trivial.

Rather, deputy state services commissioner Ryan Orange says the focus would probably be on collating interactions in the health, justice, education and social welfare spheres.

If you knew, for example, that 2 per cent of people who were made unemployed and then visited their GP seeking treatment for depression later beat up their kids, you might want to encourage Child, Youth and Family to keep an eye on which of its clients fell into that overlap, assuming it didn't already.

Orange says that's a reasonable example of how the hub could be used.

"If we did that piece of work, we wouldn't have any idea who those individuals were, but we could make that connection and then point it out to the relevant departments as something they needed to be looking at."

This is the world of "big data". Ever cheaper computer processing power and ever more sophisticated analytics software are making it ever more viable to crunch larger amounts of data to gain ever more marginal insights into such life-event correlations, most of which are usually going to be pretty common-sense.

"There are some things we know through decades of interactions," Orange says.

"The difference for us [with the hub] is we could track something across a cohort of people and put dollar values against all the services that have been utilised.

"We can then start calculating real 'returns on investment' for New Zealand by getting a specific knowledge of what is working and what is not, and the difference it is making."

Privacy can be an issue even with anonymised data. New Zealand is a pretty small country and if you could link up enough interactions that an individual had across government you might be able to work out who they were without being told.

But that risk seems mainly theoretical. A bigger hurdle for a project such as this is perhaps the business case.

How likely is it, really, that crunching vast government data-sets is going to reveal new insights that the public sector is then going to be able to make practical use of?

Is it the biggest priority? The nature of the beast is that you can never know for sure until the data-crunching begins; otherwise you wouldn't be doing the crunching.

But this would be a decent dollar-value contract for some information technology provider, so the project's sponsors, the Treasury and the State Services Commission, should make a plausible argument regarding what they expect to achieve.

Orange accepts this and says it is "early days" for the project.

"We are going to have to go back through our chief executives and the Cabinet to get all of this up and running and that is one of the things we are going to have to convince them on."

One detail for techos: Orange says it is possible the government data could remain on the various departmental systems while it was analysed, instead of first being pooled into a central hub, but he acknowledges that's just a technicality.