The Problems of Long-Term Digital Archiving

I imagine that backing up and protecting data is a pretty standard concern for most people these days (note: I don’t mean that we’re all actually doing something about it). But anyone who keeps collections of family memories on backup hard drives or on consumer-friendly cloud services such as iCloud has obviously identified those risks already.

Yet the reality is that the precautions we’re taking today to preserve this data are still unlikely to be sufficient in the longer term. Whilst the first paper that we’ve discovered (from 2nd century China) has survived to this day, what are the chances of a stash of your favourite JPGs surviving for hundreds of years? And if they do, where and how will they be indexed?

And even if we do manage to preserve such a collected human history, as Vint Cerf has just pointed out, there’s a very real chance that we might end up storing a vast amount of data with absolutely no idea what that data actually is. To put it another way, we might have created a file using Photoshop, but that fact, together with the details of the software itself, is then lost over the passage of time, rendering the data useless in the future.

There’s an interesting proposal to carry out a type of X-ray analysis, whereby a snapshot would be taken of the digital environment in which the file was created (i.e. the software, the computer model, the operating system, etc.) in a way that could still be easily checked far off into the future. However, any business carrying out such an essential service would itself have to survive for hundreds of years. That’s not a sort of business that we’ve ever seen to date.
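
To make the snapshot idea a little more concrete, here is a very rough sketch in Python of the simplest version I can think of: a small "sidecar" file that sits next to the original and records what it is, what made it and what sort of machine it was recorded on. To be clear, this is my own illustration and not the proposal itself, which as I understand it would capture far more of the environment than a description. The function names, the manifest fields and the `.manifest.json` suffix are all made up for the example, and the name of the creating software has to be supplied by hand because a script can’t work that out for itself.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(path, created_with):
    """Describe a file and the environment in which it is being recorded."""
    data = Path(path).read_bytes()
    return {
        "file_name": Path(path).name,
        "size_bytes": len(data),
        # Fingerprint of the exact bytes, so later changes can be detected.
        "sha256": hashlib.sha256(data).hexdigest(),
        # Supplied by the caller (e.g. "Photoshop"); not detected automatically.
        "created_with": created_with,
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(path, created_with):
    """Write the manifest as JSON next to the file and return its path."""
    sidecar = Path(str(path) + ".manifest.json")
    manifest = build_manifest(path, created_with)
    sidecar.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


def verify(path, sidecar):
    """Return True if the file still matches the hash stored in its sidecar."""
    manifest = json.loads(Path(sidecar).read_text(encoding="utf-8"))
    actual = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return actual == manifest["sha256"]
```

Calling `write_sidecar("holiday.jpg", "Photoshop")` would leave a `holiday.jpg.manifest.json` beside the photo, and `verify` can later confirm that the bytes haven’t changed. Of course this only records a description of the environment, not the environment itself, and it does nothing about the harder question of who looks after these files for hundreds of years.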

I can’t help but think that there’s a blockchain solution for this in some way.