The Problems of Long-Term Digital Archiving

I imagine that backing up and protecting data is a pretty standard concern for most people these days (note: I don’t mean that we’re all actually doing something about it, however). But for anyone who has collections of family memories on either backup hard drives or consumer-friendly cloud services such as iCloud, those risks have obviously been identified already.

Yet the reality is that the precautions that we’re taking today to preserve this data are still unlikely to be sufficient in the longer term. Whilst the first paper that we’ve discovered (from 2nd century China) has survived to this day, what are the chances of a stash of your favourite JPGs surviving for hundreds of years? And if they do, where and how will they be indexed?

And, even if we do manage to preserve such a collected human history, as Vint Cerf has just pointed out, there’s a very real chance that we might end up storing a vast amount of data with absolutely no idea what that data actually is. Or to put it another way, we might have created a file using Photoshop but that fact – together with the details of the software itself – is then lost over the passage of time, rendering the data useless in the future.

There’s an interesting proposal to carry out a type of X-ray analysis – whereby a snapshot could be taken of the digital environment in which the file was created (i.e. the software, the computer model, the operating system etc.) in a way that could then be easily checked far off into the future. However, the sort of business that carries out such an essential service would have to survive for hundreds of years. That’s not the sort of business that we’ve ever seen to date.
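To make the snapshot idea a little more concrete, here is a minimal sketch in Python of what recording a file's environment might look like. It is purely my own illustration and not the proposal itself: the function name `environment_snapshot` and the field names are hypothetical, and it captures the environment of the machine running the script, which may not be the machine that created the file. The name of the creating software has to be supplied by hand.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def environment_snapshot(data: bytes, created_with: str) -> dict:
    """Describe a file's contents and the environment recording it.

    All field names here are illustrative assumptions, not a standard.
    """
    return {
        # A fingerprint of the bytes, so the record can be matched to the file later
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        # The software used to create the file, supplied by the caller
        "created_with": created_with,
        # Details of the machine running this script
        "operating_system": f"{platform.system()} {platform.release()}",
        "machine": platform.machine(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    snapshot = environment_snapshot(b"example image bytes", "Photoshop (version unknown)")
    # Plain JSON stands a better chance of being readable in the future than a binary format
    print(json.dumps(snapshot, indent=2))
```

A record like this only says what the environment was. It does nothing to keep that environment available, which is the harder part of the problem.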

I can’t help but think that there’s a blockchain solution for this in some way.


Out of Context in Evernote

I’m a huge fan of Evernote. After Google, I probably rely on the company more than any other throughout the course of my daily digital life. And like increasing numbers of people around the world, I trust them heavily to back up my brain. But I’m just not a huge fan of their Context product.

If you don’t use Evernote, let me explain. I usually write directly into Evernote – meeting notes, blog posts, whatever – in addition to saving articles that I need to return to at a later date. Using the service is a no-brainer when you can rely on it to sync perfectly across platforms (well, usually; it has had the usual challenges that face any tech company growing at rapid speed). However, the company recently decided to ‘reward’ premium users by surfacing a number of (hopefully) relevant articles in the panel below your writing area.

Now, I’m not entirely sure but I *think* that previously they’d surface other notes that you’d made and saved yourself. Presumably they used some form of fairly simple word recognition algorithm that just scanned your collected content. And of course that could be useful – on occasion. It was rare that I actually clicked to open the suggestions but occasionally it was interesting to realise that I’d written/saved an article months earlier on a similar topic. No big deal. Little benefit for me in practice but, with minimal interruption, no complaints from me.
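For the curious, a simple word-matching approach of the kind I’m guessing at might look something like the Python sketch below. To be clear, this is my own speculation and not Evernote’s actual method: the function names, the stopword list and the scoring (the share of words that two pieces of text have in common) are all assumptions made for illustration.

```python
import re

# A tiny, illustrative stopword list; a real one would be much longer
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "it", "that"}


def keywords(text):
    """Return the set of lowercase words in text, minus common stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def related_notes(draft, notes, limit=3):
    """Rank saved notes by how many words they share with the current draft.

    notes maps a note title to its body text. Notes sharing no words are dropped.
    """
    draft_words = keywords(draft)
    scored = []
    for title, body in notes.items():
        note_words = keywords(body)
        union = draft_words | note_words
        # Shared words divided by all words across both texts (Jaccard similarity)
        score = len(draft_words & note_words) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, title))
    # Highest score first; ties broken alphabetically by title
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


if __name__ == "__main__":
    saved = {
        "Archiving": "long term digital archiving of family photos",
        "Recipes": "slow cooked lamb with rosemary",
    }
    print(related_notes("thoughts on digital archiving", saved))
```

Something this crude would explain why the suggestions were only occasionally useful: it matches words, not meaning.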

However, at the start of October, Evernote introduced Context. The company now surfaces relevant ‘high quality’ content from selected media partners, such as LinkedIn and The Wall Street Journal, within that space.

There are a couple of immediate problems here. Suddenly noticing external content pop up within your own personal safety deposit box immediately creates dissonance. Whether it’s relevant is neither here nor there. And in the same way as there was uproar when Google finally admitted that it was scanning all your emails to serve you advertising within Gmail, there are inevitable questions about privacy. How much of your data are they sharing with third parties to provide this content?

For customers with sensitive personal information (despite the warnings, people still use cloud services like Evernote and Dropbox to store this sort of data), the real concern is that this information is being shared without permission. Evernote have now clarified that this is not the case but there’s no doubt that the public are becoming both more wary and more vocal about such issues.

Evernote’s position is that they are providing additional content because it is valuable context that will help you work more efficiently. But to a premium customer who is already supporting the service by paying a subscription, these suggestions at first glance look, to all intents and purposes, like advertising. And don’t we usually pay to remove adverts?

It’s not all bad though. If you can get over the privacy concerns and get comfortable with the use of data, they’re bringing something that could be hugely valuable to workforces using the Evernote Business service. If you start working on something that a co-worker has already tackled and the product surfaces the relevant notes, you can see how many wheels will avoid being reinvented. Get it right – and Evernote have a real chance here given the quality of the search technology that they’ve built within their platform – and they could be onto a big winner.

But first, they need to allay those concerns. I don’t go to Evernote to find new knowledge. I go to Evernote to find the things that I’ve already filtered out as being valuable for me to store. Third-party curation is something that should be happening outside my personal ecosystem.

I’ll not be going anywhere. Evernote remains a truly valuable resource and immensely powerful if used correctly. But whether it’s down to an issue of design, communication or a young algorithm, Context still has a long way to go.