Exporting your data from 2016
The beginning of the new year is as good a milestone as any to back up the data you have stored on servers you don’t own. I started doing this in 2013 and this year I kept a few notes on the services I use, as a bit of an update.
Why is this important? Mostly because cloud services disappear, some more abruptly than others - for example I lost quite a few playlists when Grooveshark was shut down with absolutely zero warning. Music services are incredibly unreliable as they are so encumbered by the record industry's armies of lawyers.
Exporting your data is a great way to rate the trustworthiness of a service. What do they collect? Do they let you inspect that data? Do they let you back up that data, or do they lock it up in their walled garden? In essence - do you own the data or not? If you can't take it when you leave, you don't own the data - so perhaps you should think about what you're putting in there.
|Service||Can you get 100% of your data back, in a usable format, without using a third party tool?|
|The Old Reader||Yes|
|Service||Has export?||Requires auth?||Export quality||Export usability|
|Yes, but see notes||Yes||Low||High|
|Kindle||No (see notes)||No||High||Low|
|The Old Reader||Yes||Yes||High||High|
Has export gets a yes if there’s a vendor-supported export tool. Third party tools don’t count as they, at best, require the security antipattern of authorising unknown systems.
Requires auth: do you need to provide authentication to get the data?
Export quality:
- Low: core data missing
- Medium: core data complete, but extras missing
- High: complete, you get everything
Export usability:
- High: some form of UI like generated HTML files that you browse
- Medium: raw, structured data like CSV
- Low: unstructured data
Usability is obviously extremely subjective, but since anyone with Excel or LibreOffice can open a CSV by double-clicking it, I think it’s fair to call that at least medium level usability. I’m also assuming that people who care enough to download data can handle a .zip file.
Vendor-supported export tools:
- Facebook exporter
- Google Takeout
- LinkedIn data exporter
- Strava settings – Download All Your Activities
- The Old Reader settings – Export your feeds
- Twitter settings – see Request Your Archive
Third party tools (use at own risk):
Facebook still has an exporter of sorts. It’s easy to use as it’s an HTML page, but the data leaves out so much detail it’s not really worth the effort. For example, posts lose all links and media, so you just get the raw text (that hilarious lolcat post just shows up as you saying “lol”); line breaks are removed, so long posts are just a wall of words; shared media doesn't include a URL; and contacts are just a list of names without any contact details. So while the intent is there, the result isn’t what you really want.
Foursquare/Swarm have an unsupported XML feed feature but no export or backup option. There are third party tools that allow you to extract the data using their API. Perhaps I'm marking hard here, but I'm putting this into the "no" category for "can you get your data back" because I'm thinking about average users and not developers who are used to dealing with API keys and endpoints.
Google Takeout does an excellent job of giving you data. You may get so much data it’s hard to handle – if you are a serious Gmail user, for example, your backup will be however many gigs and your Takeout comes in a spanned archive. It can also take days for the archive to be ready. But the important thing is that you can get your data, in reasonably appropriate formats (although it does vary, e.g. Keep's output is pretty clunky as you get a separate HTML file for every note).
Instagram don’t offer any export options as far as I can tell; and the third party options don’t inspire confidence. Since I already treat Instagram as disposable I didn’t pursue it; but if you don’t have those images backed up you could give those third party services a try – just revoke access to the apps and change your password immediately afterwards.
iTunes isn’t a cloud option like the others, but since playlists are pretty important to a music fan it’s worth including. iTunes exports the entire library in an XML file that’s essentially useless for humans, leaving you to laboriously export playlists one at a time (select the playlist → file → export playlist). The resulting tab-delimited .txt files are ugly and a bit error-prone, but can be massaged into something useful using a spreadsheet tool. Cumbersome, but it does work. Similarly you can back up and transfer your listening history, even between systems – again, cumbersome, but possible. So overall iTunes kinda sorta lets you move your data around, but it’s a huge pain to do it and you certainly can't use the data in another system.
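If the spreadsheet step gets tedious, a small script could do the massaging instead. This is only a rough Python sketch: the function name `playlist_to_csv` is made up for illustration, and the column names (`Name`, `Artist`, `Album`) and the file encoding mentioned in the comments are assumptions, so check them against the header row of your own export.

```python
import csv
import io


def playlist_to_csv(tsv_text, columns=("Name", "Artist", "Album")):
    """Convert tab-delimited playlist text to CSV text, keeping only `columns`.

    The column names are assumptions; adjust them to match the header row
    of your own exported playlist.
    """
    # splitlines() copes with \r, \n and \r\n line endings alike.
    # QUOTE_NONE stops track titles that contain quotes from confusing the parser.
    reader = csv.DictReader(
        tsv_text.splitlines(), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for row in reader:
        # Missing or empty cells become empty strings.
        writer.writerow([(row.get(col) or "").strip() for col in columns])
    return out.getvalue()


# Hypothetical usage (file name and encoding are assumptions):
# with open("My Playlist.txt", encoding="utf-16") as f:
#     print(playlist_to_csv(f.read()))
```

The result is a plain CSV with one row per track, which is about as portable as a playlist is going to get.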
Kindles can just mount as USB drives, so they’re easy to back up. One caveat here is I jailbroke mine years ago to change the screen saver images, so I don't remember if that's required to make it mount as USB. See Documents → My Clippings.txt for a plain text copy of any highlights you’ve made on the Kindle. To make things more manageable you may prefer to use Calibre to manage your Kindle, but as a fast backup it’s hard to beat select-all, copy and paste.
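Once you have a copy of My Clippings.txt, you may want the highlights grouped by book. Here is a minimal Python sketch of that idea. It assumes each entry is a title line, a metadata line, a blank line and then the highlighted text, with entries separated by a line of ten equals signs. That layout is what I'd expect, not something guaranteed, and `parse_clippings` is just an illustrative name.

```python
from collections import defaultdict

# Assumed entry separator; check your own file before relying on it.
SEPARATOR = "=========="


def parse_clippings(text):
    """Return {title line: [highlight text, ...]} from clippings text."""
    by_book = defaultdict(list)
    for entry in text.split(SEPARATOR):
        lines = [line.strip() for line in entry.strip().splitlines()]
        if len(lines) < 3:
            # Too short to hold any text (e.g. a bookmark or the file's tail).
            continue
        # Some files carry a byte order mark in front of the title.
        title = lines[0].lstrip("\ufeff")
        # lines[1] is the metadata line; everything after it is the text.
        body = " ".join(line for line in lines[2:] if line)
        if body:
            by_book[title].append(body)
    return dict(by_book)


# Hypothetical usage (path and encoding are assumptions):
# with open("My Clippings.txt", encoding="utf-8-sig") as f:
#     for title, highlights in parse_clippings(f.read()).items():
#         print(title, len(highlights))
```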
LinkedIn offers a quick backup in around 10 minutes or a full backup in around 24 hours. You receive a .zip file with .csv files (the full backup includes a few extra files). Simple and actually pretty easy to use.
Last.fm still hasn’t restored their exporter and after more than a year it seems unlikely they’re going to. There is a third-party tool that gives you a CSV file. Since all of that data is public already and it doesn’t require auth, I’m pretty happy using that third party tool.
Spotify still has no exporter. There are several third party options which all require you to supply your auth details in some manner. Whether you trust them is up to you. I’m yet to try the downloadable Python options, which seem like the most controlled option.
The Old Reader had an export feature from day one, as a key trust builder. Smart, given its origins in the Great Google Reader Execution of 2013. The export is an OPML file that should be supported as an import format in other readers.
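To check what actually ended up in the export, you can list the feeds in the OPML file with a few lines of Python. This is a sketch, not a full OPML reader: it assumes each subscription is an `outline` element carrying an `xmlUrl` attribute (the usual convention for feed lists), and `feeds_from_opml` is a name I've made up.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_data):
    """Return a list of (title, feed URL) pairs found in OPML text or bytes."""
    root = ET.fromstring(opml_data)
    feeds = []
    # iter() walks nested outlines too, so feeds inside folders are included.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # folder outlines have no xmlUrl and are skipped
            title = outline.get("title") or outline.get("text") or ""
            feeds.append((title, url))
    return feeds


# Hypothetical usage (file name is an assumption):
# with open("subscriptions.opml", "rb") as f:
#     for title, url in feeds_from_opml(f.read()):
#         print(title, url)
```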
Strava exports your data in some GPS data formats (e.g. GPX) that I’m not familiar with. I’ll give it a medium rating as it seems appropriate for the data it’s handling. It doesn’t include photos you’ve posted or any summary data in a common format like CSV, despite a lot of requests in the forums. There’s no data geek quite like a fitness geek.
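Since the summary data is missing, one option is to derive a rough summary from the GPX files yourself. The Python sketch below counts track points and estimates distance with the haversine formula. It assumes the files use the GPX 1.1 namespace and store positions as `trkpt` elements with `lat` and `lon` attributes; I haven't checked that against every Strava export, and the function names are my own.

```python
import math
import xml.etree.ElementTree as ET

# Assumed namespace (GPX 1.1); check the root element of your own files.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def track_summary(gpx_data):
    """Return the point count and approximate distance for one GPX document."""
    root = ET.fromstring(gpx_data)
    points = [
        (float(pt.get("lat")), float(pt.get("lon")))
        for pt in root.iterfind(".//gpx:trkpt", GPX_NS)
    ]
    # Sum the distance between each consecutive pair of points.
    distance = sum(
        haversine_km(*a, *b) for a, b in zip(points, points[1:])
    )
    return {"points": len(points), "distance_km": distance}
```

The distance ignores elevation and GPS noise, so treat it as a ballpark figure rather than something to compare against Strava's own numbers.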
Twitter’s export of tweets is pretty neat: you get everything in an HTML interface including searchable tweet history; plus you get a copy in CSV. It takes a few minutes for the archive to be prepared. The weirdest downside is the tweets aren’t in chronological order… in fact I’m not sure what order they are in. Also you only get your own tweets, not direct messages, likes or other people's replies. You can link out to the live site for conversations, assuming the account is still live.
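The ordering problem is easy enough to fix after the fact if you're comfortable with a little Python. This sketch sorts the CSV rows by timestamp. The column name `timestamp` and the date format are assumptions about the archive's CSV, so both are parameters you can change, and `sort_tweets` is just an illustrative name.

```python
import csv
import io
from datetime import datetime


def sort_tweets(csv_text,
                time_column="timestamp",
                time_format="%Y-%m-%d %H:%M:%S %z"):
    """Return the CSV rows as dicts, oldest first.

    The column name and time format are assumptions; check them against
    the header row and a sample value in your own archive.
    """
    # newline="" lets the csv module handle line breaks inside quoted tweets.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    rows = list(reader)
    rows.sort(key=lambda row: datetime.strptime(row[time_column], time_format))
    return rows


# Hypothetical usage (file name is an assumption):
# with open("tweets.csv", encoding="utf-8", newline="") as f:
#     for row in sort_tweets(f.read()):
#         print(row["timestamp"], row["text"])
```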
- I don’t bother backing up Flickr or Instagram. Flickr, because I keep all my originals in cloud storage anyway; and Instagram because I treat it as disposable content.
- Similarly I don’t particularly need to back up iTunes and Spotify play data as I aggregate it into Last.fm.
- I don’t use WordPress so haven’t looked into exporting its data. Blogger is included in Google Takeout.
No major surprises or changes here – the good players remain good, the bad remain bad. Music services are still terrible across the board.
Facebook is perhaps the most devious, giving an archive that appears slick but contains crippled data. If you were trying to genuinely preserve your Facebook stream, you’d be sorely disappointed: your data is firmly withheld in the walled garden.
Twitter do the best job of making the export usable for the average person. Weirdly the offline backup has a better, more deeply-searchable history than the live website (particularly for private feeds). But even the export still leaves out data most people probably consider core, like direct messages.
I’ve said for a long time that user data should be portable. By locking up user data, all services are weakened. While it suits companies that currently benefit from being a walled garden, it is a pyrrhic victory in the long term. If this bothers you, I encourage you to look into the indieweb.