Remote Media Cache Backups

Mastodon locally caches a copy of remote media files from the content that federates with your server. This media cache avoids hotlinking and prevents missing or slow-loading content when the remote server is down or slow.

On Masto.host currently, this cache is trimmed with the following limits:

When the media cache is trimmed/deleted, if a user accesses the content where that cache is missing, Mastodon will automatically attempt to redownload it.

Even with trimming, cached files still make up the majority of the media stored on most Mastodon servers. This raises the question: should those media files be backed up or not?

When Mastodon first started, there was no simple separation between cached content and content from a server's own users. If one wanted to back up only local content, one had to filter it with database queries, which was not practical. So, on Masto.host I always backed up everything.

In 2020, Mastodon changed its cache and created a directory where the remote cached media files are stored separately from the local media files. This change made removing cached media content from backups easier.
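To illustrate why a separate directory makes this easier, here is a minimal sketch of selecting files for backup while skipping the cache. It assumes the remote cache sits in a top-level directory named `cache` under the media root; that name, the function `iter_backup_files`, and the example path are assumptions for illustration, not a description of how Masto.host backups actually work.

```python
import os
from pathlib import Path
from typing import FrozenSet, Iterator


def iter_backup_files(
    media_root: Path,
    excluded_dirs: FrozenSet[str] = frozenset({"cache"}),  # assumed cache directory name
) -> Iterator[Path]:
    """Yield every file under media_root, skipping excluded top-level directories."""
    media_root = Path(media_root)
    for dirpath, dirnames, filenames in os.walk(media_root):
        if Path(dirpath) == media_root:
            # Prune in place so os.walk never descends into the cache at all.
            dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        for name in filenames:
            yield Path(dirpath) / name


if __name__ == "__main__":
    # Hypothetical media root; adjust to your own storage layout.
    for path in iter_backup_files(Path("public/system")):
        print(path)
```

Because the cache directory is pruned before the walk descends into it, the cached files are never visited, which matters when they are the bulk of what is stored.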

Still, up until now, I have been backing up the remote media cache. Going forward, I have decided to stop doing so. Mastodon instances keep growing every month, and soon it would be virtually impossible to do daily backups of both local and remote cached media files. With a cost-effective storage system, at least, checking every file one by one takes a long time.

A disadvantage of not backing up the remote media cache is that it will require some extra manual work to restore a Mastodon server from backup. Also, it can take longer for the Mastodon instance to be back to normal again.

In theory, Mastodon will be able to redownload most of the cached media files. Still, some, such as account avatars and headers, will not be recoverable if the remote instance they came from is no longer online.

In the future, I plan to offer extra options to make this configurable: both for increasing or removing the limits on the trimmed cache and for choosing whether or not to back up the remote cache.

If having those as paid options is something you want, please let me know, and if I get enough requests, I will move that up on my priority list. If you have any thoughts or questions, feel free to reach out.