Automated Removal of Avatar and Header Images of Inactive Remote Accounts

Mastodon version 4.1.0 improved the admin CLI to allow removing the cached avatar and header images of inactive remote accounts from media storage. In some cases, running this command can reduce media storage usage significantly, so I decided that it was an excellent command to automate.

As there were many terabytes of small media files to delete one by one, the first run of this command took several weeks. The initial run was announced on the Masto.host Mastodon account, and the automation will be included in the daily media cache cleanup starting next week.

While most of the initial run was done using the default 7-day threshold, after receiving user feedback and based on my own experience, I decided to increase the threshold to 30 days. This means that the locally cached avatar and header images will be deleted for any remote account that has had no activity for over 30 days. Mastodon will fetch them again later if and when the user becomes active.

This command grew out of the Mastodon community’s need to reduce media storage usage. However, it introduces some issues; “imperfect” is the word used in the pull request. The most obvious problem is that remote avatars and headers are not re-fetched immediately when a user’s profile is accessed or user activity is detected. This is the main reason why I increased the threshold to 30 days.

The initial attempt to bring this functionality to Mastodon was dropped because the author concluded that “the webfinger is not updated as often as I suspected. I’ll have to find another way of doing this. updated_at also doesn’t work… I think I know why this has never been implemented. It’s really non-trivial and full of edge cases”.

The second attempt, which is the one currently in use, improved on the initial one. Where the first attempt relied only on webfinger to determine whether a remote user was inactive, the second combines both the webfinger and updated_at values. But this improvement is still “imperfect”, and hopefully more work will be done either to find a more reliable source for determining user inactivity or to re-fetch avatars and headers more rapidly.
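
To make the combined check more concrete, here is a rough sketch in Python of how such a rule could work. It is not Mastodon's actual code: the RemoteAccount class, the is_inactive function and the handling of a missing webfinger timestamp are my own assumptions for illustration. Only the idea of requiring both the webfinger and updated_at values to be older than the threshold comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

THRESHOLD_DAYS = 30  # the threshold chosen in this post (the default was 7)


@dataclass
class RemoteAccount:
    # Hypothetical stand-in for a remote account record; the field names
    # mirror the values mentioned in the post but are illustrative only.
    handle: str
    last_webfingered_at: Optional[datetime]
    updated_at: datetime


def is_inactive(account: RemoteAccount, now: datetime,
                days: int = THRESHOLD_DAYS) -> bool:
    """Return True only if BOTH timestamps are older than the cutoff."""
    cutoff = now - timedelta(days=days)
    if account.last_webfingered_at is None:
        # Assumption: with no webfinger timestamp we cannot judge, so skip.
        return False
    return account.last_webfingered_at < cutoff and account.updated_at < cutoff


if __name__ == "__main__":
    now = datetime(2023, 3, 1, tzinfo=timezone.utc)
    quiet = RemoteAccount("quiet@example.com",
                          now - timedelta(days=45), now - timedelta(days=40))
    edited = RemoteAccount("edited@example.com",
                           now - timedelta(days=45), now - timedelta(days=2))
    for acct in (quiet, edited):
        print(acct.handle, is_inactive(acct, now))
```

Requiring both values to be stale is what makes the second approach more conservative than a webfinger-only check: a recent change to either value is enough to keep the cached images. It also shows why the result is still imperfect, since neither timestamp is a direct record of the user's activity.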