Masto.host server resources are shared between Mastodon servers. This offers some optimisation in terms of resource usage for concurrent active users, as people access at different times and the peak in active users for each Mastodon server falls at a different time depending on where most of its users are located.
Still, the sharing of resources also causes a problem in terms of federation. When the federation load increases for one Mastodon server, it usually also increases for others. This is usually fine, as I allocate enough resources to account for that, but there are some edge cases that can be problematic. The one that I have noticed the most is the processing of videos and animated GIFs.
If someone popular shares a video or an animated GIF, it needs to be federated to many servers, so there is a high probability that several of the servers I host will have to process that video/GIF at the same time. This simultaneous processing is even more frequent when a popular Federation Relay shares a video/GIF and multiple servers I host participate in that Relay.
In most cases, this is not a problem, as most videos/GIFs have already been processed to Mastodon compatible specifications (dimensions, frame rate, etc). All those shared via the current main Mastodon fork already meet the specs, but other ActivityPub software has different media specifications, and this can cause federation with Mastodon to either require processing of the video/GIF or cause media processing to fail. Hence, sometimes you may find posts with media tagged ‘not available’. This can also happen when the remote server is overloaded, in which case the media will be tried again later.
When videos and animated GIFs need processing (not just a copy), this is very resource intensive, and if multiple Mastodon servers are doing it on the same machine, it can cause the load to spike above what would be ideal. See the image below, where a situation like the one described happened on the app servers of my platform. Notice that the load spikes all occur at exactly the same time, as a very large animated GIF was being processed multiple times simultaneously.
To attempt to mitigate this, I will be gradually rolling out a test where videos/GIFs will only be processed once. When a Mastodon server hosted with me receives a remote video/GIF, it will check whether it has already been processed by another of the servers I host. If it hasn’t, it will mark the URL as being ‘in processing’ and, once processing finishes, it will provide the URL of the processed file, so other servers can use the processed file and just do a fast copy.
This will not change the Mastodon database, file storage, etc. The source URLs are still stored correctly and a copy of the file will still exist in storage for each Mastodon server. But this approach does have a drawback: if a file is ‘in processing’, all other servers that attempt to process it at that moment will skip the processing (the same behaviour described above for when a remote server is overloaded) and attempt it again later.
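As a rough illustration of the flow described above, here is a minimal sketch in Python. It is not the actual patch (that lives in the Mastodon codebase linked at the end of this post). The `SharedMediaRegistry` class, the `fetch_remote_media` function, the state names and the example URLs are all hypothetical, and an in-memory dictionary stands in for whatever store is really shared between the servers.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Optional

IN_PROCESSING = "in_processing"
DONE = "done"


@dataclass
class Entry:
    state: str
    processed_url: Optional[str] = None


class SharedMediaRegistry:
    """Hypothetical stand-in for a store shared by all hosted servers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[str, Entry] = {}

    def claim_or_lookup(self, source_url: str) -> Optional[Entry]:
        """Claim an unseen URL (returns None) or return a copy of its entry."""
        with self._lock:
            entry = self._entries.get(source_url)
            if entry is None:
                self._entries[source_url] = Entry(IN_PROCESSING)
                return None
            return Entry(entry.state, entry.processed_url)

    def mark_done(self, source_url: str, processed_url: str) -> None:
        with self._lock:
            self._entries[source_url] = Entry(DONE, processed_url)

    def release(self, source_url: str) -> None:
        """Drop a claim so a later attempt can try again (assumed behaviour)."""
        with self._lock:
            self._entries.pop(source_url, None)


def fetch_remote_media(
    registry: SharedMediaRegistry,
    source_url: str,
    process: Callable[[str], str],
    fast_copy: Callable[[str], None],
) -> str:
    """Return 'processed', 'copied' or 'retry_later' for one server's attempt."""
    existing = registry.claim_or_lookup(source_url)
    if existing is None:
        # First server to see this URL: do the expensive processing once.
        try:
            processed_url = process(source_url)
        except Exception:
            registry.release(source_url)
            raise
        registry.mark_done(source_url, processed_url)
        return "processed"
    if existing.state == IN_PROCESSING:
        # Another server is working on it: skip for now and retry later.
        return "retry_later"
    # Already processed elsewhere: just copy the processed file.
    fast_copy(existing.processed_url)
    return "copied"
```

In this sketch, the first server to call `fetch_remote_media` for a given URL pays the processing cost, servers arriving while it is still running get `retry_later`, and servers arriving afterwards only do the fast copy into their own storage.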
Besides hopefully mitigating this kind of load spike, this will also free up Sidekiq jobs, as they will no longer be kept busy processing a video/GIF, which can take over a minute in some cases. It will also reduce the load on remote servers, as the file only gets copied once to my platform.
This will be a test. I will start with a small percentage of Mastodon servers and gradually roll it out to others. If this causes any problems, I will roll it back and try to find a better solution. Please report any feedback you have or issues you encounter.
You can view the code that I will be testing at https://github.com/mastohost/mastodon/tree/v3.1.5-patch
Many thanks to Claire for developing this patch.