Access to backups

Hugo Gameiro · Announcements

I have been “Flintstoning” Masto.host for over 4 years. Over time, I have automated much of the installation, deletion, backup and maintenance of the Masto.host platform but every single interaction on the current platform was carried out by me.

This never represented a major issue for me and, in fact, it came with some benefits, but there was one thing that I could not offer and that I feel is essential moving forward: access to backups.

I struggled with this and thought about different solutions until I finally decided to “bite the bullet” and build a web interface that grants people access to their backups and other functionalities.

Masto.host started as a moonlighting project and I never expected to make a business out of it but it’s now my part-time job. I would gladly do it full-time but the numbers are not there.

Thus, last year I decided to put my mind to it and, on top of running Masto.host part-time, I had to come up with a web interface in my spare time. It has taken me over a full year just to get the basic functionality ready, though now it is starting to come together and I forecast that by January 2022 I will start providing access to beta testers.

I wanted to make the announcement prior to that because the data associated with all subscriptions for Masto.host will soon migrate from my local storage to being online.

I don’t store a lot of data about the people who sign up for Masto.host: only email, domain for installation, plan used, PayPal subscription ID and the payment history. It makes no sense for me to store other people’s personal data unless it’s strictly necessary.

Besides the data listed above, the database on the new platform will temporarily store the IP used to sign in to the web interface and other encrypted login data, such as password, “remember me” series and tokens, time-based one-time password/two-factor authentication (TOTP/2FA) secret and backup code for TOTP/2FA.

Hence, if a catastrophic event happens and any data is leaked from the database, you know the extent of data that it includes. If you want to change the email associated with your account, please contact me via email.

In the future, I might also need to store the country of origin and the IP used to sign up for the service due to European VAT laws, but that will be announced if or when it is necessary.

Before the closed beta testing is available, I will update the Terms of Service to reflect these changes.

During the closed beta testing the only functionality that will be available is access to request an archive of the backup and a way to download the archive. Obviously all login functionality (TOTP/2FA, reset password, TOTP/2FA backup code) will be in testing mode, too.

If you want to join the closed beta testing, please email me at info@masto.host.

If you want to know more about the technical details, please keep reading. If not, that’s all, folks.

Technical details

First, the question you might be asking is why I decided to build this from scratch and not use a tried and tested platform. The answer is that I don’t want to work around or on top of something generic that didn’t have my use case in mind.

I do use WHMCS to run my other online business and it feels to me like a jack-of-all-trades, master of none. I’ve tried other options and they all pretty much have the same problem. Plus, this way I know exactly what each line of code does and can easily debug it or build upon it.

The platform was written in PHP because it’s obviously the superior programming language 🙂 I’m joking, it simply is the language I am most experienced with and with which I have built multiple login systems in the past.

I use only two PHP libraries: RobThree/TwoFactorAuth and phpseclib. On the front-end I use jQuery but the base functionality should work with JavaScript disabled.

Encryption

The password requires at least 10 characters, including one uppercase letter, one lowercase letter and one number. It is stored hashed using hash_hmac, base64_encode and password_hash, and is checked using password_verify.
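As a rough illustration of the flow described above (policy check, keyed HMAC, base64, then a slow salted hash), here is a minimal sketch in Python rather than PHP. The platform itself uses PHP's hash_hmac, base64_encode, password_hash and password_verify. The pepper value, the function names, the iteration count and the use of PBKDF2 as a stand-in for password_hash are all assumptions made for illustration only.

```python
import base64
import hashlib
import hmac
import os
import re

# Hypothetical server-side secret ("pepper"); not taken from the post.
PEPPER = b"example-pepper-change-me"
ITERATIONS = 200_000  # assumed work factor for this sketch


def meets_policy(password: str) -> bool:
    """At least 10 characters with one uppercase, one lowercase and one digit."""
    return (
        len(password) >= 10
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[0-9]", password) is not None
    )


def _prehash(password: str) -> bytes:
    # Keyed HMAC of the password, base64-encoded so the result is plain ASCII.
    digest = hmac.new(PEPPER, password.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest)


def hash_password(password: str) -> str:
    # PBKDF2 stands in for PHP's password_hash here (an assumption).
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", _prehash(password), salt, ITERATIONS)
    return base64.b64encode(salt).decode() + "$" + base64.b64encode(key).decode()


def verify_password(password: str, stored: str) -> bool:
    salt_b64, key_b64 = stored.split("$")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(key_b64)
    key = hashlib.pbkdf2_hmac("sha256", _prehash(password), salt, ITERATIONS)
    # Constant-time comparison of the derived keys.
    return hmac.compare_digest(key, expected)
```

The keyed pre-hash means that a leaked database alone is not enough to start guessing passwords offline, as long as the pepper is kept outside the database.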

The TOTP/2FA secret and backup code and the “remember me” token are encrypted using openssl_encrypt, some concat magic and base64_encode.

Authentication

All access requires HTTPS and, for the time being, I opted not to use captchas and simply log/block brute force attempts, both by IP and by user.
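The post does not give thresholds or storage details for the brute force blocking, so the following is only a minimal Python sketch of counting failed logins per IP and per user. MAX_FAILURES, WINDOW_SECONDS, the in-memory dictionary and the function names are all hypothetical.

```python
import time
from collections import defaultdict

MAX_FAILURES = 5          # hypothetical threshold
WINDOW_SECONDS = 15 * 60  # hypothetical look-back window

# ("ip", value) or ("user", value) -> timestamps of failed attempts
_failures = defaultdict(list)


def record_failure(ip, username, now=None):
    """Log a failed login against both the IP and the user name."""
    now = time.time() if now is None else now
    _failures[("ip", ip)].append(now)
    _failures[("user", username)].append(now)


def is_blocked(ip, username, now=None):
    """True if either the IP or the user has too many recent failures."""
    now = time.time() if now is None else now
    for key in (("ip", ip), ("user", username)):
        recent = [t for t in _failures[key] if now - t <= WINDOW_SECONDS]
        _failures[key] = recent  # drop entries outside the window
        if len(recent) >= MAX_FAILURES:
            return True
    return False
```

Tracking both keys covers one IP trying many accounts as well as many IPs trying one account.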

Authentication uses a combination of a unique ID stored in a cookie (bin2hex, random_bytes) and sessions, valid for 24 minutes, meaning that 24 minutes of inactivity will require a new login.
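To make the inactivity rule concrete, here is a small Python sketch of a random session ID with a 24-minute idle timeout. It is not the platform's PHP code: the in-memory store, the 32-byte ID length and the function names are assumptions, and secrets.token_hex plays the role of bin2hex(random_bytes(...)).

```python
import secrets
import time

SESSION_TTL_SECONDS = 24 * 60  # 24 minutes of inactivity, as in the post

# In-memory stand-in for server-side session storage (hypothetical).
_sessions = {}


def create_session(user_id, now=None):
    """Create a session and return the random ID to place in the cookie."""
    now = time.time() if now is None else now
    session_id = secrets.token_hex(32)  # 32 random bytes as 64 hex characters
    _sessions[session_id] = {"user_id": user_id, "last_seen": now}
    return session_id


def get_user(session_id, now=None):
    """Return the user ID for a live session, or None if unknown or expired."""
    now = time.time() if now is None else now
    session = _sessions.get(session_id)
    if session is None:
        return None
    if now - session["last_seen"] > SESSION_TTL_SECONDS:
        del _sessions[session_id]  # idle too long: force a new login
        return None
    session["last_seen"] = now  # activity resets the inactivity window
    return session["user_id"]
```

Each successful lookup refreshes the timestamp, so only a gap of more than 24 minutes between requests ends the session.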

Remember me

The “remember me” is valid for 5 days and uses a cookie with a combination of series and tokens. For each login with “remember me” enabled, two unique IDs are generated, one for series and another for token.

The series is stored unencrypted in the database and the token is stored encrypted.

When a “remember me” cookie is presented, it’s checked for validity and against a valid series. If it passes the validation, both the values presented in the cookie and the database values are held in memory. After that, all cookies and current session values are deleted and all references to that series in the database are invalidated. This means that any given series can only be brute-forced once.

Next, the token is decrypted from the database value and validated against the presented token using hash_equals. If everything is correct, a new “remember me” cookie (new series and new token) plus authentication cookie and sessions are created.
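The series/token rotation described above can be sketched as follows, in Python rather than the platform's PHP. The in-memory table, the cookie format (series and token joined by a colon) and the function names are hypothetical. The encryption of the stored token is replaced by clearly marked placeholder functions, because Python's standard library has no equivalent of openssl_encrypt; base64 here is NOT encryption.

```python
import base64
import hmac
import secrets
import time

REMEMBER_TTL_SECONDS = 5 * 24 * 60 * 60  # 5 days, as in the post

# In-memory stand-in for the database table: series -> record (hypothetical).
_remember = {}


def _seal(token):
    # Placeholder for the encryption step; base64 is NOT encryption.
    return base64.b64encode(token.encode("utf-8")).decode("ascii")


def _unseal(stored):
    return base64.b64decode(stored).decode("utf-8")


def issue_cookie(user_id, now=None):
    """Create a new series/token pair and return the cookie value."""
    now = time.time() if now is None else now
    series = secrets.token_hex(16)
    token = secrets.token_hex(32)
    _remember[series] = {
        "user_id": user_id,
        "token": _seal(token),
        "expires": now + REMEMBER_TTL_SECONDS,
    }
    return series + ":" + token


def redeem_cookie(cookie, now=None):
    """Return (user_id, new_cookie) on success, or None on any failure."""
    now = time.time() if now is None else now
    series, sep, token = cookie.partition(":")
    if not sep:
        return None
    # Invalidate the series before checking the token: one attempt per series.
    record = _remember.pop(series, None)
    if record is None or now > record["expires"]:
        return None
    # Constant-time comparison, comparable to PHP's hash_equals.
    stored_token = _unseal(record["token"]).encode("utf-8")
    if not hmac.compare_digest(stored_token, token.encode("utf-8")):
        return None
    return record["user_id"], issue_cookie(record["user_id"], now)
```

Because the series is removed before the token is compared, a wrong token burns the series, and the legitimate user simply has to log in again.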

Access to the infrastructure servers

All servers in the infrastructure have root login disabled and listen on non-standard SSH ports. Also, they can only be accessed using SSH keys, and after login, privileges can only be escalated to root with a password.

The web interface will be hosted on an independent server external to the infrastructure and will have its own SSH key, tied to a single IP (not the public one), and can only run a single script.

That script is readable and executable but not writable by that user/key combination, and it calls another script that is in the /root directory and which only the root user can read, write or execute. This root-only script validates the request and, if the combination of parameters from that request is valid, it can trigger one of a limited set of actions.

My plan is for all actions available through the web interface to be reversible and for none to destroy data. For example, once the functionality for cancellation is introduced, when a user requests the cancellation and deletion of their service, the service goes offline and the data is deleted on the actual production server but kept in the remote automated backups until its expiration date.

The underlying idea is to secure a means of reverting the process in the worst-case scenario of someone gaining access to the user account.
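The post does not show the validation script, so the following is only a Python sketch of the general idea: accept a request only if the action is on a fixed allowlist and the parameters match exactly what that action expects. The action name, the parameter name and the hostname pattern are hypothetical.

```python
import re

# Hypothetical allowlist: action name -> names of the parameters it requires.
ALLOWED_ACTIONS = {
    "request_archive": ("domain",),
}

# Conservative lowercase hostname pattern; an assumption for this sketch.
DOMAIN_RE = re.compile(r"([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}")


def validate_request(action, params):
    """Return a cleaned copy of params if the request is valid, else None."""
    if action not in ALLOWED_ACTIONS:
        return None
    if set(params) != set(ALLOWED_ACTIONS[action]):
        return None  # unexpected or missing parameters
    domain = params.get("domain")
    if domain is not None:
        if not isinstance(domain, str) or len(domain) > 253:
            return None
        if DOMAIN_RE.fullmatch(domain) is None:
            return None
    return dict(params)
```

Rejecting anything that is not explicitly expected keeps the set of things the web server's key can trigger small, even if the web server itself is compromised.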

Access to backup data

The hardest part with Mastodon backups is the number of small media files. In some cases, we are talking about millions of files. Currently, I have an automated system running 24/7 just to keep the backups up to date and, in some cases, it can take close to 24 hours from one backup to the next.

In order to provide a simple way to download a backup, data needs to be archived. I opted for using tar because it combines speed with availability. Still, I couldn’t find an affordable way to tar every backup I automatically generate. It would require more system resources and that would force me to increase Masto.host plan pricing.

Therefore, I opted to build a system where the user can request to generate an archive of their backup through the web interface. Once the archive is ready, an email is sent to the user with the link to start the download process. The aforementioned link includes multiple parameters and one is a token that is NOT stored in the database.
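The post does not say how a token can be validated without being stored in the database. One common way to do this, shown here purely as an assumption and not as the author's method, is to derive the token from the link's other parameters with an HMAC and a secret, and to recompute it when the link is used. The secret, the parameter names and the function names in this Python sketch are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret known only to the side that issues and checks tokens.
TOKEN_SECRET = b"example-token-secret"


def make_token(account_id, archive_name):
    """Derive a token from the link parameters instead of storing it."""
    message = f"{account_id}|{archive_name}".encode("utf-8")
    return hmac.new(TOKEN_SECRET, message, hashlib.sha256).hexdigest()


def check_token(account_id, archive_name, token):
    """Recompute the token from the parameters and compare in constant time."""
    expected = make_token(account_id, archive_name)
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

With this kind of scheme, a database leak does not reveal any download tokens, because there is nothing to leak.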

When the user clicks on the link sent by email, the user is directed to the web interface and needs to log in or be logged in to be able to see the download page. When the user reaches this page, it triggers a request to the backup server where the token sent by email is validated and, if valid, it triggers the creation of a symbolic link on the backup server that will partially validate a unique URL to download the backup archive.

This unique URL is only valid for 60 seconds. Using the NGINX secure_link module, it combines a shared secret between the server and the web interface and ties the URL to the IP of the user making the request.

If the user doesn’t click the link to initiate the download within 60 seconds, nothing happens and the user just needs to refresh the page to generate a new one. A cron job on the server cleans up unused symbolic links, just to keep things tidy.

If the user clicks on the link and starts the download, the symbolic link is removed as soon as the user finishes or interrupts the download. If the download runs for longer than 60 seconds, the URL has already expired by then, so removing the link would not be strictly necessary but, again, it’s nice to keep stuff tidy.
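For readers unfamiliar with secure_link, here is a Python sketch of how a web interface could build such a URL. It assumes the NGINX side is configured along the lines of the module's documented example, that is, secure_link $arg_md5,$arg_expires; with secure_link_md5 "$secure_link_expires$uri$remote_addr secret"; where "secret" is the shared secret. The post does not show the actual configuration, so the expression, the argument names, the path and the function names are assumptions.

```python
import base64
import hashlib
import time


def _token(expires, path, client_ip, secret):
    # MD5 over "<expires><uri><remote_addr> <secret>", then URL-safe base64
    # with the "=" padding removed, which is the form secure_link expects.
    raw = f"{expires}{path}{client_ip} {secret}".encode("utf-8")
    digest = hashlib.md5(raw).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def signed_url(path, client_ip, secret, now=None, ttl=60):
    """Build a URL that is valid for ttl seconds and only for client_ip."""
    now = int(time.time()) if now is None else int(now)
    expires = now + ttl
    token = _token(expires, path, client_ip, secret)
    return f"{path}?md5={token}&expires={expires}"


def is_valid(path, client_ip, secret, token, expires, now):
    """Mirror of the server-side check, useful for testing the generator."""
    return now <= expires and token == _token(expires, path, client_ip, secret)
```

Because the client IP and the expiry time are part of the hashed string, changing either one in the URL, or requesting it from another address, invalidates the token.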

If the download has been completed, it cannot be requested again. All requests to download and actual downloads are stored temporarily (IP and details of the request).

Suggestions/Questions

Please do share any suggestions or questions you might have, via Mastodon or email.