Klaus' Log

Wed, 07 December 2016

Cheap Personal Backup using AWS S3

Posted by Klaus Eisentraut in scripts   

For the last few years, my only backup was an external hard drive which is kept offline. This approach has two big downsides:

  • I'm lazy with backups and don't do them regularly.
  • No redundancy in case of fire or accidental deletion.

So I decided to use an easier solution instead and settled on Amazon Web Services. Furthermore, I made a few simplifying decisions to keep the whole process as simple as possible:

  • I only want to back up my private photo collection, which is around 60 GB. It would not be too bad if I lost my MP3 or movie collection.
  • Everything should be stored on the HDD in my laptop, on the external HDD and in the Cloud, i.e. I will be following the 3-2-1 rule.
  • To make things easier, I only want to back up immutable folders (a photo collection of an event is immutable!). If a single file inside a folder changes, I'm willing to upload the whole folder again.
  • Everything should be encrypted by strong cryptography.

After reviewing the cheaper but more complicated Amazon Glacier, I decided to use Amazon Simple Storage Service (S3) instead. The total cost will still be below 1 USD/month, so the negligible savings of Glacier are not worth the extra effort (e.g. its two-stage retrieval) of using it.

Installing the AWS Command Line Interface (aws-cli) was very easy, as it is in the Arch Linux repository: a simple pacman -Syu aws-cli did it. After creating an AWS account, I created an S3 bucket and an AWS IAM user which has full access to this bucket (following this tutorial). Then the credentials for the IAM user had to be configured (these are not the real ones!):

$ cat ~/.aws/credentials 
[default]
aws_access_key_id = A89ABXXZIVASSDDQ
aws_secret_access_key = +jtJLidi3ld9vlsL9sl9dls9zoif/
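
For reference, the bucket itself can also be created from the command line instead of the web console; a minimal sketch (the bucket name is a placeholder):

$ aws s3 mb s3://my-aws-bucket-name --region us-east-1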

Then I set the cheapest region, us-east-1, as the default region and enabled the dual-stack (IPv4/IPv6) endpoint:

$ cat ~/.aws/config 
[default]
region = us-east-1
output = json
s3 =
    use_dualstack_endpoint = true

Now I was able to upload and download files from S3 with a syntax similar to that of scp. For example, aws s3 cp s3://my-aws-bucket-name/remote-file.tar.gpg local-file.tar.gpg downloads a file.
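
The opposite direction and listing the bucket contents work the same way (bucket and file names are placeholders):

$ aws s3 cp local-file.tar.gpg s3://my-aws-bucket-name/remote-file.tar.gpg
$ aws s3 ls s3://my-aws-bucket-name/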

For the actual backup, I decided to do it the following way:

  • Whenever I get new pictures from my camera or from somebody else, I'll first copy them into a local folder somewhere.
  • Then I run a simple script which takes the folder to be backed up and the archive name as arguments.
  • The script prompts me for an encryption passphrase which is identical for all archives. Symmetric encryption has the great advantage that there is no private key file which I could lose. It is secure enough, too, as long as the passphrase is strong enough.
  • I need to make sure that the passphrase is not accidentally mistyped, because this would render the encrypted archive unusable.
  • The archive is created, encrypted, stored on the local hard disk in my laptop and immediately uploaded to the AWS cloud.
  • Whenever I'm not lazy, I'll sync the backup directory of the local hard disk to the external hard disk (a one-liner for this follows below).
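
This last sync step is a plain copy; a minimal sketch using rsync, assuming the local backup directory is /awsbackup (as in the script below) and the external disk is mounted at /mnt/external (both paths are assumptions):

$ rsync -av --progress /awsbackup/ /mnt/external/awsbackup/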

So whenever someone gives me a collection of photos, I'll run the following script. You can use it, too; all settings which must be changed are marked with "TODO".

#!/bin/bash

# TODO: set correct bucket name here
AWSBUCKET=my-aws-backup-name

# check correct usage!
if [ ! "$#" -eq "2" ]; then
    echo "Usage: $0 /path/to/folder/which/is/backuped name_aws_backup"
    exit 1
else
    # check if folder exists
    if [ ! -d "$1" ]; then
        echo "Folder '$1' does not exist!" 
        exit 1
    fi

    # check if name has format YYYY-MM-DD_alphanumeric_description
    if [[ ! "$2" =~ ^20[0-9]{2}-[01][0-9]-[0-3][0-9]_[a-zA-Z0-9_-]+$ ]]; then
        echo "Name '$2' is invalid!"
        exit 1
    fi

    # read and check passphrase
    echo -n 'enter passphrase: '
    read -r MYPP
    echo "SHA512 of password is $(echo -n "$MYPP" | sha512sum)"
    # TODO: replace the a044f02a prefix with the beginning of your own hashed
    #       passphrase (see the one-liner after the script). This prevents you
    #       from backing up unusable archives encrypted with a mistyped passphrase.
    if [[ ! $(echo -n "$MYPP" | sha512sum) =~ ^a044f02a.*$ ]]; then
        echo "wrong password"
        exit 1
    fi

    # ensure that the temporary staging folder exists
    mkdir -p "/tmp/awsbackup/"

    # symlink the source folder under the archive name; tar's -h option below
    # dereferences the link
    echo "Copying files..."
    ln -s "$(realpath "$1")" "/tmp/awsbackup/$2"

    echo "Creating and uploading encrypted archive \"$2.tar.gpg\" ..."
    # note that GnuPG does the compression already, so tar does not have to
    cd /tmp/awsbackup/ || exit 1
    tar cvh "$2" |
      gpg --s2k-mode=3 --s2k-cipher-algo=AES256 --s2k-digest-algo=SHA512 --s2k-count=65011712 --symmetric --cipher-algo=AES256 --digest-algo=SHA512 --compress-algo=zlib --batch --passphrase="$MYPP" |
      tee /awsbackup/"$2".tar.gpg |   # TODO: adjust local backup path!
      aws s3 cp - s3://"$AWSBUCKET"/"$2".tar.gpg
    cd - >/dev/null

    # we don't need the passphrase anymore, so we overwrite it
    MYPP=0123456789012345678901234567890123456789

    echo "Upload done! Removing symlink ..."
    unlink /tmp/awsbackup/"$2"
fi

All it does is perform some validity checks, read in a passphrase, check the passphrase for typos by comparing the first few hex digits of its SHA512 hash against a known value, and then write the encrypted archive to the local disk as well as to AWS S3.
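
To adapt the script to your own passphrase, the expected hash prefix on the TODO line can be computed with a one-liner like this (the passphrase shown is a placeholder):

$ echo -n 'your-own-long-passphrase' | sha512sum | cut -c1-8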

The script in action looks like the following (backing up the local folder DCIM):

$ ~/bin/awsbackup.sh DCIM/ 2016-12-07_backup_demonstration
enter passphrase: not-my-actual-passphrase-enter-your-own-long-one-here
SHA512 of password is a044f02a4abc68f4378a86b3f4b32a198ba301845b0cd6e50106e874345700cc6663a86c1ea125dc5e92be17c98f9a0f85ca9d5f595db2012f7cc3571945c123  -
Copying files...
Creating and uploading encrypted archive "2016-12-07_backup_demonstration.tar.gpg" ...
2016-12-07_backup_demonstration/
2016-12-07_backup_demonstration/IMG004.jpg
2016-12-07_backup_demonstration/IMG003.jpg
2016-12-07_backup_demonstration/IMG002.jpg
2016-12-07_backup_demonstration/IMG001.jpg
Upload done! Removing symlink ...

From time to time, the local backup is manually copied to the external hard disk, too.

Finally, I wrote down the password for my AWS account as well as the encryption passphrase and stored them at two different locations. Furthermore, I told a relative about it, and he was able to restore the backup using the AWS Console and my notes. This should be sufficient to ensure that I never lose any personal data.
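
For completeness, restoring an archive is just the reverse pipeline; a minimal sketch using the demo archive from above (gpg prompts for the passphrase, and the bucket name is a placeholder):

$ aws s3 cp s3://my-aws-backup-name/2016-12-07_backup_demonstration.tar.gpg - | gpg --decrypt | tar xvf -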