Introduction

AWS S3 Glacier Deep Archive can be a cost-effective way to back up your files securely to the cloud. I’ve written about the costs in the past, and now I’d like to show you how to set up an S3 bucket and credentials so you can back up to Glacier Deep Archive easily.

To create the AWS resources, we’ll use Terraform, a multi-cloud Infrastructure-as-Code product. The major benefit of using Terraform is that the configuration is captured in code rather than relying on someone manually clicking through the console to create resources, which is error-prone. Terraform makes managing and scaling infrastructure easy.

Prerequisites

  1. AWS CLI
  2. Credentials set up for AWS CLI
  3. Terraform

Creating the Infrastructure

I have made the Terraform code as simple as possible to consume by splitting it into two small modules. Modules are collections of Terraform code files that can be parameterized to easily create multiple copies of the same infrastructure. In this case, you can create multiple S3 buckets and multiple Identity and Access Management (IAM) users just by declaring a few instances of the modules.

First, create a new folder, and preferably initialize it as a git repository. Then, place the following content in main.tf (or any other file ending in .tf):

module "my_arq_bucket" {
  source          = "github.com/Clete2/terraform_arq_s3_bucket.git"
  name            = "my-arq-backup" # S3 bucket names are globally unique. Name this something unique to you.
}

module "my_arq_user" {
  source          = "github.com/Clete2/terraform_arq_s3_user.git"
  name            = "my_computer_user" # Name whatever you like
  bucket_arn      = module.my_arq_bucket.bucket_arn
}

output "my_arq_user_access_key" {
  value     = module.my_arq_user.access_key
  sensitive = true
}

output "my_arq_user_secret_key" {
  value     = module.my_arq_user.secret_key
  sensitive = true
}

Don’t forget to change the placeholder name variables above.
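Optionally, you can also pin the Terraform and AWS provider versions in the same file so that future upgrades are deliberate. The constraints below are my own suggestion rather than something the modules require you to write, although the split S3 resources they use do need version 4 or later of the AWS provider:

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0" # The split S3 resources in the bucket module need provider v4+
    }
  }
}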

Next, run terraform init in a terminal in that directory to initialize Terraform. It’ll download the Terraform AWS provider and initialize the backend. By default, Terraform stores a state file locally on your hard drive when it creates infrastructure. I highly recommend creating an S3 bucket to store your state file remotely, or using another type of backend to store the file. If nothing else, be sure to at least check the terraform.tfstate file into source control.
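If you do opt for a remote backend, a minimal sketch looks like the following (it can live in the same terraform block as the version constraints above). The state bucket name and key here are hypothetical placeholders; the state bucket is a separate, pre-existing bucket, not the backup bucket defined above, and you need to re-run terraform init after adding or changing this block.

terraform {
  backend "s3" {
    bucket  = "my-terraform-state" # Hypothetical; create this bucket ahead of time
    key     = "arq-backup/terraform.tfstate"
    region  = "us-east-1" # Use whichever region you prefer
    encrypt = true        # Encrypt the state object at rest
  }
}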

Once Terraform is initialized, you can run terraform apply to create the resources you’ve defined in your Terraform code. From here on out, whenever you change the Terraform code, just run terraform apply again to apply the changes.

Finally, run terraform output my_arq_user_access_key and terraform output my_arq_user_secret_key to print out your IAM credentials to the console. You’ll need those to configure Arq.
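If you later want to back up a second machine to the same bucket, you can simply add another instance of the user module plus a pair of outputs and re-run terraform apply. The names below are hypothetical:

module "my_laptop_user" {
  source     = "github.com/Clete2/terraform_arq_s3_user.git"
  name       = "my_laptop_user" # Hypothetical second user
  bucket_arn = module.my_arq_bucket.bucket_arn
}

output "my_laptop_user_access_key" {
  value     = module.my_laptop_user.access_key
  sensitive = true
}

output "my_laptop_user_secret_key" {
  value     = module.my_laptop_user.secret_key
  sensitive = true
}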

Congratulations! You now have an S3 bucket* and an IAM user with access to that bucket. Before we move on to Arq configuration, let’s take a deeper dive into the infrastructure created by the modules.

* I have multiple offsite backup locations, so I’ve created only one S3 bucket and have not replicated it to another region. If your risk profile is different, you should look into enabling cross-region replication with Terraform.

S3 Bucket

The S3 bucket Terraform module creates the S3 bucket and configures it specifically for Arq backups. Arq itself manages what storage class the backup files are put into, so you don’t need to make any special rules in your bucket for moving the backup files to Glacier Deep Archive.

Below is the main code in the S3 bucket module:

resource "aws_s3_bucket" "bucket" {
  bucket        = var.name
  force_destroy = false
  tags          = var.additional_tags
}

resource "aws_s3_bucket_versioning" "bucket" {
  bucket = aws_s3_bucket.bucket.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "bucket" {
  bucket = aws_s3_bucket.bucket.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Arq takes care of setting the objects to standard or deep archive depending on what is needed
# we only need to configure archival of noncurrent-versioned objects
resource "aws_s3_bucket_lifecycle_configuration" "bucket" {
  bucket = aws_s3_bucket.bucket.id

  rule {
    status = "Enabled"
    id     = "noncurrent_transition"

    abort_incomplete_multipart_upload {
      days_after_initiation = 7
    }

    noncurrent_version_transition {
      noncurrent_days = 7
      storage_class   = "DEEP_ARCHIVE"
    }

    noncurrent_version_expiration {
      noncurrent_days = 187 # Minimum storage for Glacier Deep Archive is 180 + 7 for transition to GDA
    }
  }
}

resource "aws_s3_bucket_acl" "bucket" {
  bucket = aws_s3_bucket.bucket.id
  acl    = "private"
}

# Enforce the bucket being private
resource "aws_s3_bucket_public_access_block" "bucket" {
  bucket = aws_s3_bucket.bucket.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Let’s go through it resource-by-resource:

aws_s3_bucket

Simply creates an empty S3 bucket.

aws_s3_bucket_versioning

Enables versioning on the bucket. This setting is vital because it protects against unintentional deletion by keeping noncurrent versions of objects after they are deleted.

Furthermore, with Glacier Deep Archive you pay for 180 days of storage even if you store an object for only a few seconds. Since you pay for it anyway, you may as well keep that object around for at least 180 days.
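As a rough illustration, assume Deep Archive storage costs roughly $0.00099 per GB-month (the published us-east-1 rate; check current pricing for your region). Deleting a 100 GB object after one day still bills the full minimum: 100 GB × 6 months × $0.00099/GB-month ≈ $0.59, exactly what it would cost to keep the object for the whole 180 days.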

aws_s3_bucket_server_side_encryption_configuration

Turns on server-side encryption for the entire bucket by default. Having an extra layer of encryption at the S3 bucket level can’t hurt, but Amazon does own the encryption key. I also make sure to pick a long, unique, random password to encrypt my Arq archives on the client side before uploading, so that Amazon or any other cloud provider cannot access my files.
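If you would rather not depend on Amazon’s S3-managed key, a variant sketch using a customer-managed KMS key is below. This is not what the module above does, and note that KMS adds its own key and request charges:

resource "aws_kms_key" "backup" {
  description         = "Customer-managed key for the Arq backup bucket"
  enable_key_rotation = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "bucket" {
  bucket = aws_s3_bucket.bucket.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.backup.arn
    }
    bucket_key_enabled = true # S3 Bucket Keys cut down on KMS request costs
  }
}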

aws_s3_bucket_lifecycle_configuration

Enables lifecycle rules that automatically transition noncurrent object versions into Glacier Deep Archive after 7 days and expire them after 187 days. This allows you to recover accidentally deleted files.

aws_s3_bucket_acl

Marks the bucket as private.

aws_s3_bucket_public_access_block

Further restricts access to the S3 bucket, preventing accidental data leakage.

IAM User

The IAM user Terraform module creates an IAM user and gives it full control over objects within the bucket, but only read permissions on a couple of bucket settings, following a least-privilege model.

Below is the main code in the IAM user module:

resource "aws_iam_user" "user" {
  name = var.name
  tags = var.additional_tags
}

data "aws_iam_policy_document" "s3_write" {
  statement {
    sid = "S3BucketActions"
    actions = [
      "s3:GetBucketLocation",
      "s3:GetBucketVersioning"
    ]
    resources = [
      var.bucket_arn
    ]
  }

  statement {
    sid     = "S3ObjectWrite"
    actions = ["s3:*"]
    resources = [
      "${var.bucket_arn}${var.bucket_prefix}"
    ]
  }
}

resource "aws_iam_user_policy" "s3_write" {
  name   = var.name
  user   = aws_iam_user.user.name
  policy = data.aws_iam_policy_document.s3_write.json
}

resource "aws_iam_access_key" "user" {
  user = aws_iam_user.user.name
}

Let’s go through it resource-by-resource:

aws_iam_user

Creates the IAM user with no permissions.

data aws_iam_policy_document

Defines the permissions that will be attached to the IAM policy. Note that limited permissions are granted for the bucket, but full control is given at the object prefix level.
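If you want to tighten this further, the s3:* grant could be narrowed to specific object actions. Below is a hedged sketch of a replacement for the S3ObjectWrite statement; the exact set of actions Arq needs is an assumption on my part, so verify it against Arq’s documentation before adopting it (you would likely also need s3:ListBucket in the bucket-level statement):

  statement {
    sid = "S3ObjectWrite"
    actions = [
      "s3:PutObject",               # Upload backup objects
      "s3:GetObject",               # Read objects back during restores and validation
      "s3:DeleteObject",            # Remove objects Arq no longer needs
      "s3:RestoreObject",           # Initiate restores from Glacier Deep Archive
      "s3:AbortMultipartUpload",    # Clean up failed multipart uploads
      "s3:ListMultipartUploadParts" # Resume large uploads
    ]
    resources = [
      "${var.bucket_arn}${var.bucket_prefix}"
    ]
  }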

aws_iam_user_policy

Creates the policy defined in the data.aws_iam_policy_document block and attaches the policy to the user.

aws_iam_access_key

Creates an access/secret key pair to be entered into Arq for authentication purposes.

Arq Configuration

Open up Arq and click File -> New Storage Location -> Amazon Web Services. Enter the access key ID and secret access key you generated earlier. Next, choose the bucket you created.

Now, click File -> New Backup Plan -> <your S3 bucket> and enter an encryption password. Afterward, you will be prompted to choose the storage class for your objects; choose Glacier Deep Archive.

Next, edit your backup plan. We want to retain everything for at least 180 days, because with Glacier Deep Archive you pay for a minimum of 180 days for any object you store, so there is no reason to ever delete an object before then. Customize your backup retention accordingly:

(Screenshot: backup retention settings)

Summary

That’s it! Arq automatically puts the index files in standard storage, and the archive files in Glacier Deep Archive. Enjoy your cheap backup.

A couple of reminders: don’t forget to watch your bill and look at AWS Cost Explorer to understand costs as you incur them. You will see higher costs the first month because you will be uploading a lot of data to start with. Also, restoration can be very costly. Hopefully the cost reduction of Glacier Deep Archive pays off for you.