Think about your backup plan. Do you have one? Ideally you should be following something similar to the 3-2-1 backup strategy of having 3 copies of data: two local and one offsite. I personally follow a modified strategy of 2 local and 2 offsite. One offsite is on a server that I own in another state, and the other is stored in Amazon S3.
For my backups, I use Arq Backup. The strategy described below will work with other solutions too, but note that my experience is solely with Arq.
Loads of storage providers claim to be the cheapest. Backblaze B2 offers storage for $0.005 per GB/Month. Wasabi similarly offers pricing at $0.0059 per GB/Month (minimum 1TB @ $5.99). Comparatively, Amazon S3 looks ridiculously pricey at $0.023 per GB/Month. That’s over 4x as expensive as the other options!
Let’s look at the cost for 1TB of data per month:
| Offering | Price Per GB | Total Price |
|---|---|---|
| Backblaze B2 | $0.005 | $5.00 |
| Wasabi | $0.0059 | $5.99 |
| Amazon S3 | $0.023 | $23.00 |
But wait, there’s another option. In comes S3 Glacier Deep Archive (S3 GDA) to the rescue! Glacier Deep Archive is a cheaper tier of S3 that is designed for the same durability as regular S3, but at a much lower price.
Let’s look again, this time considering S3 GDA:
| Offering | Price Per GB | Total Price |
|---|---|---|
| Backblaze B2 | $0.005 | $5.00 |
| Wasabi | $0.0059 | $5.99 |
| Amazon S3 | $0.023 | $23.00 |
| Amazon S3 Glacier Deep Archive | $0.00099 | $0.99 |
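If you'd rather plug in your own numbers, the table boils down to a one-line calculation. A minimal sketch, using the rates quoted in this article (pulled 2/16/21; they will drift over time):

```python
# Estimated monthly storage cost for a given size at each provider's
# per-GB rate. Rates are the ones quoted in this article (2/16/21).
PER_GB_MONTH = {
    "Backblaze B2": 0.005,
    "Wasabi": 0.0059,  # note: Wasabi bills a 1TB minimum ($5.99)
    "Amazon S3": 0.023,
    "Amazon S3 Glacier Deep Archive": 0.00099,
}

def monthly_storage_cost(gb: float) -> dict:
    """Return provider -> estimated monthly cost in dollars for `gb` GB."""
    return {name: round(rate * gb, 2) for name, rate in PER_GB_MONTH.items()}

for provider, cost in monthly_storage_cost(1000).items():
    print(f"{provider}: ${cost:.2f}")
```

Storage cost alone, of course — request and retrieval charges come later.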
99c per month?!? Too good to be true? A little, yeah. There are a few key disadvantages:
- You pay for putting the data into S3 GDA
- You pay a little overhead for each object (file) in S3 GDA
- You pay to retrieve your data in case of emergency
- You can’t get your data immediately (up to 12 hours to retrieve)
## Calculating the Cost
In AWS, cost is one of the hardest things to factor. You pay for everything from storage to network traffic to API calls. It’s usually easy to get a rough estimate of how much you’ll spend on AWS, but rarely easy to nail it down precisely. That’s why, when exploring using Glacier Deep Archive, I did some rough calculations on API cost plus storage cost, and then just did it.
All costs pulled 2/16/21 from https://aws.amazon.com/s3/pricing
Let’s break down what you pay for, using my actual usage of 487GB as an example.
487GB will cost you:
- $4.90 one time to get into AWS
- $0.50 per month for storage
- $0.20 additional per month if you do incremental backups once daily
- $62.28 if you ever have to retrieve and download it
Glacier Deep Archive charges $0.00099/GB/Month. So 487GB is 487 * 0.00099 = $0.48213 for 30 days, right? No! See the ** after the pricing on the page?
> For each object that is stored in S3 Glacier or S3 Glacier Deep Archive, Amazon S3 adds 40 KB of chargeable overhead for metadata, with 8 KB charged at S3 Standard rates and 32 KB charged at S3 Glacier or S3 Deep Archive rates.

(Source: S3 Pricing Page)
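That overhead can be turned into a rough per-object cost. A sketch, assuming the 8 KB/32 KB split quoted above; the 1024-based KB-to-GB conversion is my assumption about how AWS converts units:

```python
# Rough monthly metadata-overhead cost for N objects in Glacier Deep
# Archive: 8 KB per object billed at Standard rates, 32 KB at GDA rates.
# Rates as quoted in this article (2/16/21); KB -> GB uses 1024^2 here,
# which is an assumption about AWS's unit conversion.
KB_PER_GB = 1024 ** 2
STANDARD_PER_GB = 0.023
GDA_PER_GB = 0.00099

def overhead_cost(num_objects: int) -> float:
    """Monthly metadata overhead in dollars for num_objects archived objects."""
    per_object = (8 * STANDARD_PER_GB + 32 * GDA_PER_GB) / KB_PER_GB
    return num_objects * per_object

# Even 100,000 small files only add about two cents per month:
print(f"${overhead_cost(100_000):.4f}")
```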
How big is each object? How many objects will I have? Who knows. I ignored that part of the calculation. Lucky for you, I’ve implemented this, so I have that number.
Not all your data goes into Glacier Deep Archive. Arq stores metadata about your archive, so that it doesn’t ever have to read the files from Glacier Deep Archive (which takes up to 12 hours and costs money). You’ll see some Standard storage below as well.
| Storage Amount | Tier | Tier Cost (per GB) | Total Cost |
|---|---|---|---|
| 487GB | Glacier Deep Archive | $0.00099 | $0.48213 |
| 2.9GB | Overhead - GDA | $0.00099 | $0.002871 |
| 724MB | Overhead - Standard | $0.023 | $0.01656 |
That totals $0.501561 per month, or about 50c. Pretty cheap for storing almost half a terabyte. But wait! There's more.
Almost all AWS APIs have a cost attached, and Glacier Deep Archive's is among the steepest: $0.05 per 1,000 object PUT/LIST requests, versus $0.005 per 1,000 requests for S3 Standard. That's a 10x higher cost for GDA than for Standard.
Per inspection of my bill, the initial upload cost me about $4.40 in API calls. The API cost is both a one-time cost and an ongoing cost, because you are doing incremental backups daily. In my case, I pay about $0.20 per month for incremental backups, since my backup plan is scheduled to run only once a day.

That brings the monthly cost to ~$0.70 for ~500GB, and the total one-time cost to ~$4.90.
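Tallying those figures in one place: the storage, incremental, and upload numbers are my actuals from above; treating the one-time total as the upload API cost plus the first month of storage is my reading of the bill, not an AWS-published formula:

```python
# Back-of-the-envelope totals for the 487GB example in this article.
# Your numbers will differ with data size, file count, and backup frequency.
storage_monthly = 0.50          # GDA storage including metadata overhead
incremental_api_monthly = 0.20  # daily incremental backup PUT/LIST calls
initial_upload_api = 4.40       # one-time: 88,000 PUTs @ $0.05 per 1,000

monthly_total = storage_monthly + incremental_api_monthly
# Assumption: "one-time cost" = upload API calls + first month of storage.
one_time_total = initial_upload_api + storage_monthly

print(f"~${monthly_total:.2f}/month ongoing, ~${one_time_total:.2f} up front")
```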
## Early Delete and Retrieval
Two “gotchas” that you have to watch out for with Glacier Deep Archive are early deletes and retrieval. If you aren’t careful, you can rack up a large bill.
There is a minimum storage duration of 180 days for each object that enters Glacier Deep Archive. Even if you store an object for a few seconds and then delete it, you'll still be charged for 180 days' worth of storage. Thus, you should be very careful to only put data into GDA that is going to remain stable.
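The early-delete rule is easy to model: AWS bills deleted objects as if they had been stored for the full 180 days. A sketch using the rate quoted earlier, with 30-day months assumed for simplicity:

```python
# Minimum-duration billing for Glacier Deep Archive: objects deleted
# before 180 days are charged as if stored the full 180 days.
# Rate as quoted in this article (2/16/21); 30-day months assumed.
GDA_PER_GB_MONTH = 0.00099
MIN_DAYS = 180

def gda_storage_charge(gb: float, days_stored: int) -> float:
    """Dollars billed for `gb` GB kept for `days_stored` days."""
    billable_days = max(days_stored, MIN_DAYS)
    return gb * GDA_PER_GB_MONTH * billable_days / 30

# Uploading 100GB and deleting it a day later still costs six months' storage:
print(f"${gda_storage_charge(100, 1):.2f}")  # identical to keeping it 180 days
```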
Retrieval is also costly. While I haven’t yet retrieved my data set, I can get a rough calculation on the cost of it. If we take the number of PUT requests I made and assume that I would make an equal number of GET requests to retrieve it, then we can understand how much it will cost to retrieve.
- Total PUT requests: $4.40 / $0.05 * 1,000 = 88,000 (5c per thousand)
- GET object cost: (88,000 / 1,000 * $0.10) + (487 * $0.02) = $18.54 (10c per thousand, plus 2c per GB retrieved)
- Network data transfer from AWS to my machine: (487 - 1) * $0.09 = $43.74 (9c per GB; the first GB is free)

Total restore cost? $43.74 + $18.54 = $62.28
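The same arithmetic as a reusable function, using the rates above (10c per 1,000 GETs, 2c/GB retrieval, 9c/GB transfer out after the first free GB; all as of 2/16/21):

```python
# Estimated cost to restore a full backup from Glacier Deep Archive.
# Rates are the ones used in this article (2/16/21).
def restore_cost(gb: float, num_objects: int) -> float:
    """Dollars to retrieve and download `gb` GB stored as `num_objects` objects."""
    get_requests = num_objects / 1_000 * 0.10   # GET requests
    retrieval = gb * 0.02                       # GDA retrieval, per GB
    transfer = max(gb - 1, 0) * 0.09            # transfer out; first GB free
    return get_requests + retrieval + transfer

print(f"${restore_cost(487, 88_000):.2f}")  # the 487GB example in this article
```

Notice that network transfer, not the retrieval itself, dominates the bill.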
When S3 Standard is compared to other services like Backblaze or Wasabi, it is always shown to be more expensive. Certainly, others have simpler cost models: flat costing for storage and retrieval. However, the flexibility of management that AWS offers is unparalleled. The total cost of Amazon S3 Glacier Deep Archive, so long as you don’t have to restore, is almost always going to be far less than that of either Backblaze or Wasabi. At the same time, you gain access to the vast ecosystem of resources that AWS has to offer.
In the next article, I'll discuss how to actually use Arq Backup with S3 Glacier Deep Archive. I'll demonstrate using HashiCorp Terraform to create everything as "code."