22 January 2023

AWS S3 API call using babbel to check incomplete multipart uploads

Sometimes AWS is hard work!

For a long time I've had some photos backed up in an S3 Glacier bucket. Glacier is long-term storage: very cheap and slow to access, so you shouldn't rely on getting regular access to these files. In my case they're for worst-case scenarios only.

The problem is that every month my bill looks a bit strange. I have a Glacier entry and I also have an S3 entry on my bill. We aren't talking much money here, just a few pennies, but it bothers me: there shouldn't really be anything on S3, it should all be under Glacier. So I gave every bucket a tag so I could see which bucket was the culprit. Tags are really useful because I can group spending by tag in the AWS Cost Explorer, so this should have identified what's going on. Unfortunately no such luck: the S3 spending is always under "No tag key". After putting it off for longer than I should admit, curiosity got the better of me and I emailed support. Support came back to me saying this is probably multipart uploads that have failed.

The plot thickens! I can't confirm this is indeed the cause yet, but I learnt a few things along the way, so here are some general AWS related findings.

Incomplete Multipart Uploads.

There's been a fair amount written about this, and if you're reading this blog you probably get it already, so I'll be short. A large file isn't uploaded to S3 in one go; it's broken down into parts that are sent one at a time. What support is implying is that I started uploading files and some parts succeeded while others failed. That left orphaned parts (or "chunks") sitting in my S3 account, and S3 keeps storing and billing for them until the upload is either completed or aborted.
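To make the splitting concrete, here is a minimal sketch of how a file might be carved into numbered parts. The names `Part` and `splitIntoParts` are my own for illustration and are not part of any AWS SDK; the only S3 facts assumed are that part numbers start at 1 and that every part except the last must be at least 5 MiB.

```kotlin
// Hypothetical illustration only: describes the parts, it does not upload anything.
data class Part(val number: Int, val offset: Long, val length: Long)

fun splitIntoParts(totalBytes: Long, partBytes: Long): List<Part> {
    require(totalBytes > 0 && partBytes > 0) { "sizes must be positive" }
    val parts = mutableListOf<Part>()
    var offset = 0L
    var number = 1 // S3 part numbers start at 1
    while (offset < totalBytes) {
        // The final part is whatever is left over, so it may be smaller.
        val length = minOf(partBytes, totalBytes - offset)
        parts += Part(number++, offset, length)
        offset += length
    }
    return parts
}
```

A 20 MiB file with 8 MiB parts comes out as three parts of 8, 8 and 4 MiB. If the upload dies after part 2, those first two parts are what is left behind.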

AWS's solution

They have written a blog post about this, but it's not very up to date, and personally I couldn't get Storage Lens to show incomplete multipart uploads.

https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/

Listing incomplete multipart uploads

It's absurd that this is not simple from the console. However, I really wanted to see if I did in fact have any incomplete MPUs before I applied a rule that deleted them. So I set about dusting off my old REST API skills to see if I could create a script that would list incomplete MPUs. I discovered I could use an API method for this:

https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListMultipartUploads.html

I really wanted to use Kotlin for this, but I wasn't set up to run Kotlin standalone apps, so it was faster for me to make this an Android "app". I used an awesome tool called babbel okhttp-aws-signer to sort out the AWS Signature Version 4 stuff, as I know how frustrating that can be:

https://github.com/babbel/okhttp-aws-signer

This library is fantastic. So much easier than trying to work out the AWS Signature Version 4 nonsense yourself.

Here is the script I used to generate and make an AWS S3 REST API request:



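import java.text.SimpleDateFormat
import java.util.Date
import java.util.Locale
import java.util.TimeZone
import okhttp3.Request
// OkHttpAwsV4Signer comes from the okhttp-aws-signer library linked above.

// serviceName is "s3"; bucketname, region and the two keys are your own values.
val url = "https://$bucketname.$serviceName.$region.amazonaws.com/?uploads=1"
// x-amz-date must be in UTC, so set the formatter's time zone explicitly.
val dateused = SimpleDateFormat("yyyyMMdd'T'HHmmss'Z'", Locale.US)
    .apply { timeZone = TimeZone.getTimeZone("UTC") }
    .format(Date())

val signer = OkHttpAwsV4Signer(region, serviceName)

val request = Request.Builder()
    .url(url)
    .build()

// hash() is not defined in this snippet; it must return the lowercase hex SHA-256 of the string.
val newRequest = request.newBuilder()
    .addHeader("host", request.url.host)
    .addHeader("x-amz-date", dateused)
    .addHeader("x-amz-content-sha256", "".hash())
    .build()

val signed = signer.sign(newRequest, accessKeyId, secretAccessKey)
// The signed request can then be sent with OkHttp, e.g. OkHttpClient().newCall(signed).execute()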


You'll note that I had to use ?uploads=1 to get the babbel library to work. The API itself only needs a bare ?uploads, but the library requires query params to be key/value pairs and you get a NullPointerException if you don't supply a value. You'll also notice I had to add the new (to me) header x-amz-content-sha256, which is the SHA-256 hash of the request body. This is a GET request with an empty body, so it's just the hash of an empty string.
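The `"".hash()` call in the script isn't defined there, so here is a minimal sketch of what such a helper could look like using only the JDK. The name `sha256Hex` is my own; treat it as an assumed stand-in rather than the library's actual function.

```kotlin
import java.security.MessageDigest

// Hex-encoded (lowercase) SHA-256 of a string's UTF-8 bytes,
// which is the format the x-amz-content-sha256 header expects.
fun sha256Hex(input: String): String =
    MessageDigest.getInstance("SHA-256")
        .digest(input.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }
```

For an empty body, `sha256Hex("")` gives e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, so for body-less requests like this GET the header value is always that same constant.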

You'll need your access key ID and your secret access key, which you can generate in the security credentials area of your account.

So after all that, and quite a lot of work, I was able to prove that there were in fact a few incomplete multipart uploads on my S3 account. I've created a lifecycle rule to delete them, so let's see if that works.
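The response to the request above is XML, with one `Upload` element per incomplete upload. As a rough sketch of how you might pull out the interesting bits, assuming the `Key`, `UploadId` and `Initiated` elements described in the ListMultipartUploads docs (the `IncompleteUpload` class and `parseUploads` function are names I made up for this example):

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical holder for the fields we care about from each <Upload> element.
data class IncompleteUpload(val key: String, val uploadId: String, val initiated: String)

fun parseUploads(xml: String): List<IncompleteUpload> {
    val doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(ByteArrayInputStream(xml.toByteArray(Charsets.UTF_8)))
    val uploads = doc.getElementsByTagName("Upload")
    return (0 until uploads.length).map { i ->
        val upload = uploads.item(i) as Element
        // First matching child element's text, or "" if the element is missing.
        fun text(tag: String): String =
            upload.getElementsByTagName(tag).item(0)?.textContent.orEmpty()
        IncompleteUpload(text("Key"), text("UploadId"), text("Initiated"))
    }
}
```

Passing the response body string to `parseUploads` gives a list you can print or count. An empty list means there are no incomplete uploads in that bucket (or that the response was truncated and you need to page through it, which this sketch doesn't handle).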

I really recommend babbel okhttp-aws-signer as it makes things like this quite simple and helps you answer a few AWS questions without too much drama.

Hope this has helped you with something AWS related!











