Destroying Sensitive Information Stored in AWS with GNU Shred and Python

Volume Security in AWS

AWS is a tremendous resource that makes standing up and shutting down complex infrastructure easy. At work and at home I use AWS on an almost daily basis. I like that I can provision and terminate resources without giving a second thought to the location or content of the physical machines, and that I can do so inside of 60 seconds for just about any action.

In the past I've never had to consider what happens to something like an EBS volume when it's returned to the Amazon resource pool, because frankly, who cares? It doesn't bother me that someone at Amazon might be able to see the content of my volume, because at the end of the day all they are going to find is some code. But what happens if our volumes do in fact contain confidential information or, even worse, confidential information that is worth something to someone like a hacker?

In those instances it is reasonable to protect the content of the volumes by encrypting them (a feature that Amazon offers free of charge). In that case you can simply destroy the keys encrypting the drive and rest easy that no one, at Amazon or anywhere else, is going to be able to snoop on the contents of the volume once it's returned to the resource pool.

However, in the real world, things have a tendency to be a little more screwy. Consider that someone other than yourself could have spun up an environment at your company that was never intended to contain any kind of sensitive/confidential information. In this case it is completely reasonable to have set up unencrypted volumes, if for no reason other than to have one less key to manage. Now let's say that at some point, someone places customer data on those volumes for demo purposes. This shouldn't have happened, but let's say it did anyway. Some questions you might then ask are:

  • What happens to the data when it is released to Amazon?
  • Is your data secure?
  • Do you have to worry that your data might be persisted and another customer down the line might have access to it?

As it turns out, the answers to these questions can be a little tough to come by. After quite a bit of searching, I found this AWS developer forum thread detailing the different compliance ratings that Amazon holds and how it deals with your data during its tenancy in AWS. Most of the information in this thread seems to indicate that data is wiped according to DoD standards and that you have absolutely nothing to worry about when releasing volumes containing sensitive information. However, this is misleading and is not necessarily the case. If you read the final post in the thread you will see that page 21 of the AWS security whitepaper reads:

"Amazon EBS volumes are presented to you as raw unformatted block devices that have been wiped prior to being made available for use. Wiping occurs immediately before reuse so that you can be assured that the wipe process completed. If you have procedures requiring that all data be wiped via a specific method, such as those detailed in DoD 5220.22-M ("National Industrial Security Program Operating Manual") or NIST 800-88 ("Guidelines for Media Sanitization"), you have the ability to do so on Amazon EBS. You should conduct a specialized wipe procedure prior to deleting the volume for compliance with your established requirements."

This indicates that wiping EBS volumes to DoD or NIST standards is in fact the customer's responsibility and is not inherently performed by AWS. This is mirrored in AWS's shared responsibility model documentation, which states that customers are responsible for security in the cloud and AWS is responsible for security of the cloud. So, if you don't ensure drives are properly wiped before releasing them back into the AWS resource pool, the data is still hanging around within the cloud and is not deleted until the space is needed for another customer. This makes your sensitive data susceptible to exfiltration by Amazon contractors or employees (and although I'm sure they're all honest, you can never be too sure).

So in conclusion, we must securely wipe all data from EBS volumes before releasing them back to Amazon if we want to ensure that our potentially sensitive data doesn't fall into the wrong hands.

Introducing The GNU Shred Utility

The GNU shred utility makes it possible to permanently erase all data on a persistent storage device like a hard drive or solid state drive. It does this by overwriting the contents of the volume with random data in several passes and then, if asked to, zeroing out everything in a final pass to produce a raw, unformatted volume free of any sensitive/confidential information. shred is part of coreutils, so finding quality documentation should be as easy as running man shred. There are also a number of good blog posts about how to use shred and how it works under the hood if you are inclined to find more information. Let's see how we can use shred to wipe an EBS volume so that we can safely release it back to the AWS resource pool.
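To make the idea of "several random passes, then zeros" concrete, here is a minimal Python sketch of the same overwrite pattern. It is an illustration only, not how shred itself is implemented, and the function name overwrite_passes is made up for this post. For real volumes, use shred.

```python
import os


def overwrite_passes(path, passes=3, zero_last=True, chunk_size=1024 * 1024):
    """Overwrite an existing file (or block device) in place.

    Writes `passes` rounds of random bytes, then one optional round of zeros.
    Illustrative sketch only; this is NOT a replacement for GNU shred.
    """
    with open(path, "r+b") as f:
        # Seeking to the end reports the size for regular files and block devices.
        size = f.seek(0, os.SEEK_END)

        # os.urandom(n) yields n random bytes; bytes(n) yields n zero bytes.
        generators = [os.urandom] * passes
        if zero_last:
            generators.append(bytes)

        for make_chunk in generators:
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(make_chunk(n))
                remaining -= n
            # Push each pass out to the device before starting the next one.
            f.flush()
            os.fsync(f.fileno())
    return size
```

Each pass rewrites every byte from offset 0 to the end, and the fsync between passes is there so the passes are not simply merged in a cache before reaching the disk.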

Enter python + boto

In order to use shred we have to take care of a few things first. First and foremost, it is not possible to reliably shred an EBS volume that is still in use by the EC2 instance it belongs to. Instead, when we stop/terminate the EC2 instance we must ensure that the EBS volume attached to it is not deleted by default (otherwise the EBS volume is released to AWS before it has been shredded). Once the EC2 instance is stopped/terminated, the EBS volume can be detached and is ready to be shredded.
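One way to make sure volumes survive termination is to switch off the "delete on termination" flag before stopping/terminating the instance. The sketch below assumes boto3 (the successor to boto) and expects ec2 to be a client created with boto3.client("ec2"). The function name keep_volumes_on_termination is my own, not part of any library.

```python
def keep_volumes_on_termination(ec2, instance_id):
    """Turn off DeleteOnTermination for every EBS volume on an instance.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the IDs of the volumes whose flag was changed.
    """
    response = ec2.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]

    changed = []
    for mapping in instance.get("BlockDeviceMappings", []):
        ebs = mapping.get("Ebs", {})
        if ebs.get("DeleteOnTermination"):
            # Without this, terminating the instance releases the volume
            # to AWS before we ever get a chance to shred it.
            ec2.modify_instance_attribute(
                InstanceId=instance_id,
                BlockDeviceMappings=[
                    {
                        "DeviceName": mapping["DeviceName"],
                        "Ebs": {"DeleteOnTermination": False},
                    }
                ],
            )
            changed.append(ebs["VolumeId"])
    return changed
```

Volumes that already have the flag turned off are left alone, so the function is safe to run more than once.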

Because we cannot shred a drive that we are booting from, we will need another, auxiliary EC2 instance to actually perform the shred. It is also mandatory that the new EC2 instance is set up within the same availability zone as the EBS volume to be shredded (this way Amazon avoids shipping data around and using up its internal bandwidth).
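Since an attach will fail if the zones differ, it can be worth checking up front. A small sketch, again assuming a boto3 EC2 client passed in as ec2; the helper name same_availability_zone is hypothetical.

```python
def same_availability_zone(ec2, volume_id, instance_id):
    """Return True if the volume and the instance live in the same AZ.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]

    volume_az = volume["AvailabilityZone"]
    instance_az = instance["Placement"]["AvailabilityZone"]
    return volume_az == instance_az
```

If this returns False, launch the shredding instance in the volume's zone rather than trying to move the volume.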

Once we have configured and launched a new EC2 instance, we are ready to start shredding some drives. To establish a basic understanding of the process, let's look at how we can perform this task manually. Then we will automate the process using Python and boto, which is much more convenient for shredding multiple EBS volumes. To get started shredding we will need to:

  • Attach the EBS volume we want to erase to the new EC2 instance we have set up for shredding
  • Locate that attached volume in /dev/, but do not mount it
  • Run shred on the attached EBS volume
  • Detach the drive from the EC2 instance
  • Release the EBS volume back to Amazon's resource pool
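The steps above can be sketched in Python roughly as follows. This assumes boto3 rather than the older boto, with ec2 being a boto3.client("ec2"). The run_shred argument is a hypothetical callback that you supply to run shred on the shredding instance (over ssh, for example). It receives the device name that was requested at attach time, which may show up under a different name inside the instance.

```python
def shred_and_release(ec2, volume_id, instance_id, run_shred, device="/dev/sdx"):
    """Attach a volume to the shredding instance, shred it, then delete it.

    `ec2` is assumed to be a boto3 EC2 client. `run_shred` is a hypothetical
    caller-supplied function that runs shred on the instance and only returns
    once the wipe has finished (it should raise on failure).
    """
    # 1. Attach the volume to the auxiliary instance (never mount it).
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

    # 2-3. Locate the device on the instance and shred it.
    # The volume may need a few more moments after the waiter returns
    # before the device node appears; the callback should allow for that.
    run_shred(device)

    # 4. Detach the drive and wait until it is free again.
    ec2.detach_volume(VolumeId=volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # 5. Release the volume back to Amazon's resource pool.
    ec2.delete_volume(VolumeId=volume_id)
```

Because the delete only happens after run_shred returns without raising, a failed wipe leaves the volume around for another attempt instead of handing it back unshredded.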

So, let's go ahead and attach the EBS volume we want to shred to the new EC2 instance we've set up. First, go to the AWS management console and copy down the "instance id" field of that instance (it should have the form "i-abcd1234"). Next, select "volumes" from the "Elastic Block Store" section of the navigation menu on the left side of the window and locate the volume you intend to shred. Right click the volume and select "Attach volume." A dialog box will appear, prompting you to enter the instance id of the machine you want to attach the drive to. Enter the instance id that you copied down earlier. This prompt will also ask you for "device." This field is the name you want the drive to have when it appears in /dev/. You can use whatever you want, but I normally use /dev/sdx. This particular device name will be converted to /dev/xvdx on the actual EC2 instance (I couldn't tell you why).

Now that the drive is attached, ssh into the newly created EC2 instance, run ls -l /dev/ and confirm that xvdx (or whatever you named your device) is present in the list. To make your life a bit easier you can run ls -l /dev/ | grep xvdx, which searches specifically for xvdx so you don't have to look through the list manually.
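The sd-to-xvd renaming and the "is it there yet?" check are easy to script. Here is a small sketch to run on the shredding instance. The helper names are made up for this post, and the renaming rule is only the one described above, so other instance types may expose the volume under a different name altogether (NVMe device names, for example).

```python
import os


def guest_device_name(requested):
    """Map a requested name like /dev/sdx to the /dev/xvdx form seen on the instance.

    Assumes the sd -> xvd renaming described in this post; names that do not
    start with "sd" are returned unchanged.
    """
    directory, name = os.path.split(requested)
    if name.startswith("sd"):
        return os.path.join(directory, "xvd" + name[2:])
    return requested


def is_attached(requested, exists=os.path.exists):
    """Return True if the renamed device node is present on this machine."""
    return exists(guest_device_name(requested))
```

Calling is_attached("/dev/sdx") on the instance does the same job as the ls and grep check, which is handy when looping over several volumes.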

Once you have confirmed that the device is attached, we can go ahead and start shredding. You do not need to mount the drive to shred it. To shred the drive to DoD standards simply run the command:

sudo shred /dev/xvdx -f -v -z

This will shred the device /dev/xvdx with three passes of random data (the default) plus one final pass of zeros (the -z flag) to finish it out. The -f flag changes permissions to allow writing if necessary, and -v prints progress as shred works. You can specify a different number of random passes using the -n flag, so sudo shred /dev/xvdx -n 7 would write 7 passes of random data to the drive.
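When shredding more than one volume it helps to build that command in code. A minimal sketch using only the flags discussed above; the function names build_shred_command and shred_device are hypothetical, and the script is assumed to run on the shredding instance itself.

```python
import subprocess


def build_shred_command(device, passes=3, zero=True, force=True, verbose=True):
    """Build the shred command line as an argument list (no shell involved)."""
    cmd = ["sudo", "shred"]
    if force:
        cmd.append("-f")  # change permissions to allow writing if necessary
    if verbose:
        cmd.append("-v")  # show progress
    if zero:
        cmd.append("-z")  # add a final pass of zeros
    cmd += ["-n", str(passes), device]  # number of random passes
    return cmd


def shred_device(device, **options):
    """Run shred on the device; raises CalledProcessError if shred fails."""
    subprocess.run(build_shred_command(device, **options), check=True)
```

For example, shred_device("/dev/xvdx", passes=7) runs the seven-pass variant. Passing the arguments as a list avoids any shell quoting surprises with device names.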