Source code – https://github.com/ehelin/StorageExperiments (get the commit closest to the date of this blog)
This blog post starts my adventure with the Amazon cloud, using DynamoDB and S3. I picked them since they are the ones I have heard the most about. The DynamoDB experiment is still running, so I will have another blog post on it in a couple of days. S3 appears to be the equivalent of Azure blob storage. The details are below, but S3 did not perform as well as Azure blob in my experience. In general, however, I am finding the Amazon cloud user interface easier to deal with than the Azure user interface.
Prior to starting my Azure experiments, I had already been doing work with Azure for my job. Other than a few practice applications, I have not spent much time in the Amazon cloud. So, when I first logged in, the Amazon user interface was a little intimidating. But once I looked around a bit, I found it very intuitive. Also, using the Amazon site documentation (which has examples in many languages) and a few other sites, I was able to add code rather quickly to the existing Storage Experiment solution (link above) and start the load. One unexpected factor was a $150 bill. My wife wasn’t happy, but I learned a lot 🙂
I did notice a couple of things. Deleting a non-empty bucket looked like it was going to be an issue. Usually, I like to delete my container or table and start over, but deleting the 23 million records in my generated data set one at a time takes too long. When I tried to delete the bucket directly, I got an exception stating that it is not possible to delete a bucket with data in it. After a Google search, however, I found that you can delete a non-empty bucket using the AmazonS3Util class (see reference #6).
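My actual code is .NET and leans on the AmazonS3Util helper, but the underlying idea is worth sketching: list the bucket’s objects page by page and delete them in batches (S3 caps a single DeleteObjects request at 1,000 keys). Below is an illustrative Python sketch of just that logic; `list_page` and `delete_batch` are hypothetical stand-ins for the real ListObjects/DeleteObjects calls, not part of any SDK:

```python
def chunk_keys(keys, batch_size=1000):
    """Split a list of object keys into batches no larger than batch_size.

    S3's DeleteObjects call accepts at most 1,000 keys per request, so a
    bulk delete has to be issued in chunks like these.
    """
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]


def delete_all_objects(list_page, delete_batch):
    """Drain a bucket by repeatedly listing a page of keys and deleting it.

    list_page() returns the next page of keys (an empty list when the
    bucket is exhausted); delete_batch(keys) deletes one batch. Both are
    hypothetical stand-ins for the real S3 calls. Returns the number of
    objects deleted, after which the now-empty bucket can be deleted.
    """
    deleted = 0
    while True:
        keys = list_page()
        if not keys:
            return deleted
        for batch in chunk_keys(keys):
            delete_batch(batch)
            deleted += len(batch)
```

AmazonS3Util.DeleteS3BucketWithObjects does this drain-then-delete dance for you, which is why it succeeds where a plain bucket delete throws.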
Getting a total record count also looked like it was going to be difficult. I ran a count in code and kept coming up with 1,000 records; it turns out that Amazon uses pagination to manage the size of results. I based my solution on a post from stackoverflow.com: keep requesting pages using the next marker until there are no more (see reference #7). It seems to be accurate. It is also nice when you can get counts quickly without running a program, and I found such a count in a report you can download from the Amazon Management Console. From what I can see, it appears to be fairly accurate.
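The marker-following loop behind that count is simple to sketch. My experiment code is .NET, so the Python below is only illustrative; `fetch_page` is a hypothetical stand-in for S3’s ListObjects call, which returns at most 1,000 keys per response along with a marker for the next page (or none when the listing is done):

```python
def count_objects(fetch_page):
    """Count every object in a bucket by following S3-style pagination.

    fetch_page(marker) is a hypothetical stand-in for ListObjects: it
    returns (keys, next_marker), where next_marker is None on the last
    page. Because each response is capped at 1,000 keys, a naive single
    call undercounts; the loop keeps requesting pages until no marker
    comes back.
    """
    total = 0
    marker = None
    while True:
        keys, marker = fetch_page(marker)
        total += len(keys)
        if marker is None:
            return total
```

Stopping after the first response is exactly the mistake that made my count stick at 1,000.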
To access it:
-Login to amazon cloud console
-Click on ‘Billing & Cost Management’ under your name
-Click ‘AWS Usage Report’
-Select ‘Amazon Simple Storage Service’ and accept all defaults except ‘Report Granularity’, which should be set to ‘Months’
-Download report (I did .xml)
-Find the ‘PutObject’ entry that is not a byte measurement and you should be able to get a count. Mine seemed accurate, though there does seem to be some latency, as a download 15 minutes later showed the same total.
Amazon S3 Satellite Update Data Load (33 hours and 2 minutes (ish))
start – 5/24/2016 – 12:10:44 PM
end – 5/25/2016 – 9:12:09 PM
23310144 Total records
Specific Record Query (2 hours and 24 minutes (ish))
start – 5/26/2016 4:14:05 PM
end – 5/26/2016 6:38:45 PM
Record Found – True
Total of Type ‘SouthEast’ – 2940199