Source code – https://github.com/ehelin/StorageExperiments (use the commit closest to the date of this blog post)
This blog entry deals with Amazon Dynamo Db. It is a no-SQL database and seems to be all the rage with developers these days. While I personally feel the relational model still has a place, no-SQL databases do as well. They seem to be a cheaper and more straightforward way of storing data. However, getting data out can be a challenge. I haven’t yet explored whether indexes are practical in very large no-SQL databases, but I believe they do exist. In discussions with other software developers, a hybrid of a relational and a no-SQL database seems appropriate: the no-SQL database stores the actual files and the relational database stores metadata about those files. More specifically, the relational database stores the name of the loaded file, its location and perhaps its size or another useful bit of information that someone might want to view quickly. Bulk queries are then run against the relational database, and specific files are fetched from the no-SQL database. Keeping the two in sync can be a challenge though 😉
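The hybrid layout described above can be sketched with Python's built-in `sqlite3` as the relational metadata store and a plain dict standing in for the no-SQL store (a hypothetical substitute, not a Dynamo Db client). Doing both writes inside one SQLite transaction is one simple way to keep them in step: a duplicate name is rejected before the document store is touched, and if the document write raises, the metadata insert is rolled back. Table, column and function names are illustrative only.

```python
import sqlite3


def open_metadata_db() -> sqlite3.Connection:
    """Create the relational side: one metadata row per stored file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_meta ("
        " name TEXT PRIMARY KEY,"
        " location TEXT NOT NULL,"
        " size INTEGER NOT NULL)"
    )
    return conn


def store_file(conn: sqlite3.Connection, store: dict, name: str,
               location: str, body: bytes) -> None:
    """Record metadata and store the file body as one unit of work."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        # A duplicate name raises sqlite3.IntegrityError here,
        # before the document store is touched.
        conn.execute(
            "INSERT INTO file_meta (name, location, size) VALUES (?, ?, ?)",
            (name, location, len(body)),
        )
        # If this write raised, the INSERT above would be rolled back.
        store[name] = body


def names_larger_than(conn: sqlite3.Connection, min_size: int) -> list[str]:
    """Bulk query that touches only the relational metadata."""
    rows = conn.execute(
        "SELECT name FROM file_meta WHERE size > ? ORDER BY name", (min_size,)
    )
    return [row[0] for row in rows]


def fetch_file(store: dict, name: str) -> bytes:
    """Fetch one specific file from the (stand-in) document store."""
    return store[name]


if __name__ == "__main__":
    conn = open_metadata_db()
    store: dict[str, bytes] = {}  # hypothetical stand-in for the no-SQL store
    store_file(conn, store, "a.csv", "files/a.csv", b"x" * 10)
    store_file(conn, store, "b.csv", "files/b.csv", b"y" * 100)
    for name in names_larger_than(conn, 50):
        print(name, len(fetch_file(store, name)))
```

This does not cover every failure mode (for example, a commit that fails after the document write has already happened), which is part of why keeping the two stores in sync is hard in practice.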
The thing I am trying to remember is that there is an almost infinite number of ways to combine system components, and the cloud is no different. I am trying to stay flexible in my ideas of what ‘system components’ are and how they are used. From what I can see, this is only going to become more complicated in the future 🙂
Before I give the data load statistics, here are a couple of things I noticed. First, getting an accurate count quickly from Dynamo Db directly at load time seems almost impossible. The ‘options’ seem to be using the ‘Amazon.DynamoDBv2.DocumentModel.Search’ class with a search parameter that returns all values and counting them as you iterate through the results. This takes a while 🙂 I did notice that using an empty search object returns an inaccurate count. I would have expected it to return all records, like a ‘*’ in T-SQL.
The other way to obtain a count is the summary page in the Amazon Cloud Console. It looks like it is updated every 6 hours (ish). The console has a ton of views that should show metrics, but none of them appear to give real-time counts for Dynamo Db. It is probably something silly I am doing wrong.
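The two counting routes above can be sketched in Python against the boto3 DynamoDB client, rather than the .NET SDK the post uses. An exact count pages through a full Scan with `Select="COUNT"`, following `LastEvaluatedKey` until it is absent; a single Scan call reads at most 1 MB of data, so a count taken from only the first page undercounts a large table. A full Scan also reads every item, which is one reason it is slow on a table of this size. The approximate count is the `ItemCount` field returned by `DescribeTable`, which AWS documents as refreshed roughly every six hours, consistent with the console behaviour described above. The functions take the client methods as parameters so they can be exercised without AWS; the table name in the usage comment is hypothetical.

```python
from typing import Any, Callable


def exact_count(scan: Callable[..., dict[str, Any]], table_name: str) -> int:
    """Count every item by paging through a full Scan with Select='COUNT'.

    `scan` is assumed to behave like boto3's DynamoDB client `scan` method.
    """
    total = 0
    request: dict[str, Any] = {"TableName": table_name, "Select": "COUNT"}
    while True:
        page = scan(**request)
        total += page["Count"]  # items counted in this page only
        last_key = page.get("LastEvaluatedKey")
        if not last_key:  # no key means the scan reached the end of the table
            return total
        request["ExclusiveStartKey"] = last_key  # resume where this page stopped


def approximate_count(describe_table: Callable[..., dict[str, Any]],
                      table_name: str) -> int:
    """Return DescribeTable's ItemCount, which is only refreshed periodically."""
    return describe_table(TableName=table_name)["Table"]["ItemCount"]


# Usage against a real table (hypothetical table name, needs AWS credentials):
#   import boto3
#   client = boto3.client("dynamodb")
#   print(exact_count(client.scan, "SatelliteUpdates"))
#   print(approximate_count(client.describe_table, "SatelliteUpdates"))
```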
I also ran into a ‘Resource not found’ error. It seems to occur when you create a table and try to use it right away. Waiting for a while seems to have fixed it (see Reference #3).
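A newly created table starts in the `CREATING` state and only accepts reads and writes once it reaches `ACTIVE`, so rather than waiting an arbitrary amount of time, a load can poll until the table is ready. boto3 already ships a waiter for this (`client.get_waiter("table_exists").wait(TableName=...)`); the sketch below spells out the same idea by hand, assuming `describe_table` behaves like the boto3 client method. The timeout and poll interval are illustrative values, not recommendations.

```python
import time
from typing import Any, Callable


def wait_until_active(
    describe_table: Callable[..., dict[str, Any]],
    table_name: str,
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll DescribeTable until TableStatus is ACTIVE, or raise TimeoutError.

    `describe_table` is assumed to behave like boto3's DynamoDB client method.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = describe_table(TableName=table_name)["Table"]["TableStatus"]
        if status == "ACTIVE":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{table_name} still {status} after {timeout_s}s")
        sleep(poll_s)  # e.g. still CREATING: not usable for reads/writes yet
```

The idea is to call it right after `create_table` and before the first write, so the load never races the table creation.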
The second thing I noticed is that although some records were lost and the load took almost a week, it did run to completion. Dynamo Db appears to be the equivalent of Azure Document Db. I had trouble getting Document Db to run, and it had some implied limits out of the box that Dynamo Db doesn’t appear to have. I am running these experiments out of the box (ish) to compare and learn. From what I have seen, Amazon Dynamo Db appears to be a robust no-SQL database and superior (for the moment) to its Azure equivalent.
Amazon Dynamo Db Satellite Update Data Load (Seven Days (ish))
Start – 5/21/2016 (lost the time)
End – 5/27/2016 7:00:34 AM
Total records – 23,309,989 (lost 155 records)
Specific Record Query (2 hours and 24 minutes (ish))
Start – 5/27/2016 4:21:56 PM
End – 5/27/2016 10:16:02 PM
Record Found – True
Total of Type ‘SouthEast’ – 2,940,180 (lost 19 records)
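The figures above can be checked with a little arithmetic. The sketch below computes the specific-record query's elapsed time from its logged start and end timestamps, and the fraction of records lost, assuming the number attempted equals the number stored plus the number lost. Taken as written, the timestamps span 5 hours, 54 minutes and 6 seconds rather than the 2 hours and 24 minutes in the heading, so one of those two figures is probably off. Both loss rates come out well under a thousandth of a percent.

```python
from datetime import datetime

# Timestamps copied from the run log; format matches "5/27/2016 4:21:56 PM".
FMT = "%m/%d/%Y %I:%M:%S %p"

query_start = datetime.strptime("5/27/2016 4:21:56 PM", FMT)
query_end = datetime.strptime("5/27/2016 10:16:02 PM", FMT)
query_elapsed = query_end - query_start

# Assumption: records attempted = records stored + records lost.
loaded, lost = 23309989, 155
southeast_found, southeast_lost = 2940180, 19

overall_loss_rate = lost / (loaded + lost)
southeast_loss_rate = southeast_lost / (southeast_found + southeast_lost)

print(f"Query elapsed: {query_elapsed}")  # prints "Query elapsed: 5:54:06"
print(f"Overall loss rate: {overall_loss_rate:.6%}")
print(f"SouthEast loss rate: {southeast_loss_rate:.6%}")
```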