Learn how to differentiate between NoSQL storage in DynamoDB and object storage in S3, and identify the key configuration options of Dynamo.
- [Instructor] DynamoDB is AWS's highly available, scalable, NoSQL database. You might compare it to other NoSQL databases like MongoDB. The main advantage of Dynamo is that, like RDS for relational databases, Dynamo is a fully managed service. It scales performance on demand, meaning that you can provision and pay for just the level of read and write performance you desire. It's also a distributed storage system, meaning its durability and availability are very high. Like S3, it can publish event streams for all its activity.
Finally, it's stored on solid-state drives, giving it very low latency, at a price. Let's look at how it works. The primary artifact of DynamoDB is the table. You can almost think of this as the equivalent of a bucket in S3. You can have as many as you want. Within a table you'll create records, which AWS calls items. Items cannot exceed 400 KB in size. Item attributes can be one of a few data types. Scalars are single-value types such as strings or booleans. Multi-value types, think arrays or sets.
Documents are complex types. They combine the above two and can involve nested data structures. Using these types, you end up with items that look an awful lot like JSON. In this example, first and last are scalar types: strings. Owner of is a string set, more commonly called an array of strings. Finally, the whole thing is a map object: a conglomeration of the other data types into a structure that contains many key-value pairs as well as substructures, like the nested object spouse. This kind of data model means that Dynamo is an ideal choice for non-relational data.
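Here's a minimal sketch of what an item like that might look like, written as a Python dict the way the boto3 SDK would accept it. The attribute names and values are illustrative, not taken from an actual table:

```python
# A hypothetical item: "first" and "last" are scalars (strings),
# "owner_of" is a string set, and "spouse" is a nested map.
item = {
    "first": "John",
    "last": "Smith",
    "owner_of": {"Fluffy", "Rex"},  # string set (Python set of strings)
    "spouse": {                     # nested map object
        "first": "Sally",
        "last": "Smith",
    },
}
```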
If you need to frequently join records, you may want to look at putting a relational database on RDS or EC2, but if your data load is high volume, non-relational, and has individual items under 400 KB, Dynamo is going to work very well. Now let's discuss primary keys. NoSQL databases are schemaless, but each Dynamo table you create should contain the same kind of data. The exact fields on the items within may vary. Some fields may be optional, for instance, but there should be some consistency to what's inside.
Given that, when you create a DynamoDB table, the first thing you'll need to do, besides give it a name, is select a primary key. Think about the format of the data you want to put in. In the previous example, first, last, and owner of are all examples of keys. We want to select a key that will uniquely identify a record, meaning it should be present for all items. Keys should be diverse and evenly distributed enough to be good for physically partitioning data. And primary keys should serve as effective keys for querying data, since they'll be your primary mechanism for doing so.
In some cases you might have an actual ID or GUID, such as an employee ID, to serve as a primary key. Choose well, because after all this is NoSQL, which implies you won't be querying with the full power of a query language like SQL. You'll be relying on your choice of keys and indexes. I mentioned partitioning. One important thing to understand is that the primary key is used by DynamoDB not just to help you locate your records, but also to physically partition your data across nodes. This can affect your overall throughput. You want to avoid clustering, so don't choose a key where, for instance, half the records start with A and the rest are evenly distributed.
Choose something that will be logically spread out. When it comes to querying, you have just a few simple choices. You can query on the primary key. You can query on a sort key, which is like a secondary key. You can create custom indexes on which to query. These can be composed of any other fields in an object, but they do take up space. They also often force you into a two-part query where first you query the index, obtain a primary key, and then use that to query for the object you really want. Finally, you can perform a full table scan, retrieving all records and filtering the results, which can be a slow process.
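As a sketch of those choices in code, here's what they might look like with boto3, assuming a hypothetical People table with partition key last and sort key first (all names here are illustrative):

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

# Assumes a hypothetical "People" table with partition key "last"
# and sort key "first".
table = boto3.resource("dynamodb").Table("People")

# 1. Query on the primary (partition) key.
smiths = table.query(KeyConditionExpression=Key("last").eq("Smith"))

# 2. Narrow the query with a sort key condition.
john = table.query(
    KeyConditionExpression=Key("last").eq("Smith") & Key("first").eq("John")
)

# 3. Full table scan with a filter -- reads every item, so it can be slow.
ohioans = table.scan(FilterExpression=Attr("state").eq("OH"))
```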
You really want to choose your keys and indexes well so that you can minimize the results that you must filter. So what is a sort key? Sort keys combine with primary keys to act like a composite key. They extend your namespace, allowing the primary key to be repeated. In a trivial example, if the primary key were last name, adding first name as a sort key lets you store both John Smith and Sally Smith. Otherwise, you'd get a primary key conflict error. Sort keys don't affect partitioning, so keep that in mind.
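Here's a sketch of creating a table with that kind of composite key using boto3, reusing the hypothetical last/first key scheme from the John Smith and Sally Smith example:

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Hypothetical composite key: "last" is the partition (HASH) key and
# "first" is the sort (RANGE) key, so John Smith and Sally Smith can
# coexist as distinct items.
table = dynamodb.create_table(
    TableName="People",
    KeySchema=[
        {"AttributeName": "last", "KeyType": "HASH"},    # partition key
        {"AttributeName": "first", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "last", "AttributeType": "S"},
        {"AttributeName": "first", "AttributeType": "S"},
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
table.wait_until_exists()
```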
Primary keys should remain well distributed; however, sort keys do affect the order in which records will be stored on disk, which can have implications for query speeds. That's why they're sometimes called range keys. Indexes in Dynamo allow searching on a field other than the primary key. Any field that is not a primary key, sort key, or index cannot be directly searched. Indexes become their own records. For instance, if I created an index on employee start year, Dynamo would actually create and store new records with that data in it, along with any related fields I specified to aid in searching.
Then we essentially search on the index, as if the index value was a primary key. So if we index start year and include employee ID, we have a simple two-step process to drill down to employee records by start year: search start year from the index, grab the employee ID that is part of each index record, and query the main table on that employee ID. Finally, even though they are physically manifested as records, Dynamo indexes are maintained and updated automatically by the Dynamo service. Although AWS does not advertise service-level numbers for Dynamo the same way they do for, say, S3, Dynamo is designed for high durability and availability.
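A sketch of that two-step lookup with boto3, assuming, as in the narration, an Employees table keyed on employee_id with a hypothetical secondary index named start_year-index that includes employee_id in its records:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Employees")

# Step 1: query the hypothetical "start_year-index", whose records
# carry the employee_id alongside the indexed start_year.
hires = table.query(
    IndexName="start_year-index",
    KeyConditionExpression=Key("start_year").eq(2020),
)

# Step 2: use each employee_id from the index results to fetch the
# full item from the main table.
for record in hires["Items"]:
    employee = table.get_item(Key={"employee_id": record["employee_id"]})
    print(employee["Item"])
```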
All your data is synchronously replicated to three different locations within your region. You also have the option to enable cross-region replication, which asynchronously replicates your Dynamo tables to another region. AWS handles this automatically, so if there's a problem in your primary region, you need only switch your endpoint to the other region, and your connected applications can fail over to the backup. Note that there's an additional charge for enabling this feature. Dynamo can support an extremely high level of read and write requests, and like many AWS services, this capacity can be provisioned in an elastic way.
There are three options for telling AWS how much request throughput to give you. First, you can scale capacity on demand by issuing a command from the CLI or web console. Second, you can reserve capacity, which operates a bit like reserved instance pricing: you pay up front for a one- or three-year term. This commitment nets you a lower overall price, but it requires good forecasting on your part. A good rule of thumb with AWS is to never reserve anything more than one year out, because AWS drops their prices so often.
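Going back to that first option, here's a minimal sketch of scaling capacity on demand from code, the SDK equivalent of the CLI or console command. The table name and capacity numbers are illustrative:

```python
import boto3

client = boto3.client("dynamodb")

# Scale provisioned read/write capacity on demand -- the boto3
# equivalent of `aws dynamodb update-table` from the CLI.
client.update_table(
    TableName="Employees",
    ProvisionedThroughput={
        "ReadCapacityUnits": 100,
        "WriteCapacityUnits": 50,
    },
)
```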
Finally, you can enable auto-scaling, where AWS will update your capacity in response to CloudWatch events. You can load data into Dynamo in numerous ways. There's always the web console, and the command line interface, or CLI, which can do anything the web console can. There's also the REST API, and finally the SDK option. Like many AWS services, Dynamo can integrate with numerous programming languages like Java or Python via SDK. This is what you'd use to build applications that depend on Dynamo for data.
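For instance, a minimal sketch of writing an item through the Python SDK, boto3, assuming the hypothetical Employees table from earlier:

```python
import boto3

table = boto3.resource("dynamodb").Table("Employees")

# Write a single item; the attribute names are illustrative.
table.put_item(
    Item={
        "employee_id": "E-1001",
        "first": "John",
        "last": "Smith",
        "start_year": 2020,
    }
)
```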
A powerful feature of DynamoDB is the streams capability. Streams publish events whenever something changes in a table. In a relational database, this kind of functionality could be created by applying triggers to tables and publishing to a queue like ActiveMQ or AWS's SQS. In Dynamo, the feature is native, out of the box. What's more, you don't just get notified that events occurred; you get valuable state information about the object that was updated. For inserts, you get the entire new object.
For updates, you get before and after objects, so you can see exactly what changed. For deletes, you get the object that was just deleted. This opens up tremendous possibilities because these streams can be fed into Lambda functions. Lambda is AWS's serverless compute service, essentially a way for you to write functions in languages like Python or Node to be executed as needed. Lambda functions can be triggered by many events in AWS, including Dynamo streams. When this occurs, the Lambda will receive the incoming object JSON as a parameter.
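Here's a minimal sketch of such a handler, assuming the function is wired to a Dynamo stream configured to deliver both new and old images:

```python
# Sketch of a Lambda handler for DynamoDB stream events, assuming the
# stream's view type includes both new and old images.
def handler(event, context):
    for record in event["Records"]:
        name = record["eventName"]  # INSERT, MODIFY, or REMOVE
        data = record["dynamodb"]
        if name == "INSERT":
            print("new item:", data["NewImage"])
        elif name == "MODIFY":
            print("before:", data["OldImage"], "after:", data["NewImage"])
        elif name == "REMOVE":
            print("deleted item:", data["OldImage"])
```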
As a result, there are many, many options for processing DynamoDB streams in Lambda. You could replicate data into another Dynamo table, or into S3. You could go to SNS to generate emails or texts to let customers know of a change, say a price drop on an item, or maybe it's an event your employees need to know about, like inventory dropping too low on a certain item. You could add each event into an SQS queue, thus enabling the propagation of that real-time message to any number of downstream systems. After all, AWS provides a JMS (Java Message Service) client for SQS, and clients in any number of other programming languages can read from the queue and react.
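Extending the handler above, here's a hedged sketch of that fan-out pattern, pushing each stream record onto a hypothetical SQS queue (the queue URL is made up for illustration):

```python
import json
import boto3

sqs = boto3.client("sqs")
# Hypothetical queue URL -- substitute your own.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/item-events"

def handler(event, context):
    # Push each stream record onto the queue so any number of
    # downstream systems can consume the real-time message.
    for record in event["Records"]:
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(record))
```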
So now that we know some of what DynamoDB can do, let's create a table and see some of these capabilities in action.