Understand that superficial similarities between Cassandra and relational databases can mask important differences in how they are implemented.
- [Instructor] Relational databases are the most commonly used type of database. They work well with applications designed for a wide range of uses, from business operations to science research. Many web applications use popular relational databases, like MySQL and PostgreSQL. Although relational databases work well in many cases, some applications have requirements that are difficult to meet with relational databases. For example, some applications must write large volumes of data quickly, while other applications require extremely fast query response times.
In addition to this, some applications need to have high availability. Relational databases can achieve high read and write rates, but some of their features make this difficult. For example, relational databases ensure that read and write operations are consistent, so that all users see the latest correct version of data. Keeping track of multiple users reading and writing data, and ensuring that data is consistent, adds overhead to relational database operations.
Now, if you're willing to work with a database that doesn't guarantee the same level of consistency, you can have significantly faster database operations. NoSQL databases are non-relational databases. As the name implies, they do not use SQL for defining and manipulating data, but the differences go deeper than that. Let's look at Cassandra, a wide-column NoSQL database. Cassandra has many similarities to relational databases. Both use tables as a basic data structure.
Tables are made up of columns that store attributes. Cassandra uses data types that would be familiar to a relational database developer, such as int, varchar, and date. Cassandra tables have primary keys, which uniquely identify rows in a table, and primary keys are used to access data. However, a primary key alone is not enough to find a row in Cassandra, because Cassandra is designed to run on a cluster of servers.
A highly available Cassandra database does not depend on a single server. Now, we can run Cassandra on a single machine, and developers do this regularly. But production databases are best run on clusters of multiple servers. To enable fast access to rows within tables that span multiple servers, a Cassandra primary key is made up of two kinds of keys. The partition key determines which node in the cluster stores a row. Optional clustering columns, which make up the rest of the primary key, define the order in which rows are stored within a partition.
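To make the two kinds of keys concrete, here is a minimal Python sketch, not a description of how Cassandra is implemented internally. It assumes a hypothetical three-node cluster and a hypothetical table whose partition key is department and whose clustering column is hire_date. MD5 stands in for a real partitioner (Cassandra's default partitioner uses Murmur3), and the ring is simplified so that each node owns exactly one token range.

```python
import bisect
import hashlib

# Hypothetical three-node cluster; the names are illustrative only.
NODES = ["node-a", "node-b", "node-c"]


def token(key: str) -> int:
    """Map a key to a 64-bit token. MD5 is a stand-in; Cassandra's default
    partitioner actually uses Murmur3."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


# Simplified ring: each node owns the tokens up to and including its own token.
RING = sorted((token(n), n) for n in NODES)


def owner(partition_key: str) -> str:
    """Return the node responsible for a partition key."""
    tokens = [t for t, _ in RING]
    i = bisect.bisect_left(tokens, token(partition_key))
    return RING[i % len(RING)][1]  # wrap past the highest token to the first node


# node -> partition key -> rows kept sorted by the clustering column
cluster: dict[str, dict[str, list[tuple[str, str]]]] = {}


def insert(department: str, hire_date: str, name: str) -> None:
    """Partition key = department; clustering column = hire_date (ISO date string)."""
    partition = cluster.setdefault(owner(department), {}).setdefault(department, [])
    bisect.insort(partition, (hire_date, name))  # keep rows ordered by hire_date


insert("sales", "2021-03-01", "Ana")
insert("sales", "2019-07-15", "Raj")
insert("engineering", "2020-01-10", "Lee")

print(owner("sales"), cluster[owner("sales")]["sales"])
```

Every row with the same department lands on the same node, and within that partition the rows stay sorted by hire_date regardless of insertion order. That is why a query that names the partition key can go straight to one node and read rows already in clustering order.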
Also, it is worth noting that servers in a Cassandra cluster are called nodes. We'll follow that convention in this course. Cassandra has a query language called CQL, which stands for Cassandra Query Language. It is similar to SQL but has more restrictions, as we will see. It uses a SELECT statement for querying. To retrieve all columns for an employee whose employee ID is 8928, we could use a statement such as this: SELECT * FROM employees WHERE employee_id = 8928;
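One of the restrictions that matters early on is that an efficient CQL query names the partition key, so Cassandra can go directly to the data it needs. The following Python sketch models that rule for the query above. It assumes employee_id is the partition key of a hypothetical employees table; the sample rows and the select_all function are illustrative only.

```python
from typing import Any

# Hypothetical employees table, keyed by its assumed partition key (employee_id).
PARTITION_KEY = "employee_id"
employees: dict[int, dict[str, Any]] = {
    8928: {"employee_id": 8928, "name": "Maria Lopez", "title": "Analyst"},
    1044: {"employee_id": 1044, "name": "Sam Chen", "title": "Manager"},
}


def select_all(where: dict[str, Any]) -> list[dict[str, Any]]:
    """Roughly mimic: SELECT * FROM employees WHERE employee_id = <value>;

    CQL generally rejects filtering on columns outside the primary key unless
    an index or ALLOW FILTERING is used; this sketch simply rejects it.
    """
    if set(where) != {PARTITION_KEY}:
        raise ValueError("this sketch only supports equality on the partition key")
    row = employees.get(where[PARTITION_KEY])  # direct key lookup, no table scan
    return [] if row is None else [row]


print(select_all({"employee_id": 8928}))
```

Asking for employees by title in this sketch raises an error instead of scanning every row, which mirrors how CQL pushes you to design tables around the queries you plan to run.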
Cassandra has other commands similar to SQL commands, such as CREATE TABLE and CREATE INDEX for defining data structures like tables and indexes, and UPDATE for modifying data. Cassandra is also more flexible about schema than a relational database. Columns that have no value in a row are simply not stored, so some rows may have different columns than other rows. For example, in the employees table, managers may have a column called number of employees managed, while non-managers do not have that column.
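The following Python sketch is a simplified model of the employees example, not Cassandra's storage format. It assumes a hypothetical table that declares a num_employees_managed column. Each stored row keeps only the cells that were actually written, and reading a row fills in unwritten columns as None, the way CQL returns null for a column with no value.

```python
from typing import Any

# Columns declared for a hypothetical employees table.
COLUMNS = ["employee_id", "name", "title", "num_employees_managed"]

# Each stored row holds only the cells that were actually written, so the
# non-manager row has no num_employees_managed cell at all.
stored_rows: dict[int, dict[str, Any]] = {
    8928: {"employee_id": 8928, "name": "Maria Lopez", "title": "Analyst"},
    1044: {"employee_id": 1044, "name": "Sam Chen", "title": "Manager",
           "num_employees_managed": 6},
}


def read_row(employee_id: int) -> dict[str, Any]:
    """Return every declared column; unwritten columns read back as None (null)."""
    stored = stored_rows[employee_id]
    return {col: stored.get(col) for col in COLUMNS}


print(read_row(8928))
print(read_row(1044))
```

The design point is that an absent column costs nothing in the stored row, so one table can hold rows with quite different sets of populated columns.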
Another big difference from relational databases is that Cassandra is an eventually consistent database. That means there may be times, usually quite brief, when replicas of a row hold different versions of the data. This happens because Cassandra keeps multiple copies of data on different nodes, so that if a node fails, users can still get their data from a replica on another node. Even in a database designed for fast operations, it can take some time before all copies are updated.
In that case, a user might read an old version of the data. This difference between copies of the data is known as an inconsistency. Eventually, the inconsistency will be corrected. Throughout this course, we will see examples of Cassandra features that sound similar to relational database features but are, in fact, implemented differently. It is important to keep these differences in mind, because they influence how we build data models for Cassandra.
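Here is a small Python simulation of the scenario just described, assuming three hypothetical replicas of one value. Cassandra attaches a write timestamp to each value and, when versions disagree, keeps the one with the newest timestamp (last write wins). The repair function is a simplified stand-in for Cassandra mechanisms such as read repair and anti-entropy repair, and the small integer timestamps replace the microsecond timestamps Cassandra typically uses.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int  # simplified; Cassandra write timestamps are typically microseconds


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, Cell] = {}

    def write(self, key: str, cell: Cell) -> None:
        current = self.data.get(key)
        # Last write wins: keep whichever version has the newer timestamp.
        if current is None or cell.timestamp > current.timestamp:
            self.data[key] = cell

    def read(self, key: str):
        cell = self.data.get(key)
        return None if cell is None else cell.value


replicas = [Replica("node-a"), Replica("node-b"), Replica("node-c")]
KEY = "emp:8928:title"  # hypothetical key naming scheme

# The initial value is written to every replica.
for r in replicas:
    r.write(KEY, Cell("Analyst", timestamp=100))

# An update reaches only one replica at first (the others are slow or briefly down).
replicas[0].write(KEY, Cell("Senior Analyst", timestamp=200))
stale = replicas[2].read(KEY)  # an old version: this is the inconsistency


def repair(key: str) -> None:
    """Propagate the newest version of a key to every replica."""
    newest = max((r.data[key] for r in replicas if key in r.data),
                 key=lambda c: c.timestamp)
    for r in replicas:
        r.write(key, newest)


repair(KEY)
print(stale, [r.read(KEY) for r in replicas])
```

Before the repair, a reader that happens to reach node-c sees the old title; after it, every replica returns the newer value, and a late-arriving older write cannot overwrite it because its timestamp is smaller.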
- Cassandra architecture
- Keyspaces, tables, and columns
- Installing Java and Cassandra
- CQL data types
- Designing Cassandra tables
- Tuning tables to optimize queries
- When to use secondary indexes and materialized views
- Physical data modeling and distributing data
- Cassandra architecture and its impact on data modeling