This video presents the different quorum modes used by Exchange Server 2016 to recover a lost database copy or to update one that was temporarily down.
- [Instructor] In chapter two, we talked about high availability, which usually refers to minimizing or, when possible, eliminating downtime. The next level of fault tolerance is resiliency, or the ability to quickly bounce back when a failure was unavoidable. One of the underlying concepts of resiliency in database availability groups is the concept of quorum. The word quorum simply refers to a majority of the group members. Specifically, in a database availability group, or DAG, we're talking about a majority of the Exchange servers running databases being available, online, and functioning.
Now it's probably more accurate in Exchange to refer to the databases than to the servers that are running them. When we're looking for quorum, we're looking for a majority of the databases in an availability group to be online and functioning. This means that if we had, say, five member servers, a member meaning an Exchange server with a replica of the database, three of them would need to be online and functioning to allow us to continue.
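The majority rule in the five-member example can be written as a small calculation. This is only an illustrative sketch; the function names `quorum_size` and `has_quorum` are made up for this example and are not part of Exchange or Windows clustering.

```python
def quorum_size(voters: int) -> int:
    """Return the smallest number of voters that forms a strict majority."""
    if voters < 1:
        raise ValueError("a group needs at least one voter")
    # Integer division drops the remainder, so adding one always
    # lands on "more than half".
    return voters // 2 + 1


def has_quorum(online: int, voters: int) -> bool:
    """Return True when enough voters are online to form a majority."""
    return online >= quorum_size(voters)


if __name__ == "__main__":
    for members in (3, 5, 7):
        print(members, "members need", quorum_size(members), "online")
```

With five members, `quorum_size(5)` gives three, which matches the example in the narration.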
This idea of quorum is important to Exchange for a couple of reasons. Top of the list is consistency of all databases in the DAG. Every database loads and stores cluster hive information. Now, if a network failure or any other failure gets in the way of loading or maintaining one local copy, the clustering service on that server stops. If the clustering service stops, then clients won't be sent to that server to send or retrieve email, or to have any other kind of communication.
The second reason for a quorum is better performance, meaning faster response time. Let's say I have five Exchange servers in a DAG, and let's say those servers are spread around the country. We probably spread them out to make sure users could access one nearer to them. The cluster hive would use each server's current status to oversee the database replica on that server. If a client accesses the DAG IP address, their request can be forwarded to the server that's already verified to be current and near them.
This saves the database servers from having to confirm information for each individual request. Now, both of these benefits work great when there's an odd number of Exchange servers in the DAG. Then the DAG is able to use something known as node majority mode, as the nodes, or Exchange servers, are the only votes in the process. In a quorum of three Exchange servers, one server could go down for whatever reason and there would still be a majority, there would still be a quorum to work things out.
Another quorum mode is used when you have an even number of servers. I've mentioned that quorum relies on having a majority. The third function of a quorum is provided by the witness server. You'll remember from the last chapter that a witness file server doesn't have a copy of the database, but it does use a log file in a file share to keep track of which database in the DAG is the most current. The Exchange server with the most current database puts an SMB lock on that shared log file and, by so doing, secures an extra vote for itself for any quorum purposes.
This quorum mode is known as node and file share majority mode. The file share counts as one vote, making sure that it's always possible to have a majority. Now, the names of these modes are not too important to memorize because you'll never have to select one or the other. You have to have a witness server configured either way. You'll remember that from when we created the DAG. But you don't have to select a mode. Exchange will automatically shift between modes depending on the number of members in the group.
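The automatic shift between the two modes comes down to whether the member count is odd or even. The sketch below models only that rule as described in the narration; `quorum_mode` and `total_votes` are hypothetical helper names, not Exchange cmdlets or cluster APIs.

```python
def quorum_mode(members: int) -> str:
    """Pick the quorum mode from the number of DAG members (simplified model)."""
    if members < 1:
        raise ValueError("a DAG needs at least one member")
    # Odd member counts can always produce a majority on their own.
    if members % 2 == 1:
        return "node majority"
    # Even member counts need the witness file share as a tie-breaking vote.
    return "node and file share majority"


def total_votes(members: int) -> int:
    """Count the votes: every member, plus the witness when the count is even."""
    if members < 1:
        raise ValueError("a DAG needs at least one member")
    return members if members % 2 == 1 else members + 1


if __name__ == "__main__":
    for members in (2, 3, 4, 5):
        print(members, "members:", quorum_mode(members),
              "with", total_votes(members), "votes")
```

In this model the vote total is always odd, which is why a majority is always possible.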
If more than half of the members in a DAG are not available, a quorum is not available and the entire cluster goes offline until you find and fix the problem. This means that you can never update more than half of your Exchange servers at the same time. It also means that if half or more of your Exchange servers are in the same location and that location loses connectivity, your entire cluster is down until you fix the problem. Once you've restored quorum by making sure that at least a majority of the database replicas are online, the cluster can resume.
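The planning consequence, that losing one location or updating one batch of servers must still leave a majority online, can be checked with a short calculation. This is a simplified sketch that counts member servers only and ignores the witness vote; the site names and the function `survives_outage` are invented for illustration.

```python
from typing import Dict


def survives_outage(members_by_site: Dict[str, int], lost_site: str) -> bool:
    """Return True if a majority of members stays online after one site is lost.

    Simplified model: only member servers vote; the witness is not counted.
    """
    total = sum(members_by_site.values())
    remaining = total - members_by_site[lost_site]
    # A strict majority of all members must still be online.
    return remaining >= total // 2 + 1


if __name__ == "__main__":
    # Hypothetical layout: five members across three locations.
    layout = {"Seattle": 3, "Denver": 1, "Boston": 1}
    for site in layout:
        state = "stays up" if survives_outage(layout, site) else "goes offline"
        print("Losing", site, "->", "cluster", state)
```

In this layout, losing the location with three of the five members drops the cluster below a majority, while losing either single-member location does not. The same check applies to choosing how many servers to take down together for updates.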
- Planning and managing database storage
- Setting database requirements
- Storage architecture
- Virtualization scenarios
- Creating and managing mailbox databases
- Failure domains and SLA requirements
- Proper placement of a file share witness
- Site resilient DAG
- Troubleshooting database replication, performance, and database failure
- Planning for SLA recovery requirements