Take a look at the basic concepts of Elasticsearch.
- [Narrator] Section two. Elasticsearch: full-text search and analytics engine. In this section, we are going to take a look at the basic concepts of Elasticsearch and how to use the Elasticsearch APIs, followed by how to do aggregations using Elasticsearch. Then, we will proceed with Query DSL and mapping. After that, we will look at Elasticsearch analyzers and finally, we will see how to use scripts in Elasticsearch.
Chapter one. Basic concepts of Elasticsearch. In this video, we are going to take a look at some of the Elasticsearch terminology and we will understand the structure of Elasticsearch. Elasticsearch terminology. The first term that we need to know is cluster. Next, we will look at nodes. After that, we will look at index, followed by type, document, shards and replicas. Elasticsearch cluster.
A cluster is a collection or group of one or more Elasticsearch nodes that together hold the entire data. In short, a cluster is a single structure under which all the instances of Elasticsearch are connected to each other. If we had only one node in a cluster, that single node might not be able to handle large chunks of data at some point when the data grows at a rapid pace, or the node might be unavailable due to an update, an upgrade or other reasons.
This could pose a serious risk, so to avoid it, we can exploit the distributed nature of Elasticsearch, which allows us to easily handle huge amounts of data by using a multi-node cluster. In a multi-node cluster, even the failure of one or more nodes will not interrupt the work, and this provides great stability to the overall system. Elasticsearch provides a seamless clustering experience, which is a major advantage over its competitors.
As mentioned in the previous section, by default, the cluster name is elasticsearch, but we can modify this in the elasticsearch.yml file before installation. The name of a cluster is an important parameter, because a node can only be part of a cluster if the node is set to join the cluster by its name. Next, we will look at nodes. Elasticsearch nodes. A single instance of Elasticsearch running on a machine is called a node.
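The rule that a node only joins a cluster whose name it is configured with can be sketched as a small toy model. This is not Elasticsearch code: the `Node` and `Cluster` classes, the `try_join` method and the name `movies-cluster` are hypothetical, introduced purely for illustration. Only the default name `elasticsearch` comes from the narration.

```python
from dataclasses import dataclass, field
from typing import List

# Default cluster name mentioned in the narration.
DEFAULT_CLUSTER_NAME = "elasticsearch"


@dataclass
class Node:
    """Hypothetical stand-in for one running Elasticsearch instance."""
    name: str
    cluster_name: str = DEFAULT_CLUSTER_NAME


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster identified by its name."""
    name: str = DEFAULT_CLUSTER_NAME
    nodes: List[Node] = field(default_factory=list)

    def try_join(self, node: Node) -> bool:
        """Admit the node only if its configured cluster name matches."""
        if node.cluster_name != self.name:
            return False
        self.nodes.append(node)
        return True


cluster = Cluster(name="movies-cluster")
joined_a = cluster.try_join(Node("node-a", cluster_name="movies-cluster"))
joined_b = cluster.try_join(Node("node-b"))  # still set to the default name

print(joined_a, joined_b, [n.name for n in cluster.nodes])
```

In this sketch, `node-b` is turned away only because its cluster name was left at the default, which is the situation the narration warns about when the cluster has been renamed.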
For simple use cases that involve little processing of non-trivial data, a single-node architecture could be sufficient, but if we want a system with high availability, then we should prefer a multi-node structure. By default, Elasticsearch will take the first seven characters of the randomly generated UUID as the node name. Next, we will look at the different types of nodes available in Elasticsearch. There are four types of nodes in Elasticsearch.
The first one is the master and master-eligible node. The master node works as the supervisor of all other nodes in the same cluster. The master node is responsible for actions such as creating or deleting an index, tracking which nodes are part of the cluster and allocating shards to other nodes. We will get to know what a shard is shortly. Master-eligible node. There is a property called node.master in the elasticsearch.yml file.
If this property is set to true, which is the default value, it makes the node eligible to be elected as the master node. Let us assume we have a multi-node cluster with one master node. If the server with the master node fails, the nodes which are master-eligible compete through a process called the master election process and a new master is selected from among them. Next is the data node. Data nodes hold the data and perform data-related operations such as CRUD operations, search and aggregations.
To make a node a data node, the node.data property in the elasticsearch.yml configuration file should be set to true. By default, the value is set to true. Next, we have the ingest node. Ingest nodes are used to preprocess documents before the actual indexing takes place. By default, the node property node.ingest is set to true. We will have a look at indexing and preprocessing in the upcoming chapters. Hence, by default, a node is a master-eligible node, a data node and an ingest node.
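As a rough illustration of the three role flags and of what happens when the master fails, here is a toy model. The `NodeSettings` class and the `elect_master` function are hypothetical, and picking the candidate with the smallest name is an assumption made only to keep the example deterministic. The real master election is a coordination protocol between nodes and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeSettings:
    """Hypothetical view of one node's role flags."""
    name: str
    master: bool = True   # mirrors node.master (default true)
    data: bool = True     # mirrors node.data (default true)
    ingest: bool = True   # mirrors node.ingest (default true)
    alive: bool = True    # illustration only, not a real setting

    def roles(self) -> List[str]:
        """List the roles this node takes on given its flags."""
        roles = []
        if self.master:
            roles.append("master-eligible")
        if self.data:
            roles.append("data")
        if self.ingest:
            roles.append("ingest")
        return roles


def elect_master(nodes: List[NodeSettings]) -> Optional[NodeSettings]:
    """Toy election: choose among alive, master-eligible nodes.

    Assumption: ties are broken by the smallest node name.
    """
    candidates = [n for n in nodes if n.alive and n.master]
    return min(candidates, key=lambda n: n.name) if candidates else None


nodes = [
    NodeSettings("node-1"),                              # all defaults
    NodeSettings("node-2"),                              # all defaults
    NodeSettings("node-3", master=False, ingest=False),  # dedicated data node
]

first_master = elect_master(nodes)
first_master.alive = False          # simulate the master's server failing
second_master = elect_master(nodes)

print(nodes[0].roles(), nodes[2].roles())
print(first_master.name, second_master.name)
```

With every flag left at its default, a node carries all three roles. Switching flags off, as for `node-3`, is how a dedicated node would be expressed in this model.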
This configuration should be fine for a small cluster, but as the cluster grows, it is very important to consider dedicated master-eligible nodes, data nodes and ingest nodes. Finally, we have the tribe node. A tribe node is a special type of node used for coordination purposes alone. This node can connect to multiple clusters and perform search and other operations across the connected clusters. Shards and replicas. Sometimes, the data volume we need to store exceeds the storage capacity of a single server.
Let's assume we have a server with one TB of capacity and now, we need to load the data of all movies to date from all over the world. This could well exceed the one TB limit of the server, and having all the data on a single node will also slow down search. To overcome these problems, Elasticsearch uses the concept of shards. In the next video, we will look at the Elasticsearch REST APIs.
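To make the one TB example a little more concrete, the sketch below works out how many pieces a data set must be split into so that each piece fits on one server, and then spreads documents across those pieces with a stable hash. The 2.5 TB figure, the function names and the document IDs are hypothetical, and CRC32 is used only as a convenient stand-in hash, not as the hash Elasticsearch itself applies.

```python
import math
import zlib


def shards_needed(total_data_tb: float, node_capacity_tb: float) -> int:
    """Smallest number of equal pieces so that each fits on one node."""
    return math.ceil(total_data_tb / node_capacity_tb)


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Map a document ID to a shard number using a stable hash.

    CRC32 is an assumption for illustration; any stable hash would do here.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


total_tb = 2.5  # hypothetical size of the movie data set
shard_count = shards_needed(total_tb, node_capacity_tb=1.0)

movie_ids = ["movie-1", "movie-2", "movie-3", "movie-4"]
placement = {movie: pick_shard(movie, shard_count) for movie in movie_ids}

print(shard_count)
print(placement)
```

Because each document always maps to the same shard number, a lookup only needs to visit the server holding that shard rather than scan all of the data in one place.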
This course was created and produced by Packt Publishing. We are honored to host this training in our library.
- Elasticsearch concepts
- Working with Logstash and Kibana
- Elasticsearch Query DSL
- Aggregation and analyzers
- Scripting in Elasticsearch
- Using plugins and APIs
- Building an interface with dashboards
- Filtering and processing input
- Loading data to Elasticsearch