Introduction to Elasticsearch

Elasticsearch is an open-source, distributed, scalable, highly available, document-oriented, RESTful full-text search engine with near real-time search and analytics capabilities. It is built on top of Apache Lucene, which handles the indexing. In this post, I am going to give you a very quick conceptual introduction to Elasticsearch.

Elasticsearch runs as one or more clusters of nodes, and nodes can be added to or removed from a cluster. Nodes hold data as shards. Shards are partitions of an index's data, and their number is configurable via the REST API. Shards make information retrieval faster and more efficient, because each query runs against a smaller portion of the original data.
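To make the partitioning idea concrete, here is a minimal sketch of how documents could be spread over a fixed number of primary shards by hashing their IDs. Conceptually this mirrors Elasticsearch's routing (a hash of the routing value, by default the document ID, modulo the number of primary shards), but it is only an illustration: zlib.crc32 is a stand-in for Elasticsearch's own routing hash, and NUM_PRIMARY_SHARDS, shard_for and partition are hypothetical names.

```python
from __future__ import annotations

import zlib
from collections import defaultdict

# Hypothetical shard count; in Elasticsearch the number of primary shards
# is chosen when an index is created.
NUM_PRIMARY_SHARDS = 3


def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard as hash(doc_id) modulo the shard count.

    crc32 is only a stand-in hash; Elasticsearch uses its own routing hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(docs: dict[str, str], num_shards: int = NUM_PRIMARY_SHARDS) -> dict[int, dict[str, str]]:
    """Split a document collection into per-shard partitions."""
    shards: dict[int, dict[str, str]] = defaultdict(dict)
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return dict(shards)


if __name__ == "__main__":
    docs = {f"doc-{i}": f"text of document {i}" for i in range(10)}
    for shard_id, part in sorted(partition(docs).items()):
        print(shard_id, sorted(part))
```

Because the shard is derived from the document ID, the same document always lands on the same shard, which is also why changing the number of primary shards requires moving data around.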

To ensure high availability, you can choose the number of replicas of your data. With the number of replicas set to 1 or higher, Elasticsearch distributes redundant copies of each shard across the cluster, always on a node other than the one holding the primary shard. This means that if one of the nodes fails at any point in time, copies of the requested data still exist elsewhere in the cluster, so the whole setup is resilient to such failures.
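The placement rule above can be sketched as a toy allocator: each shard gets one primary plus num_replicas copies, all on distinct nodes, and we then check whether every shard still has a live copy after a single node fails. This is a simplified model under stated assumptions, not Elasticsearch's actual allocation algorithm; allocate and survives are hypothetical helpers, and the round-robin placement is just one simple strategy.

```python
from __future__ import annotations


def allocate(num_shards: int, num_replicas: int, nodes: list[str]) -> dict[int, list[str]]:
    """Map shard id -> nodes holding a copy (primary first).

    Copies are placed on consecutive nodes starting at a rotating offset, so
    no node ever holds two copies of the same shard. If there are fewer nodes
    than copies, the extra replicas are simply left unassigned.
    """
    allocation: dict[int, list[str]] = {}
    copies = 1 + num_replicas
    for shard in range(num_shards):
        offset = shard % len(nodes)
        allocation[shard] = [nodes[(offset + i) % len(nodes)] for i in range(min(copies, len(nodes)))]
    return allocation


def survives(allocation: dict[int, list[str]], failed_node: str) -> bool:
    """True if every shard still has at least one copy on a node other than failed_node."""
    return all(any(node != failed_node for node in holders) for holders in allocation.values())


if __name__ == "__main__":
    nodes = ["node-1", "node-2", "node-3"]
    layout = allocate(num_shards=3, num_replicas=1, nodes=nodes)
    print(layout)
    print({n: survives(layout, n) for n in nodes})
```

With zero replicas, losing any node that holds a primary loses that shard; with one replica on a three-node cluster, every single-node failure leaves a surviving copy.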

Figure: Nodes and replicated shards

Thanks to the sharding mechanism, Elasticsearch sends a search query to the necessary shards, which are searched in parallel; this increases throughput. Apache Lucene performs most of the indexing work, and each shard is a self-contained Lucene index.
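A minimal sketch of this scatter/gather pattern: every shard scores its own documents and returns a local top-k, and a coordinator merges those partial results into a global top-k. The scoring here is a toy term count (Elasticsearch actually scores with BM25 by default and fetches documents in a separate phase); search_shard and search are hypothetical names.

```python
from __future__ import annotations

import heapq


def search_shard(shard: dict[str, str], term: str) -> list[tuple[int, str]]:
    """Score each document in one shard by how often `term` occurs (toy score)."""
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term.lower())
        if score > 0:
            hits.append((score, doc_id))
    return hits


def search(shards: list[dict[str, str]], term: str, k: int = 3) -> list[tuple[int, str]]:
    """Query every shard, keep each shard's local top-k, then merge into a global top-k."""
    per_shard_top = [heapq.nlargest(k, search_shard(shard, term)) for shard in shards]
    return heapq.nlargest(k, (hit for hits in per_shard_top for hit in hits))


if __name__ == "__main__":
    shards = [
        {"a": "cat cat dog", "b": "dog"},
        {"c": "cat", "d": "cat cat cat"},
    ]
    print(search(shards, "cat", k=2))
```

Each shard only has to send back k results, so the coordinator merges a small amount of data regardless of how many documents the shards hold.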

Elasticsearch uses an inverted index to index the input data and, consequently, to execute search queries against it.
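An inverted index maps each term to the documents that contain it, so a query looks up terms instead of scanning every document. The sketch below is a deliberately simplified version: the tokenizer (lowercase, split on non-alphanumerics) is a hypothetical stand-in for a real Lucene analyzer, and search_all implements a plain boolean AND over the postings.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Very simple analyzer: lowercase, then split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document IDs that contain it (its postings)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the IDs of documents that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "Elasticsearch is built on Lucene",
        2: "Lucene is a search library",
        3: "Distributed search engine",
    }
    index = build_index(docs)
    print(search_all(index, "Lucene search"))
```

Real inverted indexes also store term frequencies and positions so that results can be ranked and phrases matched, but the term-to-documents lookup is the core idea.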

If you want Elasticsearch to be as efficient as possible, you also need a suitable hardware profile. First, a very fast network is a necessity, because whenever a node is added or a node crashes, a considerable amount of data is relocated across the cluster. Second, if your application is aggregation-oriented, plenty of memory is very beneficial (the more, the better). Finally, a powerful CPU is an asset during indexing, since Elasticsearch sizes its thread pools according to the available processors.
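As an illustration of thread pools scaling with the hardware, the sketch below reproduces the default sizes given in the Elasticsearch thread pool documentation for recent (7.x/8.x) releases: the fixed search pool has int((processors * 3) / 2) + 1 threads and the fixed write pool has one thread per allocated processor. Treat these formulas as an assumption to verify against the documentation for your version; the function names are hypothetical.

```python
import os


def search_pool_size(processors: int) -> int:
    """Default size of the fixed `search` thread pool: int((p * 3) / 2) + 1."""
    return (processors * 3) // 2 + 1


def write_pool_size(processors: int) -> int:
    """Default size of the fixed `write` thread pool: one thread per processor."""
    return processors


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single processor.
    cpus = os.cpu_count() or 1
    print(f"{cpus} processors -> search pool {search_pool_size(cpus)}, write pool {write_pool_size(cpus)}")
```

Doubling the cores roughly doubles both pools, which is why extra CPU capacity translates directly into more concurrent indexing and search work.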

In the next post, we will briefly review some of the basics of interacting with Elasticsearch using its REST API and curl.
