Customers looking for product information from businesses with a huge product and client base often face long waits while that information is retrieved. This leads to a poor user experience and, in turn, to lost potential customers.
The lag in search is often attributed to the relational database behind the product, where the data is scattered across multiple tables and retrieving meaningful information for the user requires fetching data from several of them. A relational database works comparatively slowly when it has to handle huge volumes of data and fetch search results through queries. Businesses are now looking for alternatives in which data is stored so that retrieval is quick. This can be achieved by adopting NoSQL rather than an RDBMS for storing data. Elasticsearch is one such distributed NoSQL database. Elasticsearch relies on flexible data models to build and update visitor profiles to meet the demanding workloads and low latency required for real-time engagement.
Relational databases can be optimized for this kind of workload, for example through indexing, but such optimizations have their own limitations: we can’t index every field, and row updates to heavily indexed tables take time. People also scale their RDBMS vertically to improve performance. Elasticsearch overcomes these problems. The figure below shows how an RDBMS typically handles a search against the database.
So what is so significant about Elasticsearch? Elasticsearch (ES) is a document-oriented database designed to store, retrieve, and manage document-oriented or semi-structured data. When you use Elasticsearch, you store data as JSON documents and then query them for retrieval. It is schema-less in the sense that it applies defaults to index the data unless you provide a mapping to suit your needs. By default, Elasticsearch guesses field types automatically and analyzes text with Lucene’s StandardAnalyzer.
Every feature of Elasticsearch is exposed as a REST API:
- Index API – Used to index a document
- Get API – Used to retrieve the document
- Search API – Used to submit your query and get the result
- Put Mapping API – Used to override default choices and define our own mapping
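As a rough sketch of how these four APIs look on the wire, the Python snippet below builds the HTTP method, path, and JSON body for each request without sending anything. The paths assume the typeless `_doc` endpoints of recent Elasticsearch versions. The index name `products`, the document fields, and the helper function names are illustrative assumptions.

```python
import json

def index_request(index, doc_id, document):
    """Index API: store (or overwrite) a JSON document under an ID."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(document)

def get_request(index, doc_id):
    """Get API: fetch a single document by its ID."""
    return "GET", f"/{index}/_doc/{doc_id}", None

def search_request(index, query):
    """Search API: submit a Query DSL body and get matching documents."""
    return "POST", f"/{index}/_search", json.dumps({"query": query})

def put_mapping_request(index, properties):
    """Put Mapping API: define field types instead of relying on defaults."""
    return "PUT", f"/{index}/_mapping", json.dumps({"properties": properties})

# Hypothetical product data, used only for illustration.
requests_to_send = [
    put_mapping_request("products", {"name": {"type": "text"},
                                     "price": {"type": "float"}}),
    index_request("products", 1, {"name": "Wireless mouse", "price": 19.99}),
    get_request("products", 1),
    search_request("products", {"match": {"name": "mouse"}}),
]

for method, path, body in requests_to_send:
    print(method, path, body or "")
```

Any HTTP client can send these requests to a node; the official language clients wrap the same endpoints.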
Elasticsearch has its own query Domain Specific Language (DSL), in which you specify the query in JSON format. You can also nest queries inside one another as needed. Real projects require searching on different fields while applying conditions, different weights, a preference for recent documents, values of some predefined fields, and so on. The Query DSL is powerful and designed to express all of this real-world complexity in a single query. Elasticsearch’s APIs map directly onto Lucene and use the same names as the corresponding Lucene operations; a Query DSL term query, for example, is executed as a Lucene TermQuery.
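To make the idea of a single nested query concrete, here is a minimal sketch that builds one `bool` query as a Python dictionary. It combines a weighted full-text match, exact filters, and a preference for recent documents. The field names (`name`, `description`, `category`, `price`, `created_at`) and the function name are assumptions made for this example, and `category` is assumed to be mapped as a keyword field.

```python
import json

def build_product_query(text, category, max_price):
    """Combine full-text search, filters, and a recency boost in one query."""
    return {
        "query": {
            "bool": {
                # Full-text match; a hit in "name" counts three times as much.
                "must": [
                    {"multi_match": {"query": text,
                                     "fields": ["name^3", "description"]}}
                ],
                # Filters narrow the result set without affecting the score.
                "filter": [
                    {"term": {"category": category}},
                    {"range": {"price": {"lte": max_price}}},
                ],
                # Optional clause: documents from the last 30 days score higher.
                "should": [
                    {"range": {"created_at": {"gte": "now-30d/d"}}}
                ],
            }
        }
    }

query = build_product_query("wireless mouse", "electronics", 50)
print(json.dumps(query, indent=2))
```

The printed JSON is what would be sent as the body of a search request.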
The figure below shows how an Elasticsearch query works.
Let’s have a look at the basic concepts of Elasticsearch:
Cluster: A cluster is a collection of one or more servers (nodes) that together hold the entire data set and provide federated indexing and search capabilities across all of them. In relational database terms, a node is comparable to a DB instance. A cluster can contain any number of nodes, all sharing the same cluster name.
NRT (Near Real Time): Elasticsearch is a near-real-time search platform. There is a slight latency (normally about one second) between the time you index a document and the time it becomes searchable.
Index: An index is a collection of documents that have similar characteristics. For example, we can have one index for customer data and another for product information. An index is identified by a unique name, which is used to refer to it when performing indexing, search, update, and delete operations. In a single cluster, we can define as many indexes as we want. An index is comparable to a database or schema in an RDBMS (Relational Database Management System): consider it a set of tables with some logical grouping. In Elasticsearch terms, Index = Database, Type = Table, Document = Row.
Node: A node is a single server that holds some of the data and participates in the cluster’s indexing and querying. A node is simply one Elasticsearch instance, and it can be configured to join a specific cluster by that cluster’s name. A single cluster can have as many nodes as we want. Think of a node as a running instance of MySQL. With MySQL, each instance running on a machine listens on a different port, whereas with Elasticsearch one instance generally runs per machine. Elasticsearch uses distributed computing, so spreading nodes across separate machines helps because more hardware resources are available.
Shards: A shard is a subset of the documents in an index. An index can be divided into many shards.
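The sketch below shows, in simplified form, how documents could be spread across the shards of an index: a routing value (here the document ID) is hashed and taken modulo the number of primary shards. This is an assumption-laden illustration. `zlib.crc32` is only a stand-in for the hash function Elasticsearch actually uses, and the document IDs and the shard count of 5 are made up.

```python
import zlib
from collections import Counter

def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number in [0, num_primary_shards)."""
    # crc32 is a stand-in hash chosen for this sketch, not Elasticsearch's own.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# Hypothetical document IDs, spread over 5 primary shards.
doc_ids = [f"product-{i}" for i in range(1000)]
distribution = Counter(pick_shard(doc_id, 5) for doc_id in doc_ids)

for shard, count in sorted(distribution.items()):
    print(f"shard {shard}: {count} documents")
```

Because the shard is derived from the ID, the same document always lands on the same shard, which is also why the number of primary shards cannot simply be changed after an index has been created.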
Mapping Type = Database Table in RDBMS
Elasticsearch uses document definitions that act as tables. If you PUT (“index”) a document into Elasticsearch, you will notice that it automatically tries to determine the property types. This is like inserting a JSON blob into MySQL and having MySQL determine the number of columns and the column types as it creates the database table.
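The automatic type detection described above can be imitated with a small, deliberately simplified sketch. It maps JSON value types to the field types Elasticsearch picks by default (whole numbers to `long`, decimals to `float`, strings to `text`, and so on). It ignores details such as date detection and the keyword sub-field added to strings. The function names, the sample document, and the `"unknown"` placeholder are assumptions of this example.

```python
def guess_field_type(value):
    """Guess an Elasticsearch-style field type from a parsed JSON value."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first element in this sketch.
        return guess_field_type(value[0]) if value else "unknown"
    return "unknown"  # placeholder used by this sketch for null values

def guess_mapping(document):
    """Build a field -> type table for one document."""
    return {field: guess_field_type(value) for field, value in document.items()}

# Hypothetical product document.
product = {"name": "Wireless mouse", "price": 19.99, "stock": 12,
           "active": True, "tags": ["usb", "office"],
           "vendor": {"name": "Acme"}}
print(guess_mapping(product))
```

When the guessed types are not what you need, an explicit mapping overrides them.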
Beyond understanding what Elasticsearch is, it’s important to know when to use it. Some of the use cases of Elasticsearch are:
- Textual Search (searching for pure text): Elasticsearch is primarily used where there is a lot of text and we want to search it for the best match with a specific phrase.
- Product search by properties and name (Text search and structured data)
- Data Aggregation: The aggregations framework provides aggregated data based on a search query. It is based on simple building blocks called aggregations, which can be composed to build complex summaries of the data. An aggregation can be seen as a unit of work that builds analytic information over a set of documents. The context of the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed query/filters of the search request).
- JSON document storage: A document is a JSON object with some data. It is the basic unit of information in ES that can be indexed.
- Geo Search: Elasticsearch can be used to geo-localize any product, for example to find all the restaurants that serve pizza within 30 minutes.
- Auto Suggest: It allows the user to start typing a few characters and receive a list of suggested queries as they type.
- Auto Complete: It completes partially typed words in a search box, based on previous searches.
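A few of the use cases above can be sketched as search request bodies. The snippet below builds three of them as Python dictionaries: a terms aggregation with an average sub-aggregation, a geo-distance filter, and a completion suggester request. All field names (`category`, `price`, `menu`, `location`, `suggest`) and function names are assumptions made for illustration. The fixed `5km` radius is a stand-in for "within 30 minutes", since travel time is not something the query expresses directly.

```python
import json

def category_price_aggregation():
    """Average price per category; "size": 0 skips the individual hits."""
    return {
        "size": 0,
        "aggs": {
            "by_category": {
                # Assumes "category" is mapped as a keyword field.
                "terms": {"field": "category"},
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            }
        },
    }

def nearby_pizza_query(lat, lon, distance="5km"):
    """Restaurants serving pizza within a radius of the given point."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"menu": "pizza"}}],
                # Assumes "location" is mapped as a geo_point field.
                "filter": [
                    {"geo_distance": {"distance": distance,
                                      "location": {"lat": lat, "lon": lon}}}
                ],
            }
        }
    }

def suggest_request(prefix):
    """Suggestions for a partially typed term, using a completion field."""
    return {
        "suggest": {
            "product_suggest": {
                "prefix": prefix,
                # Assumes "suggest" is mapped as a completion field.
                "completion": {"field": "suggest"},
            }
        }
    }

for body in (category_price_aggregation(),
             nearby_pizza_query(40.73, -73.99),
             suggest_request("wir")):
    print(json.dumps(body))
```

Each body would be sent to the Search API of the relevant index.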
Elasticsearch users have delightfully diverse use cases, ranging from appending tiny log-line documents to indexing web-scale collections of large documents, and maximizing indexing throughput is often a common and important goal.
Benefits of using Elasticsearch:
1) Manages huge amounts of data: Where a traditional SQL database management system can take more than 10 seconds to fetch the data for a search query, Elasticsearch can do so within 10 ms.
2) Direct, easy, and fast access: Documents are stored in close proximity to the corresponding metadata in the index. This reduces the number of data reads and, as a result, speeds up search responses.
3) Scalability of the search engine: Because Elasticsearch has a distributed architecture, it can scale up to thousands of servers and accommodate petabytes of data. Customers do not need to manage the complexity of the distributed design, as this is handled automatically.
Sometimes there is more than one way to index documents or query them, and Elasticsearch helps us do it better. Elasticsearch is not new, but it is evolving rapidly and new features keep being added. The core, however, is consistent and can help your search engine return results faster.