ElasticSearch: demystifying the bool query
ElasticSearch is one of the most popular search engines in the industry, according to the 2018 search engine ratings. Its ease of use and abundance of features make it very useful. Compound queries are among the most used features in ElasticSearch, and amongst them, the bool query is where ElasticSearch truly stands out.
According to Elastic:
A query that matches documents matching boolean combinations of other queries. The bool query maps to Lucene BooleanQuery. It is built using one or more boolean clauses, each clause with a typed occurrence.
In the boolean query, there are four kinds of occurrences: must, should, must_not and filter.
The must clause (query) must appear in matching documents, and its functionality mimics the boolean “AND”. The clause also contributes to the relevance score of the matching documents. For instance, suppose you are searching for an “Audi A6” and do not want to see other cars in the search results.
The boolean expression will be:
make = Audi AND model = A6
The bool DSL query:
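A minimal sketch of such a query, written here as a Python dict and serialized to the JSON body of a _search request. The field names make and model are taken from the boolean expression above and are assumed to exist in the index mapping; the choice of match as the leaf query is also an assumption.

```python
import json

# Both clauses sit under "must", so a document has to satisfy
# make = Audi AND model = A6 to be returned.
# Field names "make" and "model" are assumptions about the index mapping.
must_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"make": "Audi"}},
                {"match": {"model": "A6"}},
            ]
        }
    }
}

# Serialize to the JSON request body.
print(json.dumps(must_query, indent=2))
```

Each entry in the must list is an independent query, so adding a third condition is a matter of appending one more clause.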
On the other hand, for must_not, the query must not appear in the matching documents, which corresponds to the boolean “NOT”. For example, to exclude every Toyota, the boolean format is:
make != Toyota
The DSL query:
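A sketch of how this exclusion might look, again as a Python dict serialized to JSON. The field name make is assumed to exist in the index mapping, and match is an assumed choice of leaf query.

```python
import json

# Any document whose "make" matches Toyota is excluded from the results.
# Field name "make" is an assumption about the index mapping.
must_not_query = {
    "query": {
        "bool": {
            "must_not": [
                {"match": {"make": "Toyota"}},
            ]
        }
    }
}

print(json.dumps(must_not_query, indent=2))
```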
The functionality of the should occurrence type differs from the other clauses, and it somewhat corresponds to the boolean “OR”. In a query context, if a must or filter query is present, the should clauses are optional and only help to influence the score. However, if the bool query is in a filter context or has neither must nor filter queries, then at least one of the should queries must match a document. Let’s look at an example. Assume you want to search for either “Audi” or “Kia”: if either of the criteria matches, the document should be returned.
make = Audi OR make = Kia
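One way this OR condition could be sketched is shown here as a Python dict serialized to JSON. The field name make is assumed to exist in the index mapping. The minimum_should_match setting is spelled out explicitly to make the "at least one" requirement visible, even though it is the default behaviour when no must or filter clause is present.

```python
import json

# A document is returned if it matches Audi OR Kia.
# Field name "make" is an assumption about the index mapping.
should_query = {
    "query": {
        "bool": {
            "should": [
                {"match": {"make": "Audi"}},
                {"match": {"make": "Kia"}},
            ],
            # Require at least one of the should clauses to match.
            "minimum_should_match": 1,
        }
    }
}

print(json.dumps(should_query, indent=2))
```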
What about the filter clause? It is quite similar to must: if a filter clause is used, the query must also appear in the matching documents, but it does not contribute to the score. For example, let’s say you want to filter for cars that have an ANCAP safety rating of 5; the query will appear as follows:
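A minimal sketch of that filter as a Python dict serialized to JSON. The field name ancap_rating is hypothetical (the actual name depends on the index mapping), and a term query on a numeric field is assumed.

```python
import json

# Only cars with a rating of exactly 5 pass the filter; the clause
# narrows the result set without affecting the relevance score.
# "ancap_rating" is a hypothetical field name, assumed to be numeric.
filter_query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"ancap_rating": 5}},
            ]
        }
    }
}

print(json.dumps(filter_query, indent=2))
```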
What about a real-world scenario? In the real world, simply matching one or two fields is not enough. Users might want to search for cars with multiple makes and models in one query. How do we cope with that? Luckily, ElasticSearch supports this. For example, say you want to search for used cars whose make is “Audi”, “Kia” or “Toyota” and whose model is “A6”, “Camry” or “Optima”.
The boolean format will be:
(make = Audi OR make = Kia OR make = Toyota) AND (model = A6 OR model = Camry OR model = Optima) AND condition = used
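In the DSL, this expression might be sketched as an outer must clause holding one nested bool query per parenthesised group. The helper any_of is hypothetical and exists only to keep the example short; the field names make, model and condition come from the expression above and are assumed to exist in the index mapping.

```python
import json


def any_of(field, values):
    """Hypothetical helper: a nested bool query that ORs one match clause per value."""
    return {
        "bool": {
            "should": [{"match": {field: value}} for value in values],
            # At least one of the alternatives has to match.
            "minimum_should_match": 1,
        }
    }


# The outer "must" ANDs the three groups together; each group is its
# own nested bool query, including the single-value condition group.
nested_query = {
    "query": {
        "bool": {
            "must": [
                any_of("make", ["Audi", "Kia", "Toyota"]),
                any_of("model", ["A6", "Camry", "Optima"]),
                any_of("condition", ["used"]),
            ]
        }
    }
}

print(json.dumps(nested_query, indent=2))
```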
Note that the whole query is wrapped in a must clause, which requires all three AND conditions to hold, and each individual piece is its own nested bool query. Because every piece sits inside the outer must clause, the pieces in the chain are connected with AND logic.
In general, my experience with ElasticSearch’s bool query has been positive and delightful. It is a great tool for combining complex queries, filters, ranges, and sorting, as it provides incredible flexibility. Furthermore, ElasticSearch is able to run all of these complex queries together in real time, locate the most suitable results, and return them to the user in a very short amount of time.