Elasticsearch Tutorial: What is percolation?


One of the lesser-known features in Elasticsearch is percolation. It’s commonly referred to as “search in reverse”: it allows you to index queries and, at a later time, percolate a document to find the queries it matches. Sound a bit confusing? It took me a while to wrap my head around this feature, but once you understand how it works, it’s easy to see just how powerful it can be across a wide range of domains.
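To make the “reverse” part concrete, here is a toy, in-memory sketch. It is not how Elasticsearch implements the percolator: queries are plain Python predicates standing in for real query DSL, and the query ids and sample books are hypothetical. Normal search runs one query over many stored documents; percolation runs one document over many stored queries.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]
Query = Callable[[Doc], bool]


def search(query: Query, docs: Dict[str, Doc]) -> List[str]:
    """Normal search: one query, many stored documents -> matching doc ids."""
    return [doc_id for doc_id, doc in docs.items() if query(doc)]


def percolate(doc: Doc, queries: Dict[str, Query]) -> List[str]:
    """Percolation: one document, many stored queries -> matching query ids."""
    return [query_id for query_id, query in queries.items() if query(doc)]


# Hypothetical stored queries, keyed by id (toy predicates, not query DSL).
stored_queries: Dict[str, Query] = {
    "likes_vonnegut": lambda d: "vonnegut" in d.get("author", "").lower(),
    "likes_austen": lambda d: "austen" in d.get("author", "").lower(),
}

new_book = {"title": "Breakfast of Champions", "author": "Kurt Vonnegut"}
books = {"1": new_book, "2": {"title": "Emma", "author": "Jane Austen"}}

print(search(stored_queries["likes_austen"], books))  # ['2']
print(percolate(new_book, stored_queries))            # ['likes_vonnegut']
```

Both functions use the same predicate; only the direction of iteration changes.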

For example, imagine:

  • A news aggregator that, instead of restricting user subscriptions to broad categories of content, allows users to be very specific about the content they’re interested in. For example, instead of simply following a ‘Sports’ category, a user could subscribe to articles discussing the Cleveland Cavs during the weeks of the playoffs.
  • A regulatory agency that needs to be notified when specific types of additives are added to our food.
  • A day trader who wants to know when a security within the Tech sector breaks through its 10-day moving average.

To familiarize ourselves with the Percolator, we’re going to walk through a simple example that includes a ‘Library’ index which contains documents of type ‘Book’. The book type mapping has properties describing author, title, categories, and a list of nested locations (which will include availability information).

NOTE: The examples throughout this post were created using the Sense tool within the Marvel plugin. Sense’s query console alone makes Marvel worth the 15 seconds it takes to install. If you use other tools, such as cURL, be sure to add the server and port to the URL.

For example:

curl -XGET http://localhost:9200/library/book/1
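If you script your requests instead, the same translation from a Sense-style path to a full URL can be sketched as a small helper. The function name is hypothetical, and the defaults assume the localhost:9200 setup from the example above.

```python
def full_url(path: str, host: str = "localhost", port: int = 9200) -> str:
    """Hypothetical helper: prefix a Sense-style path with scheme, host, and port."""
    # The examples in this post write paths both with and without a
    # leading slash, so normalize before joining.
    return f"http://{host}:{port}/{path.lstrip('/')}"


print(full_url("library/book/1"))  # http://localhost:9200/library/book/1
print(full_url("/library/.percolator/johns_wishlist", host="es.example.com"))
```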

POST library/_mapping/book
{  
  "book":{  
    "properties":{  
      "author":{  
        "type":"string"
      },
      "categories":{  
        "type":"string",
        "index":"not_analyzed"
      },
      "id":{  
        "type":"integer"
      },
      "locations":{  
        "type":"nested",
        "properties":{  
          "availableCopies":{  
            "type":"integer"
          },
          "name":{  
            "type":"string"
          }
        }
      },
      "title":{  
        "type":"string"
      }
    }
  }
}

Given the above mapping, let’s consider a patron who wants to be notified when any new books by Vonnegut are added. To do this, we’re going to register a query using the library index’s percolator endpoint.

Note that in the URL, we’re specifying the new query id, “johns_wishlist”. The message body itself has two parts: the query, which contains the match criteria, and additional top-level fields for storing metadata describing the query. In this example, we’re using a metadata field to store the userid of the person who owns this wish list query.

POST /library/.percolator/johns_wishlist
{  
  "query":{  
    "match":{  
      "author":"vonnegut"
    }
  },
  "userid":"1"
}
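The important structural detail is that metadata sits beside "query" at the top level of the body; anything nested under "query" is parsed as query DSL, so a userid placed there would not be treated as metadata. A minimal sketch that builds such a body, using a hypothetical helper name and Python’s standard json module:

```python
import json


def percolator_body(query: dict, **metadata: str) -> dict:
    """Build a percolator registration body (hypothetical helper).

    The query goes under "query"; metadata fields such as userid are
    siblings of "query", not children of it.
    """
    return {"query": query, **metadata}


body = percolator_body({"match": {"author": "vonnegut"}}, userid="1")
print(json.dumps(body))
# {"query": {"match": {"author": "vonnegut"}}, "userid": "1"}
```

Using keyword arguments for metadata keeps the query and its metadata visibly separate at the call site.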

It’s awesome to see that, much like documents, queries are also stored in a flexible JSON format.

Now that we have a stored query, let’s percolate a document to see whether it matches. This involves sending a request with the document’s contents to the “_percolate” endpoint. The “doc” property holds the values describing the book.

GET library/book/_percolate
{  
  "doc":{  
    "title":"Breakfast of Champions",
    "author":"Kurt Vonnegut"
  }
}
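Why does the lowercase query text “vonnegut” match the author “Kurt Vonnegut”? The author field is an analyzed string, so both the field value and the match query text are broken into lowercase terms, and a match query matches if any query term is present (its default operator is OR). The sketch below approximates this with a simple regex tokenizer; it is a rough stand-in, not the real standard analyzer.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    # Rough stand-in for the standard analyzer: split on non-word
    # characters and lowercase each token.
    return [t.lower() for t in re.findall(r"\w+", text)]


def match_query(doc: dict, field: str, query_text: str) -> bool:
    # Default OR semantics: match if any analyzed query term
    # appears among the field's analyzed terms.
    doc_terms = set(analyze(doc.get(field, "")))
    return any(term in doc_terms for term in analyze(query_text))


book = {"title": "Breakfast of Champions", "author": "Kurt Vonnegut"}
print(analyze(book["author"]))                   # ['kurt', 'vonnegut']
print(match_query(book, "author", "vonnegut"))   # True
print(match_query(book, "author", "Hemingway"))  # False
```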

This request produces the following result:

{  
  "took":1,
  "_shards":{  
    "total":5,
    "successful":5,
    "failed":0
  },
  "total":1,
  "matches":[  
    {  
      "_index":"library",
      "_id":"johns_wishlist"
    }
  ]
}
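Note that each match identifies a query only by its index and id; the userid metadata is not included. One way to act on the result is to map the matched ids back to their owners. In the sketch below, the query_owner dictionary is a hypothetical local copy of that metadata; in practice you might fetch it from the stored query document instead (for example, with GET /library/.percolator/johns_wishlist).

```python
import json
from typing import Dict, List

# The response shown above, condensed onto fewer lines.
response_text = """
{"took":1,"_shards":{"total":5,"successful":5,"failed":0},
 "total":1,"matches":[{"_index":"library","_id":"johns_wishlist"}]}
"""


def matched_query_ids(response: dict) -> List[str]:
    # Each entry in "matches" identifies one registered query.
    return [m["_id"] for m in response.get("matches", [])]


# Hypothetical local copy of each query's userid metadata.
query_owner: Dict[str, str] = {"johns_wishlist": "1"}

response = json.loads(response_text)
for query_id in matched_query_ids(response):
    print(f"notify user {query_owner[query_id]} about query {query_id}")
# notify user 1 about query johns_wishlist
```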

The act of percolating does not add the document itself to the index; that needs to be done as a separate step. We can, however, percolate an existing document by including its id in the percolate request.

GET library/book/1/_percolate
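The two request forms differ only in where the document comes from: an inline document travels in the body under "doc", while an existing document is referenced by id in the path. A small sketch that builds either form, using a hypothetical helper name and returning a (method, path, body) tuple:

```python
import json
from typing import Optional, Tuple


def percolate_request(index: str, doc_type: str, doc: Optional[dict] = None,
                      doc_id: Optional[str] = None) -> Tuple[str, str, Optional[str]]:
    """Hypothetical helper: build (method, path, body) for a percolate call.

    Pass either an inline doc or the id of an existing document, not both.
    """
    if (doc is None) == (doc_id is None):
        raise ValueError("pass exactly one of doc or doc_id")
    if doc is not None:
        # Inline document: sent under "doc"; percolating does not index it.
        return "GET", f"{index}/{doc_type}/_percolate", json.dumps({"doc": doc})
    # Existing document: referenced by id in the path, no body needed.
    return "GET", f"{index}/{doc_type}/{doc_id}/_percolate", None


print(percolate_request("library", "book", doc_id="1"))
# ('GET', 'library/book/1/_percolate', None)
```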

The Percolator has several other useful features, such as aggregations, sorting, highlighting, and percolating multiple documents at a time. Below are some resources if you’d like to learn more.

Official documentation – Overview of the available configuration options for the percolator.

Elasticsearch: The Definitive Guide – Most of the content covered in this book is available on the documentation site, but the book is better structured for folks (like myself) trying to learn Elasticsearch from start to finish.

About the Author:

Freelance software developer in the Cleveland area.
