Advanced Searching of Emails using Elasticsearch and Rails 5.2

Background

Gmail's email search relies on its own operator syntax, and it is not easy to build a query for custom requirements. The UI is also hard to use, since only a tiny part of the screen is dedicated to mail search. By writing custom Elasticsearch queries, we can answer more complicated questions. Alternatives such as Outlook and Thunderbird try to do too much and are not focused on making email search powerful and easy.

Initial Setup

Install Elasticsearch following the instructions for your OS. I used brew to install it on macOS 10.13.6. Once the server is running, we can curl http://localhost:9200/ to see the Elasticsearch version and other details:

{
  "name" : "WTo6fJC",
  "cluster_name" : "elasticsearch_bparanj",
  "cluster_uuid" : "dgoGmqLEQ4y6j-Oico4BqQ",
  "version" : {
    "number" : "6.6.1",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "1fd8f69",
    "build_date" : "2019-02-13T17:10:04.160291Z",
    "build_snapshot" : false,
    "lucene_version" : "7.6.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
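The same response can be read from Ruby with the standard json library. The sketch below parses a trimmed copy of the output above instead of calling a live server; the helper name es_major_version is made up for illustration.

```ruby
require 'json'

# Trimmed copy of the response shown above. In practice this string
# would come from an HTTP GET to http://localhost:9200/.
SAMPLE_RESPONSE = <<~JSON
  {
    "name" : "WTo6fJC",
    "version" : { "number" : "6.6.1", "lucene_version" : "7.6.0" },
    "tagline" : "You Know, for Search"
  }
JSON

# Hypothetical helper: returns the server's major version as an Integer.
def es_major_version(json_text)
  info = JSON.parse(json_text)
  info.fetch('version').fetch('number').split('.').first.to_i
end

puts es_major_version(SAMPLE_RESPONSE)   # prints 6
```

Knowing the major version is useful later when picking a matching version of the Ruby client gem.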

Importing Mbox Files into ElasticSearch

Install pip.

sudo easy_install pip

You must have Python installed for this to work. Ruby does not have an easy way to import mbox files, whereas Python ships with a built-in mbox parser in its mailbox module. This is a good opportunity for someone to write an mbox file parser in Ruby.
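To give an idea of what such a Ruby parser would involve, here is a minimal sketch, not a complete implementation. It assumes each message starts with a "From " separator line and that body lines beginning with "From " were escaped as ">From " by whatever wrote the file. The names split_mbox and parse_message are made up, and folded headers, encodings, and MIME parts are ignored.

```ruby
# Splits mbox text into raw messages, dropping the "From " separator lines.
def split_mbox(text)
  messages = []
  current = nil
  text.each_line do |line|
    if line.start_with?('From ')
      messages << current if current
      current = String.new          # start collecting a new message
    elsif current
      current << line
    end
  end
  messages << current if current
  messages
end

# Splits one raw message into a headers hash and a body string.
# Folded (multi-line) headers and MIME parts are not handled here.
def parse_message(raw)
  head, body = raw.split(/\r?\n\r?\n/, 2)
  headers = {}
  head.to_s.each_line do |line|
    name, value = line.chomp.split(/:\s*/, 2)
    headers[name.downcase] = value if value
  end
  { headers: headers, body: body.to_s }
end

SAMPLE_MBOX = <<~MBOX
  From alice@example.com Mon Feb 11 10:00:00 2019
  From: alice@example.com
  Subject: Dinner Reservation

  Table for two at 7pm.

  From bob@example.com Tue Feb 12 09:30:00 2019
  From: bob@example.com
  Subject: Lunch

  See you at noon.
MBOX

split_mbox(SAMPLE_MBOX).map { |raw| parse_message(raw) }.each do |email|
  puts email[:headers]['subject']
end
```

A real parser would also need to decode header encodings and attachments, which is where most of the work lies.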

Next, download the entire project from https://github.com/oliver006/elasticsearch-gmail and run the importer from the project root:

python src/index_emails.py --infile=emails.mbox

This imports the emails into Elasticsearch in batches of 500 records.
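To illustrate what batching means here, the sketch below groups documents into slices of 500 and builds one request body per slice in the newline-delimited JSON format accepted by the Elasticsearch _bulk endpoint. This is not the importer's actual code; the index name gmail, the type name email, and the helper bulk_bodies are assumptions for illustration.

```ruby
require 'json'

BATCH_SIZE = 500

# Builds one NDJSON bulk body per batch of documents.
# Each document contributes an action line followed by a source line.
# Elasticsearch 6.x, as used in this article, expects a type per document.
def bulk_bodies(docs, index:, type:, batch_size: BATCH_SIZE)
  docs.each_slice(batch_size).map do |batch|
    lines = batch.flat_map do |doc|
      [{ index: { _index: index, _type: type } }.to_json, doc.to_json]
    end
    lines.join("\n") + "\n"   # the bulk API requires a trailing newline
  end
end

docs = (1..1200).map { |i| { subject: "Message #{i}" } }
bodies = bulk_bodies(docs, index: 'gmail', type: 'email')

puts bodies.size              # 3 requests: 500 + 500 + 200 documents
puts bodies.last.lines.size   # 400 lines: two per document
```

Sending a few large requests instead of one request per email keeps the import fast without building a single huge payload.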

Query ElasticSearch

After the import has completed successfully, we can list all the indices in Elasticsearch:

curl -X GET "localhost:9200/_cat/indices?v"

To match all the records across every index (only the first 10 hits are returned by default):

curl -H 'Content-Type: application/json' 'http://localhost:9200/_search' -d'
{
    "query" : {
        "match_all" : {}
    }
}'

We can use the elasticsearch gem to interact with the Elasticsearch server. We will not use any of the Elasticsearch gems that depend on Rails ActiveRecord, because we don't need their added complexity and they can slow us down during upgrades. Add the gem to the Gemfile:

gem 'elasticsearch'

Run bundle. We can now run a search, replacing myindex with the index name listed by the _cat/indices call above:

client = Elasticsearch::Client.new log: true
results = client.search index: 'myindex', body: { query: { match: { subject: 'Dinner Reservation' } } }

We can see the keys of the results hash:

results.keys

We need the nested hash of hits:

results['hits'].keys

This in turn has a nested hits key whose value is an array of hashes:

results['hits']['hits'].size

By default a search returns at most 10 hits, so this array has at most 10 elements. We can inspect the first record:

results['hits']['hits'][0]

We can also see the keys of a hit. These are metadata such as _index, _id and _score; the imported email record itself is stored under the _source key:

results['hits']['hits'][0].keys
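To make the nesting concrete, the sketch below walks a hand-written stand-in for the results hash and pulls out each email together with its score. The layout (hits, then hits, then _source) follows the Elasticsearch search response; the type name, the email fields and values, and the helper email_sources are made up for illustration.

```ruby
# Hand-written stand-in for the hash returned by client.search.
SAMPLE_RESULTS = {
  'took' => 3,
  'hits' => {
    'total' => 2,
    'hits' => [
      { '_index' => 'myindex', '_type' => 'email', '_id' => '1', '_score' => 1.7,
        '_source' => { 'subject' => 'Dinner Reservation', 'from' => 'alice@example.com' } },
      { '_index' => 'myindex', '_type' => 'email', '_id' => '2', '_score' => 0.9,
        '_source' => { 'subject' => 'Dinner plans', 'from' => 'bob@example.com' } }
    ]
  }
}.freeze

# Hypothetical helper: returns the indexed documents with their scores.
def email_sources(results)
  results.dig('hits', 'hits').to_a.map do |hit|
    hit['_source'].merge('score' => hit['_score'])
  end
end

email_sources(SAMPLE_RESULTS).each do |email|
  puts "#{email['score']}  #{email['subject']}  (#{email['from']})"
end
```

The same helper should work on the real results hash, as long as the fields it prints exist in your index.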

Creating complex queries requires reading the documentation in the elasticsearch-ruby source code repository on GitHub.
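As a starting point, a more complex query can be built as a plain Ruby hash and passed as the body: argument of client.search. The sketch below combines a full-text match with a date range filter using the bool query from the Elasticsearch query DSL. The helper name and the date field name are hypothetical; the date field is assumed to be mapped as a date, so check it against your own index mapping first.

```ruby
require 'json'

# Builds a search body: full-text match on the subject, restricted to a
# date range. Clauses under filter narrow the results without affecting scores.
# 'subject' is the field queried earlier; date_field is a hypothetical
# field name that must be mapped as a date in your index.
def subject_in_range_query(text, from:, to:, date_field: 'date', size: 25)
  {
    size: size,   # overrides the default of 10 hits
    query: {
      bool: {
        must:   [{ match: { subject: text } }],
        filter: [{ range: { date_field => { gte: from, lte: to } } }]
      }
    }
  }
end

body = subject_in_range_query('Dinner Reservation', from: '2019-01-01', to: '2019-02-28')
puts JSON.pretty_generate(body)

# With a running server this hash would be passed as:
#   client.search index: 'myindex', body: body
```

Keeping the query in a small method like this makes it easy to inspect the generated JSON before sending it to the server.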

