Elasticsearch in Python, 02 Dec 2016

Elasticsearch is one of the great backbones for search applications. In this post, we are going to explore how to index data with Elasticsearch (or Elastic, as it is now branded) using elasticsearch-py. I didn't write this post entirely myself; it's based on this post.

First, install Elasticsearch from the download page. Or, if you have Homebrew, you can install it with brew install elasticsearch.

Then start Elasticsearch: run ./bin/elasticsearch if you downloaded the archive, or simply elasticsearch if you installed it with Homebrew.
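Before indexing anything, it can help to confirm the node is actually up. By default Elasticsearch listens on port 9200, and a GET on the root URL returns a small JSON document whose `version.number` field holds the version string. The sketch below uses only the standard library; the helper names `parse_version` and `check_elasticsearch` are hypothetical, and it assumes a local node served over plain HTTP with no authentication (newer releases turn on TLS and passwords by default, so there you would need to adjust the URL and credentials).

```python
import http.client
import json
from urllib.request import urlopen

# Assumption: local node, default port, plain HTTP, no authentication.
DEFAULT_URL = "http://localhost:9200"


def parse_version(info):
    """Return the version string from the JSON document served at GET /."""
    return info["version"]["number"]


def check_elasticsearch(url=DEFAULT_URL, timeout=2.0):
    """Return the node's version string, or None if nothing usable answers at `url`."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return parse_version(json.load(resp))
    except (OSError, http.client.HTTPException, ValueError, KeyError, TypeError):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # the other exceptions cover a response that isn't the expected JSON.
        return None


if __name__ == "__main__":
    version = check_elasticsearch()
    print("Elasticsearch", version if version else "is not reachable")
```

If this prints "is not reachable", the indexing code that follows will fail to connect as well.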

Here is an example of how to index the Titanic dataset in Python.

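import pandas as pd
from urllib.request import urlopen
from elasticsearch import Elasticsearch, helpers

url = "http://apps.sloanahrens.com/qbox-blog-resources/kaggle-titanic-data/test.csv"
index_name = 'titanic'
type_name = 'passenger'
id_field = 'passengerid'

# load the CSV into a DataFrame and replace missing values with empty strings
titanic_df = pd.read_csv(urlopen(url)).fillna('')
# one dict per row, keyed by column name
titanic_records = titanic_df.to_dict(orient='records')

es = Elasticsearch()  # connects to localhost:9200 by default
if not es.indices.exists(index=index_name):
    es.indices.create(index=index_name, ignore=400)  # ignore "index already exists"

# build one bulk action per passenger, using the row position as the document id
actions = []
for i, r in enumerate(titanic_records):
    actions.append({"_index": index_name,
                    "_type": type_name,  # mapping types are deprecated in 7.x and removed in 8.x
                    "_id": i,
                    "_source": r})

helpers.bulk(es, actions=actions)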
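`helpers.bulk` accepts any iterable of actions, not only a list, so for larger files you can stream them from a generator instead of building the whole `actions` list in memory. The sketch below also shows how the otherwise unused `id_field` could supply the document id instead of the row position. `gen_actions` is a hypothetical helper, and whatever you pass as `id_field` must match a key in each record exactly (check `titanic_df.columns`, since the CSV header's capitalization may differ).

```python
def gen_actions(records, index_name, type_name=None, id_field=None):
    """Yield one bulk action per record (hypothetical helper).

    If id_field is given, that column's value becomes the document _id;
    otherwise the record's position in `records` is used, as in the loop above.
    """
    for i, record in enumerate(records):
        action = {
            "_index": index_name,
            "_id": record[id_field] if id_field else i,
            "_source": record,
        }
        if type_name is not None:
            # Only meaningful on older clusters; mapping types are gone in Elasticsearch 8.
            action["_type"] = type_name
        yield action
```

With the client from the example, indexing then becomes helpers.bulk(es, gen_actions(titanic_records, index_name, type_name)), adding id_field=... once you have confirmed the column name.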

Other useful snippets:

es.indices.get_alias().keys()  # list the names of all indices
es.search(index='titanic', q="C.A.", size=20)['hits']['hits']  # search the `titanic` index for "C.A.", returning up to 20 hits
es.count(index=index_name)  # count how many documents we've indexed in `titanic`
es.indices.delete(index=index_name, ignore=[400, 404])  # delete the index, ignoring errors if it doesn't exist
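The list returned by `es.search(...)['hits']['hits']` is a list of dicts: each hit carries the document's `_id`, its relevance `_score`, and the original record under `_source`. A small helper like the hypothetical `summarize_hits` below flattens that into tuples for quick inspection. It only reads the response dict, so it can be tried on a hand-made sample; the field name you pass is an assumption and should be one of your record's columns.

```python
def summarize_hits(response, field):
    """Return (_id, _score, value of `field`) for each hit in an es.search() response.

    `field` is any key of the indexed records; hits without it yield None.
    """
    return [
        (hit["_id"], hit.get("_score"), hit.get("_source", {}).get(field))
        for hit in response["hits"]["hits"]
    ]
```

For example, summarize_hits(es.search(index='titanic', q="C.A.", size=20), 'name') gives one (id, score, name) tuple per match, assuming the records have a 'name' column.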