gane5h/ES-syllabus.md

Getting started with Elasticsearch

Abstract

This tutorial is an Elasticsearch bootcamp. Elasticsearch is a distributed, scalable search server based on Apache Lucene. Companies such as Foursquare, SoundCloud, and GitHub, along with hundreds more, use it to power search and analytics in their applications.
At the end of the day, you’ll:

know the most important concepts and terminology of search engines
have a deep understanding of Elasticsearch
apply Elasticsearch to build search applications
analyze and resolve common problems with Elasticsearch

No prior experience with search or Elasticsearch is required. This tutorial is especially useful for folks who already use Elasticsearch for logging and want to learn how to use some of its more advanced features.
Syllabus


Overview of full-text search (1 hour)

why another datastore?
theory: information retrieval

vector space model
inverted indices
index construction
computing scores
evaluation: precision and recall
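
To give a feel for two of the theory topics above, here is a minimal Python sketch of an inverted index and of precision and recall. It is a toy illustration, not how Lucene or Elasticsearch implement these internally: tokenization is a plain whitespace split, and the sample documents and the helper names (`build_inverted_index`, `search_all_terms`, `precision_recall`) are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis step (an assumption): lowercase, then split on whitespace.
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search_all_terms(index, query):
    """Return ids of documents that contain every query token (boolean AND)."""
    postings = [set(index.get(token, [])) for token in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sample corpus, keyed by document id.
docs = {
    1: "distributed search server",
    2: "search and analytics",
    3: "distributed log storage",
}
index = build_inverted_index(docs)
results = search_all_terms(index, "distributed search")
```

With this corpus, the posting list for `search` holds documents 1 and 2, and the AND query `distributed search` matches only document 1. If a query returned documents 1 and 2 while documents 1 and 3 were the relevant ones, `precision_recall([1, 2], [1, 3])` would report a precision and a recall of 0.5 each.
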


Getting started with ES (30 mins)

differences between Lucene / Solr / Elasticsearch
downloading and installing
distributed features: sharding, replication, fault tolerance
architecture: indices, types, routing, nodes


Search (1 hour)

mappings and datatypes
configuring analyzers, tokenizers
query DSL and API overview
query types: term, prefix, fuzzy, etc.
sorting, facets, filters, highlighting
advanced: geo-bound search, more-like-this
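
As a rough preview of the query DSL, the sketch below builds a search request body as a plain Python dict and serializes it to JSON. It covers only a `match` query, sorting, and highlighting; facets and filters are left out. The field names `title` and `published_at` and the helper `build_search_body` are assumptions made for illustration, not part of the tutorial material.

```python
import json


def build_search_body(text, field="title", size=10):
    """Build a search request body: match query, newest first, with highlights."""
    return {
        "query": {"match": {field: text}},
        # `published_at` is a hypothetical date field used only for this example.
        "sort": [{"published_at": {"order": "desc"}}],
        # An empty object asks for default highlighting on the field.
        "highlight": {"fields": {field: {}}},
        "size": size,
    }


body = build_search_body("elasticsearch")
payload = json.dumps(body, indent=2)
```

The resulting `payload` is the kind of JSON document that would be sent to an index's `_search` endpoint; nothing here talks to a cluster, so it can be run without Elasticsearch installed.
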


Other features (20 mins)

percolation, scripting, parent-child documents, rivers


Production (30 mins)

data-flow: pulling data from MySQL for indexing
security & audit
performance tuning
cluster API for health, node state, etc.
monitoring, alerting, backups, etc.