@gane5h
Last active August 29, 2015 13:59
Polyglot Elasticsearch Workshop Syllabus

Getting started with Elasticsearch

Abstract

This tutorial is an Elasticsearch bootcamp. Elasticsearch is a distributed, scalable search server built on Apache Lucene. Companies such as Foursquare, SoundCloud, GitHub, and hundreds more use it to power search and analytics in their applications.

By the end of the day, you’ll:

  1. know the most important concepts and terminology of search engines
  2. have a deep understanding of Elasticsearch
  3. be able to build search applications with Elasticsearch
  4. be able to analyze and resolve common problems with Elasticsearch

No prior experience with search or Elasticsearch is required. This tutorial is especially useful for people who already use Elasticsearch for logging and want to learn some of its more advanced features.

Syllabus

  1. Overview of full-text search (1 hour)

    • why another datastore?
    • theory: information retrieval
      • vector space model
      • inverted indices
      • index construction
      • computing scores
      • evaluation: precision and recall
  2. Getting started with ES (30 mins)

    • differences between Lucene / Solr / Elasticsearch
    • downloading and installing
    • distributed features: sharding, replication, fault tolerance
    • architecture: indices, types, routing, nodes
  3. Search (1 hour)

    • mappings and datatypes
    • configuring analyzers, tokenizers
    • query DSL and API overview
    • search types: term, prefix, fuzzy, etc.
    • sorting, facets, filters, highlighting
    • advanced: geo-bound search, more-like-this
  4. Other features (20 mins)

    • percolation, scripting, parent-child documents, rivers
  5. Production (30 mins)

    • data flow: pulling data from MySQL for indexing
    • security & audit
    • performance tuning
    • cluster API for health, node state, etc.
    • monitoring, alerting, backups, etc.
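
As a small taste of the theory in part 1 (inverted indices, and evaluation with precision and recall), here is a minimal sketch in plain Python. It is not how Elasticsearch or Lucene implement these ideas. The whitespace tokenizer, the sample documents, and the function names (`tokenize`, `build_inverted_index`, `search`, `precision_recall`) are all assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sample corpus, keyed by document id.
docs = {
    1: "Elasticsearch is a distributed search server",
    2: "Lucene is a search library",
    3: "MySQL is a relational database",
}

index = build_inverted_index(docs)
retrieved = search(index, "search server")

# Assume, for the sake of the example, that documents 1 and 2 are relevant.
print(retrieved, precision_recall(retrieved, {1, 2}))
```

The boolean AND query here trades recall for precision: it returns only documents that contain every term. Scoring models such as the vector space model instead rank partial matches.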