Open Source Search Comparison
- Main Web: http://www.elasticsearch.org/
- Development URL: https://github.com/elasticsearch/elasticsearch
- License: Apache 2
- Environment: Java
Elasticsearch was created in 2010 by Shay Banon after forgoing work on another search solution, Compass, also built on Lucene and created in 2004.
- Real time data, analytics
- Distributed, scaled horizontally. Add nodes for capacity.
- High availability, reorganizing clusters of nodes.
- Multi-tenancy. Multiple indices in a cluster, added on the fly.
- Full text search via Lucene. Most powerful full text search capabilities in any open source product
- Document oriented. Store structured JSON docs.
- Conflict management
- Schema free with the ability to assign specific knowledge at a later time
- Restful API
- Document changes are recorded in transaction logs in multiple nodes.
- Built on Lucene
- Data is stored with POST requests and retrieved with GET requests. Can check for existence of a document with HEAD requests. JSON documents can be deleted with DELETE requests.
- Requests can be made with JSON query language rather than a query string.
- Indexed documents are versioned. (Unique feature?)
- Full text docs are stored in memory. A new option in 1.0 allows for doc values, which are stored on disk.
- Suggesters are built in to suggest corrections or completions.
- Plugin system available for custom functionality.
- Possible admin interface via Elastic-HQ
- Elasticsearch in Production is a great article on some of the realities faced when running Elasticsearch.
- Securing your Elasticsearch cluster
- Plugins available for authentication.
- Why We Built Elasticsearch - dotScale presentation from the creator, Shay Banon
- GitHub's transition from Solr to Elasticsearch
we quickly exceeded the volume, just literally the storage space that one Solr cluster and Solr instance could handle.
- Many great Elasticsearch articles by Greg Brown.
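To make the Elasticsearch request model described above more concrete, here is a minimal sketch of how the HTTP verbs could map onto requests. It only builds the requests and does not send them. The host and port (`localhost:9200`), the `/index/type/id` URL layout, the `blog`/`post` names, and all of the helper function names are my own assumptions for illustration, so check them against the documentation for the version you run.

```python
import json
from urllib.request import Request

# Assumption: a local node listening on the default HTTP port.
BASE_URL = "http://localhost:9200"
JSON_HEADERS = {"Content-Type": "application/json"}


def doc_url(index, doc_type, doc_id):
    """Build the URL of one document (assumed /index/type/id layout)."""
    return f"{BASE_URL}/{index}/{doc_type}/{doc_id}"


def store_request(index, doc_type, doc_id, document):
    """POST a JSON document so that it gets stored and indexed."""
    body = json.dumps(document).encode("utf-8")
    return Request(doc_url(index, doc_type, doc_id), data=body,
                   headers=JSON_HEADERS, method="POST")


def fetch_request(index, doc_type, doc_id):
    """GET retrieves a stored document."""
    return Request(doc_url(index, doc_type, doc_id), method="GET")


def exists_request(index, doc_type, doc_id):
    """HEAD checks for existence without returning the document body."""
    return Request(doc_url(index, doc_type, doc_id), method="HEAD")


def delete_request(index, doc_type, doc_id):
    """DELETE removes the document."""
    return Request(doc_url(index, doc_type, doc_id), method="DELETE")


def search_request(index, query):
    """Search with a JSON query body rather than a query string."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(f"{BASE_URL}/{index}/_search", data=body,
                   headers=JSON_HEADERS, method="POST")


# Hypothetical usage: build (but do not send) a store and a search request.
store = store_request("blog", "post", 1, {"title": "Hello"})
search = search_request("blog", {"match": {"title": "hello"}})
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running node. I left that out so the sketch runs without a cluster.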
- Main Web: http://sphinxsearch.com/
- Development URL: http://sphinxsearch.com/bugs/my_view_page.php
- License: GPLv2
- Environment: C++
Sphinx was created in 2001 by Andrew Aksyonoff to solve a personal need for a search solution and has remained a standalone project.
- Supports on the fly (real time) and offline batch index creation.
- Arbitrary attributes can be stored in the index.
- Can index SQL DBs
- Can batch index XMLpipe2 and (?) tsvpipe documents
- 3 different APIs, native libraries provided for SphinxAPI
- DB like querying features.
- Real time indexes can only be populated using SphinxQL
- Disk based indexes can be built from SQL DBs, TSV, or custom XML format.
- Example PHP API file to be included in projects communicating with Sphinx.
Uses fsockopen in PHP to make a connection with the Sphinx service, similar to how a MySQL connection would be made.
- Various Sphinx articles
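Since real time indexes are populated through SphinxQL, here is a rough sketch of what the statements might look like when built from Python. SphinxQL is spoken over the MySQL wire protocol, so I am assuming a MySQL client library that uses `%s` placeholders would execute these. The index name `rt_posts`, the column names, and both helper functions are hypothetical.

```python
def insert_statement(index, columns):
    """Return a parameterised SphinxQL INSERT for a real time index."""
    column_list = ", ".join(columns)
    # One %s placeholder per column (assumed MySQL client paramstyle).
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {index} ({column_list}) VALUES ({placeholders})"


def match_statement(index, limit=10):
    """Return a parameterised full text SELECT using MATCH()."""
    return f"SELECT * FROM {index} WHERE MATCH(%s) LIMIT {int(limit)}"


# Hypothetical index and columns, for illustration only.
insert_sql = insert_statement("rt_posts", ["id", "title", "content"])
search_sql = match_statement("rt_posts", limit=5)
```

The statements would then be handed to a client's `execute` call together with the values, much like with a regular MySQL connection.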
- Main Web: http://lucene.apache.org/solr/
- Development URL: https://issues.apache.org/jira/browse/SOLR
- License: Apache 2
- Environment: Java
Solr was created in 2004 at CNet by Yonik Seeley and granted to the Apache Software Foundation in 2006 to become part of the Lucene project.
- Rest-like API
- Documents added via XML, JSON, CSV, or binary over HTTP.
- Query with GET and receive XML, JSON, CSV, or binary results.
- XML configuration
- Extensible plugin architecture
- AJAX based admin interface
- Introduction to Information Retrieval
- A detailed series on Solr vs Elasticsearch
- Comparison between Solr and Sphinx
- Comparison of full text search engines
- Choosing a stand-alone full-text search server: Sphinx or SOLR
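For the Solr side, querying with GET comes down to building a URL. The sketch below assumes a local Solr on port 8983, a core named `articles`, and the standard `/select` handler with the `wt` parameter choosing the response format. The core name and the helper function are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance on its usual port.
SOLR_URL = "http://localhost:8983/solr"


def select_url(core, query, response_format="json", rows=10):
    """Build a GET URL for a Solr query; wt selects xml, json, csv, etc."""
    params = urlencode({"q": query, "wt": response_format, "rows": rows})
    return f"{SOLR_URL}/{core}/select?{params}"


# Hypothetical core name and query.
url = select_url("articles", "title:search")
```

Fetching that URL with any HTTP client should return the results in the requested format.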
Misc Thoughts and Opinions
These thoughts and opinions were mostly formed during the creation of this document while researching various search solutions.
- Elasticsearch provides a RESTful API endpoint for all requests from all languages. Sphinx provides language specific wrappers for the API to communicate with the service.
- It seems more straightforward to push arbitrary documents and schema to Elasticsearch via JSON than to create fields up front as Sphinx requires. I'm not entirely sure on this point yet.
- Sphinx is definitely designed around a SQL type structure, though it has been modified over time to support other data stores. I think this could be an issue.
- That Elasticsearch is developed on GitHub is a big positive for me. The combined interfaces of MantisBT and Google's code repository that Sphinx uses are a little annoying.
- Sphinx's decision to implement xmlpipe2 and tsvpipe as data sources is somewhat confusing. I think the standard formats offered with Solr and Elasticsearch make more sense.
- Elasticsearch was built to be real time from the beginning. Solr is near real-time. Sphinx started as a batch indexer and moved (rightly) to real time over time. See Sphinx real time caveats.
- I'm a fan of this:
one can launch ElasticSearch and start sending documents to it in order to have them indexed without creating any sort of index schema and ElasticSearch will try to guess field types.