@devops-school
Created April 8, 2024 07:41
Lucene Cheatsheet

Andrew Pennebaker

https://github.com/mcandre/cheatsheets/blob/master/lucene.md

About

Lucene is a full-text search library. Elasticsearch is built on it, and Kibana accepts its query syntax, for searching public and private data collections.

Documentation

Apache Lucene

LuceneTutorial.com

Lucene Query Parser Syntax

Lucene in Action

Basic queries

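Lucene searches can be case-sensitive or case-insensitive, depending on how the index is configured (in practice, on the analyzer applied to each field). On a case-insensitive index, the following queries return the same results: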

cats

CATS

CaTs

Unlike many other search engines, Lucene combines adjacent terms with OR rather than AND by default.

Union

cats dogs

cats OR dogs

Intersect

Because OR is the default, most queries should AND their terms together explicitly:

cats AND dogs

+cats +dogs

Nesting

(+cats +dogs) (+"peanut butter" +jelly)

Subtraction

Minus (-) excludes a term from results, and automatically ANDs it with the rest of the query.

cats -dogs

cats AND NOT dogs
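
As a rough illustration of the union, intersect, and subtraction behavior described above, the sketch below models each query as a set operation over a tiny made-up corpus. The `DOCS` data and the `having` helper are hypothetical, and this is not how Lucene evaluates or scores queries internally; it only shows which documents each form would be expected to return.

```python
# Toy corpus: document id -> set of lowercase terms (hypothetical data).
DOCS = {
    1: {"cats"},
    2: {"dogs"},
    3: {"cats", "dogs"},
    4: {"birds"},
}


def having(term):
    """Return the ids of documents that contain the term."""
    return {doc_id for doc_id, terms in DOCS.items() if term in terms}


union = having("cats") | having("dogs")         # cats dogs / cats OR dogs
intersection = having("cats") & having("dogs")  # cats AND dogs / +cats +dogs
difference = having("cats") - having("dogs")    # cats -dogs / cats AND NOT dogs

print(sorted(union))         # [1, 2, 3]
print(sorted(intersection))  # [3]
print(sorted(difference))    # [1]
```

The default OR form returns three documents where the explicit AND form returns one, which is why the intersect syntax is usually the one you want.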

Phrases

"grumpy cat"

Wildcards

Question mark (?) matches a single, arbitrary character.

Asterisk (*) matches zero or more arbitrary characters within a single term; it does not span multiple words or a phrase.
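
To make the two wildcards concrete, here is a minimal sketch that translates a wildcard term into a regular expression, treating ? as exactly one character and * as zero or more characters within a single term. The function names `wildcard_to_regex` and `matches` are made up for this example, backslash escapes are not handled, and Lucene's own wildcard matching is implemented differently.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a ?/* wildcard term into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")           # exactly one character
        elif ch == "*":
            parts.append(".*")          # zero or more characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))


def matches(pattern, term):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(matches("c?ts", "cats"))   # True
print(matches("c?ts", "cts"))    # False: ? needs exactly one character
print(matches("Ge*", "Gecko"))   # True
print(matches("Ge*", "Ge"))      # True: * may match nothing
```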

Notes:

  • Wildcards and other special characters (e.g., +, -, &, |, !, (, ), {, }, [, ], ^, ", ~, *, ?, :, and \) need to be escaped with a backslash (e.g., \?, \*) when used inside phrases/strings, or when searched for as a literal.
  • By default, an asterisk or question mark cannot be used as the first character of a term (e.g., *oogle is bad syntax).

cats

c?ts

+khtml +like +Gecko

+khtml +like +Geck?

"khtml like Geck\?"

+"khtml, like" +Ge*

"khtml, like \*"

error\:
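
When search terms come from user input or log lines, escaping by hand is error-prone. The sketch below backslash-escapes the special characters listed in the notes above. The helper name `escape_term` is made up for this example, and the character set is copied from this cheatsheet; newer Lucene versions also treat / as special, which this sketch leaves out.

```python
# Special characters as listed in the notes above (assumption: this list
# is complete for your Lucene version; check your parser's documentation).
SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_term(text):
    """Prefix every special character in text with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


print(escape_term("error:"))         # error\:
print(escape_term("Geck?"))          # Geck\?
print(escape_term("khtml, like *"))  # khtml, like \*
```

Escape only the literal part of a query; wildcards that should stay active, as in +Ge*, must be added after escaping.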

Fuzzy searches

Lucene can search for similar terms:

integer~

will match on integer, integers, and intejer.

Specify a threshold

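An optional fuzziness threshold can be specified, from 0.0 (very loose) to 1.0 (very strict). This fractional similarity is the older form of the syntax; since Lucene 4.0 the number after the tilde is a maximum edit distance (0, 1, or 2), so check which form your version expects.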

integer~

integer~0.5

integer~0.4

integer~0.6
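
Fuzzy matching is based on how many single-character edits separate two terms. As an illustration, the sketch below computes plain Levenshtein distance for the example terms above. The `edit_distance` function is a textbook stand-in written for this cheatsheet, not Lucene's implementation, and how a threshold such as 0.5 maps onto a number of edits is left out.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b, using one rolling row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


for candidate in ("integer", "integers", "intejer"):
    print(candidate, edit_distance("integer", candidate))
# integer 0
# integers 1
# intejer 1
```

Both integers and intejer are a single edit away from integer, which is why a fuzzy search for integer~ picks them up.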

Operators

Host-specific search

Host searches tend to require fully qualified domain names (e.g., google will not match, but google.com will), though wildcards can help abbreviate them.

host:tomcat.apache.org

host:tomcat*

Log file path

path:catalina*

Custom attributes

Each Lucene index may define additional query operators (fields). Common ones include message: and timestamp:.

Note: When a term is not prefixed with an operator, it is searched for in the index's default field, which log-search deployments often configure to cover all fields. For best results, it is often useful to leave operators off your search terms.
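
If queries are assembled in code, a small helper can keep the operator syntax consistent. The sketch below formats operator:value clauses and joins them with AND. The helper names `field_clause` and `all_of` are made up for this example, and values are assumed to be already escaped where needed.

```python
def field_clause(field, value):
    """Format one field:value clause, quoting values that contain spaces."""
    if " " in value:
        value = '"' + value + '"'
    return field + ":" + value


def all_of(*clauses):
    """Require every clause by joining them with AND."""
    return " AND ".join(clauses)


query = all_of(
    field_clause("host", "tomcat*"),
    field_clause("path", "catalina*"),
    "error",  # no operator: searched wherever the index's default points
)
print(query)  # host:tomcat* AND path:catalina* AND error
```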

Alternatives
