@clintongormley
Created February 8, 2011 15:54
ElasticSearch::QueryParser
NAME
ElasticSearch::QueryParser - Check or filter query strings
DESCRIPTION
If you pass an illegal query string to ElasticSearch, the request will fail.
When using a query string from an external source, eg the keywords field
from a web search form, it is important to filter it to avoid these
failures.
You may also want to allow or disallow certain query string features, eg
the ability to search on a particular field.
The ElasticSearch::QueryParser takes care of this for you.
See <http://lucene.apache.org/java/3_0_3/queryparsersyntax.html> for
more information about the Lucene Query String syntax, and
<http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/query_string_query/#Syntax_Extension>
for custom ElasticSearch extensions to the query string syntax.
SYNOPSIS
use ElasticSearch;
my $es = ElasticSearch->new(servers=>'127.0.0.1:9200');
my $qp = $es->query_parser(%opts);
my $filtered_query_string = $qp->filter($unchecked_query_string);
my $results = $es->search( query=> {
query_string=>{ query => $filtered_query_string }
});
For example:
my $qs = 'foo NOT AND -bar - baz * foo* secret_field:SIKRIT "quote';
print $qp->filter($qs);
# foo AND -bar baz foo* "quote"
METHODS
"new()"
my $qp = ElasticSearch::QueryParser->new(%opts);
my $qp = $es->query_parser(%opts);
Creates a new ElasticSearch::QueryParser object, and sets the passed in
options (see "OPTIONS").
"filter()"
$filtered_query_string = $qp->filter($unchecked_query_string, %opts)
Checks a passed in query string and returns a filtered version which is
suitable to pass to ElasticSearch.
Note: "filter()" can still return an empty string, which is not
considered a valid query string, so you should still check for that
before passing to ElasticSearch.
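The empty-string case can be guarded before a search is issued. The following is a minimal sketch only: query_for() is a hypothetical helper, not part of the module, and treating a whitespace-only string as empty is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of ElasticSearch::QueryParser.
# Returns the "query" clause shown in the SYNOPSIS, or undef when the
# filtered string is empty. Treating whitespace-only input as empty is
# an assumption made for this sketch.
sub query_for {
    my ($filtered) = @_;
    return undef unless defined($filtered) && $filtered =~ /\S/;
    return { query_string => { query => $filtered } };
}

for my $filtered ( 'foo AND -bar', '' ) {
    my $query = query_for($filtered);
    print defined $query
        ? "search for: $query->{query_string}{query}\n"
        : "nothing to search for\n";
}
```

In real use the argument would be the return value of $qp->filter(...), and a defined result would be passed on as $es->search( query => $query ).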
If any %opts are passed in to "filter()", these are added to the default
%opts as set by "new()", and apply only for the current run.
"filter()" does not promise to parse the query string in exactly the
same way as Lucene, just to clear it up so that it won't throw an error
when passed to ElasticSearch.
"check()"
$filtered_query_string = $qp->check($unchecked_query_string, %opts)
Checks a passed in query string and throws an error if it is not valid.
This is useful for debugging your own query strings.
If any %opts are passed in to "check()", these are added to the default
%opts as set by "new()", and apply only for the current run.
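Because "check()" throws on invalid input, callers would normally wrap it in an eval. A minimal sketch of that pattern follows; try_check() is a hypothetical helper, and the checker used in the demo is a stand-in that only counts double quotes, not the module's real validation.

```perl
use strict;
use warnings;

# Hypothetical helper: runs a checker that dies on invalid input.
# Returns (1, '') on success or (0, $error_message) on failure.
sub try_check {
    my ( $checker, $query_string ) = @_;
    return ( 1, '' ) if eval { $checker->($query_string); 1 };
    my $error = $@;
    chomp $error;
    return ( 0, $error );
}

# Stand-in checker for this demo only: rejects an odd number of double
# quotes. With the real module this would be: sub { $qp->check(@_) }
my $checker = sub {
    my ($qs) = @_;
    my $quotes = () = $qs =~ /"/g;
    die "Unbalanced quotes\n" if $quotes % 2;
    return $qs;
};

for my $qs ( 'foo "bar baz"', 'foo "bar' ) {
    my ( $ok, $error ) = try_check( $checker, $qs );
    print $ok ? "valid: $qs\n" : "invalid: $qs ($error)\n";
}
```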
OPTIONS
You can set various options to control how your query strings are
filtered.
The defaults (if no options are passed in) are:
escape_reserved => 0
fields => 0
boost => 1
allow_bool => 1
allow_boost => 1
allow_fuzzy => 1
allow_slop => 1
allow_ranges => 0
wildcard_prefix => 1
Any options passed in to "new()" are merged with these defaults. These
options apply for the life of the QueryParser instance.
Any options passed in to "filter()" or "check()" are merged with the
options set in "new()" and apply only for the current run.
For instance:
$qp = ElasticSearch::QueryParser->new(allow_fuzzy => 0);
$qs = "foo~0.5 bar^2 foo:baz";
print $qp->filter($qs, allow_fuzzy => 1, allow_boost => 0);
# foo~0.5 bar baz
print $qp->filter($qs, fields => 1 );
# foo bar^2 foo:baz
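The merge order described above can be pictured as plain hash merging, where later lists win. This is a sketch of the semantics only, not the module's code: merge_opts() is a hypothetical helper and %defaults holds just a subset of the documented defaults.

```perl
use strict;
use warnings;

# A few of the documented defaults (a subset, for illustration only).
my %defaults = ( fields => 0, allow_boost => 1, allow_fuzzy => 1 );

# Hypothetical helper: later lists win, so per-call options override
# instance options, which in turn override the defaults.
sub merge_opts {
    my ( $instance_opts, $call_opts ) = @_;
    return ( %defaults, %$instance_opts, %$call_opts );
}

my %instance = ( allow_fuzzy => 0 );    # as passed to new()
my %run = merge_opts( \%instance, { allow_fuzzy => 1, allow_boost => 0 } );

for my $name ( sort keys %run ) {
    print "$name => $run{$name}\n";
}

# The instance options are untouched, so the next call starts again
# from allow_fuzzy => 0.
print "instance allow_fuzzy => $instance{allow_fuzzy}\n";
```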
escape_reserved
Reserved characters must be escaped to be used in the query string. By
default, "filter()" will remove these characters. Set "escape_reserved"
to true if you want them to be escaped instead.
Reserved characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
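In Lucene syntax, a reserved character is escaped by putting a backslash in front of it. The sketch below shows that idea with a single substitution; escape_reserved_chars() is a hypothetical helper and is not how the module itself decides what to escape.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: put a backslash in front
# of each reserved character (and in front of each "&&" or "||" pair).
sub escape_reserved_chars {
    my ($text) = @_;
    $text =~ s/(&&|\|\||[-+!(){}\[\]^"~*?:\\])/\\$1/g;
    return $text;
}

print escape_reserved_chars('(1+1):2'), "\n";    # \(1\+1\)\:2
```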
fields
Normally, you don't want to allow your users to specify which fields to
search. By default, "filter()" removes any field prefixes, eg:
$qp->filter('foo:bar secret_field:SIKRIT')
# bar SIKRIT
You can set "fields" to 1 to allow all fields, or pass in a hashref with
a list of approved fieldnames, eg:
$qp->filter('foo:bar secret_field:SIKRIT', fields => 1);
# foo:bar secret_field:SIKRIT
$qp->filter('foo:bar secret_field:SIKRIT', fields => {foo => 1});
# foo:bar SIKRIT
ElasticSearch extends the standard Lucene syntax to include:
_exists_:fieldname
and
_missing_:fieldname
The "fields" option applies to these fieldnames as well.
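The three forms of "fields" (false, true, or a hashref of approved names) can be sketched with one substitution. strip_field_prefixes() is a hypothetical helper written for illustration; it ignores quoted phrases and gives no special treatment to the _exists_ and _missing_ forms, so it is not a substitute for "filter()".

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code. $fields follows the
# documented convention: false = strip every prefix, hashref = keep only
# the listed names, any other true value = keep everything.
sub strip_field_prefixes {
    my ( $query, $fields ) = @_;
    return $query if $fields && !ref($fields);
    my %allowed = ref($fields) eq 'HASH' ? %$fields : ();
    $query =~ s/\b(\w+):/ $allowed{$1} ? "$1:" : '' /ge;
    return $query;
}

my $qs = 'foo:bar secret_field:SIKRIT';
print strip_field_prefixes( $qs, 0 ), "\n";               # bar SIKRIT
print strip_field_prefixes( $qs, { foo => 1 } ), "\n";    # foo:bar SIKRIT
print strip_field_prefixes( $qs, 1 ), "\n";               # foo:bar secret_field:SIKRIT
```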
allow_bool
Query strings can use boolean operators like:
foo AND bar NOT baz OR ! (foo && bar)
By default, boolean operators are allowed. Set "allow_bool" to "false"
to disable them.
Note: This doesn't affect the "+" or "-" operators, which are always
allowed, eg:
+apple -crab
allow_boost
Boost allows you to give more importance to a particular word, group
of words or phrase, eg:
foo^2 (bar baz)^3 "this exact phrase"^5
By default, boost is enabled. Setting "allow_boost" to "false" would
convert the above example to:
foo (bar baz) "this exact phrase"
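The conversion above amounts to dropping each "^number" suffix. A minimal sketch, assuming boosts are written as a caret followed by an integer or decimal number; strip_boost() is a hypothetical helper, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: remove every boost suffix
# of the form ^N or ^N.M from the query string.
sub strip_boost {
    my ($query) = @_;
    $query =~ s/\^\d+(?:\.\d+)?//g;
    return $query;
}

print strip_boost('foo^2 (bar baz)^3 "this exact phrase"^5'), "\n";
# foo (bar baz) "this exact phrase"
```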
allow_fuzzy
Lucene supports fuzzy searches based on the Levenshtein Distance, eg:
supercalifragilisticexpialidocious~0.5
To disable these, set "allow_fuzzy" to false.
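For readers unfamiliar with the Levenshtein distance, it is the smallest number of single-character insertions, deletions and substitutions needed to turn one word into another. The sketch below computes it with the usual row-by-row dynamic programming; it illustrates the metric only and says nothing about how Lucene maps a distance onto the similarity value given after "~".

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance, keeping only the
# previous row of the table in memory.
sub levenshtein {
    my ( $s, $t ) = @_;
    my @prev = ( 0 .. length($t) );
    for my $i ( 1 .. length($s) ) {
        my @cur = ($i);
        for my $j ( 1 .. length($t) ) {
            my $cost = substr( $s, $i - 1, 1 ) eq substr( $t, $j - 1, 1 ) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[ $j - 1 ] + 1,       # insertion
                $prev[ $j - 1 ] + $cost,  # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print levenshtein( 'kitten', 'sitting' ), "\n";    # 3
```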
allow_slop
While a "phrase search" (eg "this exact phrase") looks for the exact
phrase, in the same order, you can use phrase slop to find all the words
in the phrase, in any order, within a certain number of words, eg:
For the phrase: "The quick brown fox jumped over the lazy dog."
Query string:        Matches:
"quick brown"        Yes
"brown quick"        No
"quick fox"          No
"brown quick"~2      Yes    # within 2 words of each other
"fox dog"~6          Yes    # within 6 words of each other
To disable this "phrase slop", set "allow_slop" to "false".
allow_ranges
Lucene can accept ranges, eg:
date:[2001 TO 2010] name:[alan TO john]
To enable these, set "allow_ranges" to "true".
wildcard_prefix
Lucene can accept wildcard searches such as:
jo*n smith?
Lucene takes these wildcards and expands the search to include all
matching terms, eg "jo*n" could be expanded to "jon", "john", "jonathan"
etc.
This can result in a huge number of terms, so it is advisable to require
that the first $min characters of the word are not wildcards.
By default, the "wildcard_prefix" requires that at least the first
character is not a wildcard, ie "*" is not acceptable, but "s*" is.
You can change the minimum length of the non-wildcard prefix by setting
"wildcard_prefix", eg:
$qp->filter("foo* foobar*", wildcard_prefix=>4)
# "foo foobar*"
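The prefix rule can be sketched as a per-term check: measure the run of non-wildcard characters at the start of the term, and strip the wildcards if that run is too short. enforce_wildcard_prefix() is a hypothetical helper that reproduces the example above; it splits on whitespace only and is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of ElasticSearch::QueryParser.
sub enforce_wildcard_prefix {
    my ( $query, $min ) = @_;
    my @kept;
    for my $term ( split ' ', $query ) {
        if ( $term =~ /[*?]/ ) {
            # Leading run of characters that are not wildcards
            my ($prefix) = $term =~ /^([^*?]*)/;
            # Prefix too short: strip the wildcards, keep the plain term
            $term =~ tr/*?//d if length($prefix) < $min;
        }
        # A term that consisted only of wildcards is now empty; drop it
        push @kept, $term if length $term;
    }
    return join ' ', @kept;
}

print enforce_wildcard_prefix( 'foo* foobar*', 4 ), "\n";    # foo foobar*
```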
BUGS
This is a new module, so it is likely that there will be bugs, and the
list of options and how "filter()" cleans up the query string may well
change.
If you have any suggestions for improvements, or find any bugs, please
report them to
<http://github.com/clintongormley/ElasticSearch.pm/issues>.
Patches welcome!