Skip to content

Instantly share code, notes, and snippets.

@gphat
Created December 4, 2011 16:27
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save gphat/1430601 to your computer and use it in GitHub Desktop.
Save gphat/1430601 to your computer and use it in GitHub Desktop.
=pod
=head1 Taming Search with Data::SearchEngine
Sooner or later it's going to happen: Someone will request a feature of your
application's search code. It might be gentle at first. A casual remark about
speed, functionality or scalability will meander into your bug tracker,
standup meeting or planning session. At first you will nod and file it away,
knowing that it takes a few requests for something to really stick. Pretty
soon a second, perhaps unrelated, request will arrive. Before you know it you'll
be surrounded by reminders, almost <Tribble-like|http://en.wikipedia.org/wiki/Tribble>,
of a sobering fact:
C<SELECT * FROM table WHERE description LIKE "whatever%"> isn't going to cut
it anymore.
=head2 Investigating Your Options
There are B<lots> of ways to add search to your application. The details of
which largely depend on the type of data you are searching. You are on your
own for evaluating and testing search-engines. There are plenty of resources
for that task.
Instead, lets focus on what to do to minimize the impact to your application
when adopting or changing search engines.
=head2 The Problem with Search
Every search library has a different interface. Assuming you are using something
similar to L<MVC|http://en.wikipedia.org/wiki/Model–view–controller>, you'll
have code in each layer that deals with the implementation-specific functionality.
You controller will have to parse requests and build queries to send to the
model and the view will need to iterate over and display the results. The model
will bear the brunt of the changes, but that's what models are for.
Needing to rewrite our controller and view every time we adjust our search or
&mdash; worse yet &mdash; each time we evaluate a new search product is a real
pain. I bet we can fix this if we just add another layer of abstraction!
=head2 Enter Data::SearchEngine
Data::SearchEngine is a toolbox that comes with everything you need to wrap a
pretty API around your search implementation. It even has two wrappers already
written: one for L<Solr|http://lucene.apache.org/solr/> and one for
L<ElasticSearch|http://www.elasticsearch.org/>. Before we talk about those let's
take a moment to wrap up your average SQL-based search with these tools so you
can see how they work.
=head2 Step 1: Subclass!
First, you'll want to create a Data::SearchEngine::MySearch that wraps your
implementation:
package Data::SearchEngine::MySearch;
use Moose;
with 'Data::SearchEngine';
sub search {
my ($self, $query) = @_;
}
1;
We're consuming a L<Moose roles|https://metacpan.org/module/Moose::Cookbook::Roles::Recipe1>
called L<Data::SearchEngine|https://metacpan.org/module/Data::SearchEngine> that
requires the implementation of a method called C<search>. Let's imagine that
your search code just searches a databases using C<LIKE>. I'm sure you can
imagine a bit of code that executes that query and gets back a resultset, right?
Great! Let's move on to the next bit then.
B<Note:> That role also requires that you implement C<find_by_id>. You can
just make an empty sub to quiet it for now.
=head2 Step 2: Getting the Query
The query is the request that the user has given us to find something. This is
where the rubber really meats the road, as we need to create a query format
that any search engine can use. We won't try to abstract the syntax, but we
can provide a container:
L<Data::SearchEngine::Query|https://metacpan.org/module/Data::SearchEngine::Query>
gives us a simple Query object:
my $query = Data::SearchEngine::Query->new(
count => 10, # the number of results we'd like
page => 1, # the page we are on
query => 'elephants', # the query we're searching for
);
my $se = Data::SearchEngine::MySearch->new;
my $results = $se->search($query);
Easy, eh? Your search backend may need more information or have a more
complex query format, but that's ok. Data::SearchEngine::Query has a
permissive C<query> attribute plus hooks for things like filters.
=head2 Step 3: Results!
That last code example showed getting results back. How does that work? Let's
write it! Start with our C<MySearch> example earlier, but put some meat on
it's bones:
sub search {
my ($self, $query) = @_;
# ... your internal search junk, run some SQL maybe?
my $result = Data::SearchEngine::Results->new(
query => $query,
pager => Data::Paginator->new(
current_page => $query->page,
entries_per_page => $query->$query->count,
total_entries => # results from your query!
)
);
# Iterate over your resultset here.
foreach my $hit (@hits) {
$result->add(Data::SearchEngine::Item->new(
id => $hit->id, # The unique id for this item
# Put any data you want to use in your result listing into
# this values hash.
values => {
name => $hit->name,
description => $hit->description
},
score => $hit->score
));
}
# Return the result
return $result;
}
That bit of code is pretty simple. We run our query and then store each row
that is returned for that page in a L<Data::SearchEngine::Results> object.
Now that we have our results we can show them to the user.
B<Note:> Data::SearchEngine uses a special paginator class called
L<Data::Paginator> that has many of the features of L<Data::Page|https://metacpan.org/module/Data::Page>
and L<Data::Pageset|https://metacpan.org/module/Data::Pageset>. Since all of
Data::SearchEngine is serializable there needed to be an easily serializable,
Moose-based pagination module. Hence Data::Paginator!
=head2 Step 4: Show Our Answers
The aforementioned Results object has an attribute C<items>. This is an array
of L<Data::SearchEngine::Item|https://metacpan.org/module/Data::SearchEngine::Item>
objects. Displaying our results is as simple as iterating over this array.
We'll write this in Perl, but it's easy to translate into your favorite
templating module.
foreach my $item (@{ $result->items }) {
print $item->id.' '.$item->get_value('name')."\n";
}
That's it! You'll use C<get_value> to retrieve any fields other than C<id>
from the item.
=head2 Done, So Now What?
You've know successfully wrapped your internal search code with a powerful
abstraction. You could now easily experiment with
L<ElasticSearch|https://metacpan.org/module/Data::SearchEngine::ElasticSearch>
or L<Solr|https://metacpan.org/module/Data::SearchEngine::Solr>, the
two search products for which there are existing Data::SearchEngine backends.
Or you could take what you've just learned and create a new backend for a
different search product.
=head2 Some Other Noteworthy Features
The Query object has lots of convenience methods for filtering (limiting your
results via a filter such as "price > 20") and faceting (counting the number of
items with different attributes so you can filter them). It will also generate
a unique digest based on it's attributes so that you can cache results.
The Results object can be subclassed if your implementation needs some new
features. There are existing roles for L<Faceting|https://metacpan.org/module/Data::SearchEngine::Results::Faceted>
and L<Spellchecking|https://metacpan.org/module/Data::SearchEngine::Results::Spellcheck>.
Just have your C<search> method return the subclass.
Results, Query and Item objects are all serializable using
L<MooseX::Storage|https://metacpan.org/module/MooseX::Storage> via
L<Deferred|https://metacpan.org/module/MooseX::Storage::Deferred>. This is
provided to make caching easy.
Finally, keep in mind that Query is just a guide. Your implementation may
require much more complex syntax and Data::SearchEngine tries to stay out
of the way. For example the ElasticSearch query DSL uses hashrefs, not strings:
# A real example using ElasticSearch
my $query = Data::SearchEngine::Query->new(
count => 20,
page => 1,
type => 'query_string',
query => {
'query' => 'foobar',
},
order => { 'date_crated' => 'desc' }
);
=head2 Conclusion
You might not change search backends every week, but taking a bit of time to
wrap your custom implementation in something featureful can save you a lot of
trouble down the road. It also provides you with some great features as a
result!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment