Integration of Elastic Search into Hobo

I have a webshop application with 1700 products in 400 product categories (linked through categorizations), and 1600 product properties in 1100 property groups with 105 000 values.

For filtering, I wanted a faceted search at category level over these values, as in the screenshot in the file facets.png.

In the /Gemfile I load the Searchkick Gem:

gem "searchkick"                                    # Elastic Search integration

Indexing Product Value Data

In the Product model /app/models/product.rb I enable Searchkick (so it's done at model level):

class Product < ActiveRecord::Base
  # ...
  searchkick language: "German"
  # ...

  # --- Searchkick Instance Methods --- #

  def search_data
    {
      title: title_de,
      number: number,
      description: description_de,
      long_description: long_description_de,
      warranty: warranty_de,
      category_ids: categories.pluck(:id),
      state: state
    }.merge(property_hash)
  end

  def property_hash
    Hash[self.values
             .includes(:property)
             .map { |value| [value.property.name_de,
                             value.display.rstrip]}]
  end
  #...
end

The search_data method declares the fields to be indexed by Elasticsearch. Currently Searchkick does not support nested structures here. But they have just exchanged the Tire gem for the Elasticsearch API gem, and people are already asking for it. Give them some time. ;-)

The property_hash helper method collects key/value pairs (hash entries) from my data structure, so you will definitely need to adapt it to your own structure. This collection step and the counting of matches across a category are where we get the speed benefit from Elasticsearch.
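To make the shape of the indexed document concrete, here is a minimal plain-Ruby sketch of what property_hash and the merge in search_data produce. The Struct stand-ins and the sample data are assumptions for illustration only; in the real model the values come from ActiveRecord associations.

```ruby
# Hypothetical stand-ins for the ActiveRecord models (illustration only).
Property = Struct.new(:name_de)
Value    = Struct.new(:property, :display)

# Same idea as Product#property_hash: property name => trimmed display value.
def property_hash(values)
  values.to_h { |value| [value.property.name_de, value.display.rstrip] }
end

values = [
  Value.new(Property.new("Farbe"), "rot  "),
  Value.new(Property.new("Material"), "Holz")
]

# The fixed fields and the dynamic property fields end up in one flat hash.
base_fields = { title: "Stuhl", state: "active" }
search_data = base_fields.merge(property_hash(values))
# search_data => { title: "Stuhl", state: "active", "Farbe" => "rot", "Material" => "Holz" }
```

Note that every property name becomes its own top-level field. If a product had two values for the same property name, the later one would overwrite the earlier one, because hash keys are unique.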

Searchkick is clever enough to update the indexed data in a callback on any create or update of an instance. So once you have indexed all instances using

Product.reindex

there is no further need for manual steps or setting up automatic ones.

Indexing the Category Properties and Category Property Groups

This is maybe a bad design decision and I'm looking for better ways, but the following procedure is reliable and reasonably fast. I store all properties and property groups on a category instance level.

For this I created a serialized field in the category model.

class Category < ActiveRecord::Base
#...

  fields do
    #...
    filters             :serialized
    #...
  end

  #...
  def self.update_property_hash
    Category.order(id: :asc).load.each do |category|
      category.filters = category.property_groups_hash
      category.save
    end
  end

  #...

  def property_groups_hash
    product_ids = self.products.includes(:values).active.*.id
    values = Value.where(product_id: product_ids)
                  .includes(:property_group)
                  .includes(:property)
    property_pairs = values.map {|value| [value.property_group.name_de,
                                          value.property.name_de] }
                           .uniq
    property_groups = property_pairs.group_by { |pair| pair[0] }
    property_groups.each {|key,value| property_groups[key] = value.map{|pair| pair[1]} }
    return property_groups
  end
  #...
end

To keep this column up to date, I created a small class method update_property_hash that iterates over all categories and updates the property groups hash, which we will use later as filters in the view.

The instance method property_groups_hash collects the available properties for all active products. It first reads the property group name and property name for every value set, then throws away duplicates. Finally it groups the property group name / property name pairs by property group name and drops the group name from each pair.

An example: [G1, P1], [G1, P2], [G2, P3], [G2, P4] becomes { G1 -> [P1, P2], G2 -> [P3, P4]}
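The same transformation can be tried out without a database. This is a sketch on plain arrays, not the code from the model; the helper name group_properties is hypothetical, and it uses transform_values (Ruby 2.4 or later) instead of the in-place each from the listing.

```ruby
# Group name / property name pairs, including one duplicate.
pairs = [["G1", "P1"], ["G1", "P2"], ["G2", "P3"], ["G2", "P4"], ["G1", "P1"]]

# Hypothetical helper: dedupe, group by group name, keep only property names.
def group_properties(pairs)
  pairs.uniq
       .group_by(&:first)
       .transform_values { |group| group.map(&:last) }
end

filters = group_properties(pairs)
# filters => { "G1" => ["P1", "P2"], "G2" => ["P3", "P4"] }
```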

I thought about creating callbacks, but to be 100% accurate I would have had to update each category whenever one of its products' properties changes. I decided that a daily job is enough, as the properties assigned to a product don't change that often. Preloading property groups and properties when loading the values gives a speed boost for the update job.

Acquiring search results in the controller

The full controller looks like this:

class CategoriesController < ApplicationController

  hobo_model_controller
  auto_actions :show

  def show
    hobo_show do
      @filter = params[:filter]
      @filter ||= {}

      matcharray = []
      @filter.each do |key, value|
        matcharray << { match: { URI.unescape(key) => URI.unescape(value) } }
      end

      @facets = Product.search(query: {
                                 bool: {
                                   must: [ { match: { category_ids: params[:id].to_i } },
                                           { match: { state: 'active' } }
                                         ] + matcharray
                                 }
                               },
                               facets: this.filters.values.flatten)

      @products = Product.search(query: {
                                   bool: {
                                     must: [ { match: { category_ids: params[:id].to_i } },
                                             { match: { state: 'active' } }
                                           ] + matcharray
                                   }
                                 }).results
    end
  end
end

I use the Elasticsearch DSL here to AND the conditions together. For this I need a so-called bool query, and I concatenate the conditions using must.

The @facets instance variable contains a nested hash of all values of all properties of the active products of a category, restricted to those matching the selected filters.

The @products instance variable contains just the products of a category, restricted to those matching the selected filters.

The @filter instance variable stores the filter params hash from the last call, if any, in a URL-unescaped format.
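Both searches in the controller build the same bool query, so the construction could be pulled into one helper. The following is only a sketch: category_query is a hypothetical name, and it uses URI.decode_www_form_component from the standard library as an assumption, because URI.unescape was removed in Ruby 3.0. Unlike URI.unescape, it also turns "+" into a space.

```ruby
require "uri"

# Hypothetical helper, not part of the controller above.
# Builds the bool/must query: category and state first, then one
# match clause per selected filter.
def category_query(category_id, filter)
  filter_clauses = filter.map do |key, value|
    { match: { URI.decode_www_form_component(key) =>
               URI.decode_www_form_component(value) } }
  end

  {
    bool: {
      must: [
        { match: { category_ids: category_id } },
        { match: { state: "active" } }
      ] + filter_clauses
    }
  }
end

query = category_query(42, { "Farbe" => "dunkel%20rot", "Material" => "Holz" })
# query[:bool][:must] now holds four match clauses, all of which must hold.
```

With a helper like this, the hash could be passed as the query: option to both Product.search calls, so the two conditions lists cannot drift apart.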

Displaying filters

Now that all data has been collected, we display the filter 'widget'.

I created a tag in /app/views/taglibs/application/category.dryml:

<def tag="filter-nav">
  <set category="&this"/>
  <%= this.filters.to_a.in_groups_of(3) do |group| %>
    <div class="row-fluid">
      <div repeat="&group" class="span4">
        <if>
          <strong><view with="&this[0]"/></strong><br/>
          <repeat with="&this[1]">
            <set myfacet="&this"/>
            <if test="&@facets.facets[this]['terms']">
              <view/>:
              <repeat with="&@facets.facets[this]['terms']">
                <set term="&this['term']"/>
                <if test="&@filter[u(myfacet)] == u(term)">
                  <strong>
                    <view with="&term"/>
                  </strong>
                  <a with="&category"
                     params="&{:filter => @filter.except(u(myfacet))}">
                    <%= fa_icon "trash-o" %>
                  </a>
                </if>
                <else>
                  <a with="&category"
                     params="&{:filter => @filter.merge({u(myfacet) => u(term)})}">
                    <view with="&term"/>
                  </a>
                </else>
                (<view with="&this['count']"/>)
              </repeat>
              <br/>
            </if>
          </repeat>
        </if>
        <br/>
      </div>
    </div>
  <% end %>
</def>

It is pretty straightforward: it uses three Bootstrap columns. It loops over the property groups, printing their names. Within each group we loop over all properties, printing their respective names.
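Stripped of the markup, the nested loop of the tag can be sketched in plain Ruby. The sample hashes below are assumptions that only mimic the shapes the tag reads: the category's filters hash, and a facets hash with 'terms' lists of 'term' / 'count' entries. The helper filter_lines is hypothetical.

```ruby
# Hypothetical sample data in the shapes the tag reads.
filters = { "Optik" => ["Farbe", "Material"] }
facets  = {
  "Farbe"    => { "terms" => [{ "term" => "rot",  "count" => 3 },
                              { "term" => "blau", "count" => 1 }] },
  "Material" => { "terms" => [] }
}

# Outer loop: property groups. Inner loop: properties of the group.
# Properties without any facet terms are skipped, as in the tag's <if>.
def filter_lines(filters, facets)
  filters.flat_map do |group_name, property_names|
    property_lines = property_names.map do |property_name|
      terms = facets.dig(property_name, "terms")
      next if terms.nil? || terms.empty?
      counts = terms.map { |t| "#{t['term']} (#{t['count']})" }.join(" ")
      "  #{property_name}: #{counts}"
    end
    [group_name, *property_lines.compact]
  end
end

lines = filter_lines(filters, facets)
# lines => ["Optik", "  Farbe: rot (3) blau (1)"]
```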

If a value was in the filter param, i.e. it is now in the @filter instance variable, it was selected in the last run. That means the filter is already applied. So we don't display it as a link; instead we print it in bold. Furthermore we print a small Font Awesome trash icon which links to the category with all filters from the last run except the current one.

Otherwise we print the value as a link to the category itself containing all old filters plus the one merged here.

In any case, the number of results to expect when applying the filter by clicking the link is displayed in brackets.
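The two link variants boil down to removing or adding one entry of the filter hash. Here is a minimal sketch of that logic. The helper name toggled_filter is hypothetical, the URL escaping done by u(...) in the tag is left out, and reject is used as a plain-Ruby equivalent of Hash#except so it runs without ActiveSupport.

```ruby
# Hypothetical helper mirroring the two link variants in the tag.
def toggled_filter(filter, facet, term)
  if filter[facet] == term
    # Term is currently selected: the trash link drops this facet.
    filter.reject { |key, _value| key == facet }
  else
    # Term is not selected: the link adds (or replaces) this facet.
    filter.merge(facet => term)
  end
end

current = { "Farbe" => "rot" }
toggled_filter(current, "Farbe", "rot")      # => {}
toggled_filter(current, "Material", "Holz")  # => { "Farbe" => "rot", "Material" => "Holz" }
toggled_filter(current, "Farbe", "blau")     # => { "Farbe" => "blau" }
```

Because merge replaces an existing key, only one value per property can be active at a time, which matches the comparison @filter[u(myfacet)] == u(term) in the tag.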

As we are using GET requests, we are restricted to some 4096 characters for the URL. I thought about storing the filters in the session, but decided against it, because this way links to filtered categories can be shared between sales people and customers, and I don't run into problems when several categories are browsed within one session.
