
@dkinzer
Last active March 22, 2024 17:35

Blacklight Search (deep dive):

I want to build a mental picture of a blacklight search. I know that I submit a form, that a query is made against a solr db, and that the result is processed and returned to me. However, I need a more detailed picture of the intermediate steps.

Specifically, I want to get at:

  1. What is the function that first receives the form's input?
  2. Where and how can I make changes to that process?
  3. How does blacklight_advanced_search fit in?
  4. What ultimately calls the solr db?
  5. What function processes the solr response?
  6. Is it possible to add custom processing to the returned search results, and if so, how?

tl;dr

(For those who would rather just get to the point):

  • What is the function that first receives the form's input?

    • The entry point is CatalogController#index
    • But the interesting things start at Blacklight::SearchHelper#search_results.
  • Where and how can I make changes to that process?

    • There are many opportunities to change how the search behaves. The easiest and most intuitive is to update the default_processor_chain with the methods that you want to apply to search params before they are sent to the solr server.
  • How does blacklight_advanced_search fit in?

    • blacklight_advanced_search uses mixins and updates the default_processor_chain in order to change the behavior of the search.
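  • What ultimately calls the solr db?

    • An RSolr client. It is created with RSolr.connect(connection_config.merge(adapter: connection_config[:http_adapter])), and Blacklight::Solr::Repository#send_and_receive issues the actual request through it via connection.send_and_receive.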
  • What function processes the solr response?

    • solr_response = blacklight_config.response_model.new(res, solr_params, document_model: blacklight_config.document_model, blacklight_config: blacklight_config)
  • Is it possible to add custom processing to the returned search results, and if so, how?

    • Yes. Both the response_model and document_model are configurable, so one idea is to wrap one of the default classes and enhance it with whatever functionality we are seeking to add.

Begin dive

The first step in our journey starts with the search form:

<form class="search-query-form clearfix navbar-form" role="search" action="http://localhost:32823/" accept-charset="UTF-8" method="get">
  <input name="utf8" type="hidden" value="✓">
  <!-- .. -->
</form>

As we can see, the action path is set to the root of the application ("/") and the method is GET.

We look inside config/routes.rb to see how a GET request to "/" is handled by the application.

Inside that file we see the following line:

  root to: "catalog#index"

This means that requests to the root of our application are routed to the index action of CatalogController (CatalogController#index). So we need to determine how CatalogController is defined and where its index action comes from.
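Rails resolves the "catalog#index" string by naming convention. The sketch below is a rough plain-Ruby approximation of that convention only, not Rails' actual routing code; resolve_route_target is a hypothetical helper and the namespaced target is purely illustrative.

```ruby
# Hypothetical helper approximating the Rails naming convention:
# "controller_name#action" -> ["ControllerNameController", :action].
# Real routing (ActionDispatch) does far more than this.
def resolve_route_target(target)
  controller, action = target.split("#", 2)
  # "a/b_c" -> "A::BC": each path segment becomes a camel-cased namespace
  class_name = controller.split("/").map do |segment|
    segment.split("_").map(&:capitalize).join
  end.join("::") + "Controller"
  [class_name, action.to_sym]
end

p resolve_route_target("catalog#index")
# => ["CatalogController", :index]

# Illustrative namespaced target
p resolve_route_target("blacklight_advanced_search/advanced#index")
# => ["BlacklightAdvancedSearch::AdvancedController", :index]
```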

We see from ./app/controllers/catalog_controller.rb the definition for CatalogController:

class CatalogController < ApplicationController
  include BlacklightAdvancedSearch::Controller

  include BlacklightRangeLimit::ControllerOverride

  include Blacklight::Catalog
  # ...

With a little sleuthing, we determine that Blacklight::Catalog#index is defined in blacklight-6.12.0/app/controllers/concerns/blacklight/catalog.rb. The definition reveals our entry point:

    def index
      (@response, @document_list) = search_results(params)

      respond_to do |format|
        format.html { store_preferred_view }
        format.rss  { render :layout => false }
        format.atom { render :layout => false }
        format.json do
          @presenter = Blacklight::JsonPresenter.new(@response,
                                                     @document_list,
                                                     facets_from_request,
                                                     blacklight_config)
        end
        additional_response_formats(format)
        document_export_formats(format)
      end
    end

Fortunately, at this level the method is relatively straightforward. We can see that the @response and @document_list instance variables are assigned from the return value of search_results(params). Note that params is a hash-like object holding the request (here, URL query) parameters, as provided to the controller by rails.

Note also that because @response and @document_list are instance variables, they are available to any view (and view helper) that the CatalogController renders.
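A rough way to picture this hand-off: Rails collects the controller's instance variables and copies them onto the view it renders (AbstractController::Rendering#view_assigns is part of that machinery). The sketch below imitates the idea in plain Ruby with hypothetical FakeCatalogController and FakeView classes and hard-coded stand-in values; it is not how Rails is implemented internally.

```ruby
# Hypothetical view: receives a name => value hash and turns each entry
# back into an instance variable it can read while rendering.
class FakeView
  def initialize(assigns)
    assigns.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  def render
    "#{@document_list.size} documents, #{@response[:numFound]} found"
  end
end

# Hypothetical controller standing in for CatalogController
class FakeCatalogController
  def index
    # Stand-ins for the values returned by search_results(params)
    @response = { numFound: 42 }
    @document_list = [:doc1, :doc2]
    FakeView.new(view_assigns).render
  end

  private

  # Collect every instance variable as a "name" => value hash
  def view_assigns
    instance_variables.to_h do |ivar|
      [ivar.to_s.delete_prefix("@"), instance_variable_get(ivar)]
    end
  end
end

puts FakeCatalogController.new.index
# prints "2 documents, 42 found"
```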

Next, the index action chooses how to render the response based on the requested format (MIME type), using the respond_to DSL: https://apidock.com/rails/ActionController/MimeResponds/respond_to
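To see why only one branch of the respond_to block runs per request, here is a hypothetical, heavily simplified imitation of the DSL: each format.xxx call records a block, and only the block for the requested format is executed. FormatCollector and respond_to_format are invented names; real Rails negotiates the format from the request and renders the matching template when no block is given.

```ruby
# Hypothetical, stripped-down imitation of respond_to's format collector
class FormatCollector
  FORMATS = %i[html json rss atom].freeze

  def initialize
    @handlers = {}
  end

  # Defines format.html { ... }, format.json { ... }, and so on; each call
  # only records the block to run if that format is the one requested.
  FORMATS.each do |name|
    define_method(name) do |&block|
      # No block: fall back to a placeholder "default render"
      @handlers[name] = block || -> { "default #{name} render" }
    end
  end

  def dispatch(requested)
    handler = @handlers.fetch(requested) do
      raise ArgumentError, "no handler for #{requested.inspect}"
    end
    handler.call
  end
end

def respond_to_format(requested)
  collector = FormatCollector.new
  yield collector # the caller registers its handlers
  collector.dispatch(requested)
end

%i[html json rss].each do |requested|
  result = respond_to_format(requested) do |format|
    format.html { "rendered HTML page" }
    format.rss
    format.json { '{"response": "..."}' }
  end
  puts "#{requested}: #{result}"
end
```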

Following search_results to its definition we find it at blacklight-6.12.0/app/controllers/concerns/blacklight/search_helper.rb:

module Blacklight::SearchHelper
  extend ActiveSupport::Concern
  include Blacklight::RequestBuilders

  # a solr query method
  # @param [Hash] user_params ({}) the user provided parameters (e.g. query, facets, sort, etc)
  # @yield [search_builder] optional block yields configured SearchBuilder, caller can modify or create new SearchBuilder to be used. Block should return SearchBuilder to be used.
  # @return [Blacklight::Solr::Response] the solr response object
  def search_results(user_params)
    builder = search_builder.with(user_params)
    builder.page = user_params[:page] if user_params[:page]
    builder.rows = (user_params[:per_page] || user_params[:rows]) if user_params[:per_page] || user_params[:rows]

    builder = yield(builder) if block_given?
    response = repository.search(builder)

    if response.grouped? && grouped_key_for_results
      [response.group(grouped_key_for_results), []]
    elsif response.grouped? && response.grouped.length == 1
      [response.grouped.first, []]
    else
      [response, response.documents]
    end
  end

This is where things start to get interesting. We can see, for instance, that a builder object (rather than the raw params) is what gets handed to the repository to produce a response. This builder is both configurable and overridable: it is first created from the user params (builder = search_builder.with(user_params)), and then, if a block is given, the caller may modify or replace it (builder = yield(builder) if block_given?).
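The control flow of search_results (build a builder from the params, optionally hand it to a caller-supplied block, then search) can be sketched in plain Ruby as follows. SketchBuilder and sketch_search_results are hypothetical stand-ins, and the repository call is replaced by simply returning the builder.

```ruby
# Hypothetical stand-in for a search builder
class SketchBuilder
  attr_accessor :page, :rows
  attr_reader :user_params

  def with(user_params)
    @user_params = user_params.dup
    self
  end
end

# Same control flow as search_results, minus the repository call
def sketch_search_results(user_params)
  builder = SketchBuilder.new.with(user_params)
  builder.page = user_params[:page] if user_params[:page]
  per_page = user_params[:per_page] || user_params[:rows]
  builder.rows = per_page if per_page

  # The optional hook: the caller may tweak the builder or return a new one
  builder = yield(builder) if block_given?

  builder # a real implementation would now call repository.search(builder)
end

# Default behavior
default = sketch_search_results({ q: "maps", page: 2 })

# Caller-adjusted, in the style of get_advanced_search_facets
custom = sketch_search_results({ q: "maps", per_page: 20 }) do |b|
  b.rows = 5
  b # the block must return the builder to use
end
```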

[Quick aside]: As an example of how the builder override is used, take a look at BlacklightAdvancedSearch::AdvancedController in the blacklight_advanced_search gem, which subclasses CatalogController and overrides its index action: blacklight_advanced_search-6.3.1/app/controllers/blacklight_advanced_search/advanced_controller.rb

There you can see search_results being called with a block (in get_advanced_search_facets):

class BlacklightAdvancedSearch::AdvancedController < CatalogController
  def index
    @response = get_advanced_search_facets unless request.method == :post
  end

  protected

  # Override to use the engine routes
  def search_action_url(options = {})
    blacklight_advanced_search_engine.url_for(options.merge(action: 'index'))
  end

  def get_advanced_search_facets
    # We want to find the facets available for the current search, but:
    # * IGNORING current query (add in facets_for_advanced_search_form filter)
    # * IGNORING current advanced search facets (remove add_advanced_search_to_solr filter)
    response, _ = search_results(params) do |search_builder|
      search_builder.except(:add_advanced_search_to_solr).append(:facets_for_advanced_search_form)
    end

    response
  end
end
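The except/append calls above adjust the list of processing steps for this one search only. A toy version, assuming each call returns a new builder so the original chain stays untouched (ChainBuilder is hypothetical, and the first two step names are borrowed from Blacklight's defaults purely for readability), might look like this:

```ruby
# Hypothetical builder whose processing steps can be removed or added
# without mutating the original instance
class ChainBuilder
  attr_reader :processor_chain

  def initialize(processor_chain)
    @processor_chain = processor_chain.dup.freeze
  end

  # Return a new builder without the named steps
  def except(*steps)
    self.class.new(processor_chain - steps)
  end

  # Return a new builder with extra steps added at the end
  def append(*steps)
    self.class.new(processor_chain + steps)
  end
end

base = ChainBuilder.new(%i[default_solr_parameters add_query_to_solr add_advanced_search_to_solr])
facets_only = base.except(:add_advanced_search_to_solr).append(:facets_for_advanced_search_form)

p facets_only.processor_chain
# => [:default_solr_parameters, :add_query_to_solr, :facets_for_advanced_search_form]
p base.processor_chain.size
# => 3
```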

Going back to search_results, we note that the main pattern (repeated throughout Blacklight::SearchHelper) is to create a builder object (also referred to as a query) and pass it to repository.search:

  def search_results(user_params)
    # ... 
    builder = search_builder.with(user_params)
    response = repository.search(builder)
    # ... 
  end

  # ...

  def get_facet_field_response(facet_field, user_params = params || {}, extra_controller_params = {})
    # ... 
    query = search_builder.with(user_params).facet(facet_field)
    repository.search(query.merge(extra_controller_params))
    #... 
  end

  # ...

  def get_previous_and_next_documents_for_search(index, request_params, extra_controller_params={})
    #... 
    query = search_builder.with(request_params).start(p.delete(:start)).rows(p.delete(:rows)).merge(extra_controller_params).merge(p)
    response = repository.search(query)
    #... 
  end

  # ... 

  def get_opensearch_response(field = nil, request_params = params || {}, extra_controller_params = {})
    #... 
    query = search_builder.with(request_params).merge(solr_opensearch_params(field)).merge(extra_controller_params)
    response = repository.search(query)
    #... 
  end

Given this general pattern, the next obvious things to look at are the search_builder and repository definitions.

In the current file we find a definition for the repository:

  delegate :repository_class, to: :blacklight_config

  def repository
    repository_class.new(blacklight_config)
  end

And in blacklight-6.12.0/app/controllers/concerns/blacklight/request_builders.rb we find the definition of search_builder:

module Blacklight
  module RequestBuilders
    extend ActiveSupport::Concern
    #...

    # Override this method to use a search builder other than the one in the config
    delegate :search_builder_class, to: :blacklight_config

    def search_builder
      search_builder_class.new(self)
    end
    #...
  end
end
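As a rough illustration of this configurable-class idea (the controller never hard-codes a class; it asks its configuration which class to instantiate), here is a plain-Ruby sketch. It uses the standard library's Forwardable in place of ActiveSupport's delegate, and ComponentConfig, DefaultRepository, DefaultSearchBuilder, and SketchController are hypothetical classes.

```ruby
require "forwardable"

# Hypothetical configuration object that only names which classes to use
ComponentConfig = Struct.new(:repository_class, :search_builder_class)

class DefaultRepository
  def initialize(config)
    @config = config
  end

  def search(builder)
    "searched with #{builder.class}"
  end
end

class DefaultSearchBuilder
  def initialize(scope)
    @scope = scope # the controller, as in search_builder_class.new(self)
  end
end

class SketchController
  extend Forwardable
  # Plays the role of `delegate :repository_class, to: :blacklight_config`
  def_delegators :blacklight_config, :repository_class, :search_builder_class

  attr_reader :blacklight_config

  def initialize(config)
    @blacklight_config = config
  end

  # New objects are built from whatever classes the config names
  def repository
    repository_class.new(blacklight_config)
  end

  def search_builder
    search_builder_class.new(self)
  end
end

controller = SketchController.new(ComponentConfig.new(DefaultRepository, DefaultSearchBuilder))
puts controller.repository.search(controller.search_builder)
# prints "searched with DefaultSearchBuilder"
```

Swapping in a different repository is then purely a configuration change; SketchController itself does not need to be edited.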

In both of these definitions, the class to instantiate (repository_class or search_builder_class) is delegated to blacklight_config, which means both objects are configurable. So our next question is where and how blacklight_config is defined. To answer that question, let's go back to the definition of Blacklight::Catalog found at blacklight-6.12.0/app/controllers/concerns/blacklight/catalog.rb:

module Blacklight::Catalog
  extend ActiveSupport::Concern

  include Blacklight::Base
  include Blacklight::DefaultComponentConfiguration
  include Blacklight::Facet

  #...

Note that Blacklight::Catalog is not a controller but a concern, which, as you may recall, is included into CatalogController (see the CatalogController definition above).

You might think that the line include Blacklight::DefaultComponentConfiguration means blacklight_config is defined there, but in fact that module only uses blacklight_config. The next place to look is Blacklight::Base, defined in blacklight/app/controllers/concerns/blacklight/base.rb:

module Blacklight::Base
  extend ActiveSupport::Concern

  include Blacklight::Configurable
  include Blacklight::SearchHelper

  include Blacklight::SearchContext
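Because each concern includes others, CatalogController ends up with methods from modules it never names directly. The plain-Ruby sketch below reproduces that include chain with stand-in module bodies and shows the resulting method lookup order. The real modules use ActiveSupport::Concern, which adds `included` hooks and ClassMethods handling that this sketch omits.

```ruby
# Stand-in modules named after the Blacklight ones; bodies are hypothetical
module Configurable
  def blacklight_config
    :config_from_configurable
  end
end

module SearchHelper
  # Can call blacklight_config even though it lives in another module
  def search_results(_params)
    [:response, blacklight_config]
  end
end

module Base
  include Configurable
  include SearchHelper
end

module Catalog
  include Base
end

class CatalogController
  include Catalog
end

# Method lookup order: later includes are searched before earlier ones
p CatalogController.ancestors.take(5)
# => [CatalogController, Catalog, Base, SearchHelper, Configurable]
```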

OK, we are making some progress. What is inside Blacklight::Configurable? It turns out to be a concern as well, but this time it is defined under the models directory: blacklight-6.12.0/app/models/concerns/blacklight/configurable.rb

module Blacklight::Configurable
  extend ActiveSupport::Concern

  included do
    helper_method :blacklight_config if respond_to? :helper_method
  end
  
  #instance methods for blacklight_config, so get a deep copy of the class-level config
  def blacklight_config
    @blacklight_config ||= self.class.blacklight_config.deep_copy
  end
  attr_writer :blacklight_config

  module ClassMethods   
    def copy_blacklight_config_from(other_class)
      self.blacklight_config = other_class.blacklight_config.inheritable_copy
    end
    
    # lazy load a deep_copy of superclass if present, else
    # a default_configuration, which will be legacy load or new empty config. 
    # note the @blacklight_config variable is a ruby 'instance method on class
    # object' that won't be automatically available to subclasses, that's why
    # we lazy load to 'inherit' how we want. 
    def blacklight_config
      @blacklight_config ||= if superclass.respond_to?(:blacklight_config)
        superclass.blacklight_config.deep_copy
      else
        default_configuration
      end
    end
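The lazy, per-class inheritance described in the comments above can be approximated in plain Ruby. In this hypothetical sketch a Hash stands in for Blacklight::Configuration and a Marshal round-trip stands in for deep_copy; SketchConfigurable, ParentController, and ChildController are invented names.

```ruby
module SketchConfigurable
  # Stand-in for deep_copy: a full, independent copy of nested data
  def self.deep_copy(obj)
    Marshal.load(Marshal.dump(obj))
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  # Each instance (e.g. each request's controller) gets its own deep copy
  def blacklight_config
    @blacklight_config ||= SketchConfigurable.deep_copy(self.class.blacklight_config)
  end

  module ClassMethods
    # Lazily copy the superclass config the first time a class asks for it;
    # the top of the hierarchy falls back to a default configuration.
    def blacklight_config
      @blacklight_config ||=
        if superclass.respond_to?(:blacklight_config)
          SketchConfigurable.deep_copy(superclass.blacklight_config)
        else
          { facets: ["format"] } # stand-in for default_configuration
        end
    end
  end
end

class ParentController
  include SketchConfigurable
end

class ChildController < ParentController
end

ParentController.blacklight_config[:facets] << "language" # before the child's first access
ChildController.blacklight_config[:facets] << "subject"   # child copies parent, then diverges

p ParentController.blacklight_config[:facets] # => ["format", "language"]
p ChildController.blacklight_config[:facets]  # => ["format", "language", "subject"]
```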

Paydirt?

Unfortunately, nothing in the blacklight_config definition itself tells us where search_builder_class or repository_class (the two delegated methods we have been chasing in this part of the journey) come from. So what gives? We need to follow default_configuration to where it is set:

    # ...
    attr_writer :blacklight_config
    
    #simply a convenience method for blacklight_config.configure
    def configure_blacklight(*args, &block)
      blacklight_config.configure(*args, &block)
    end

    ##
    # The default configuration object
    def default_configuration
      Blacklight::Configurable.default_configuration.inheritable_copy
    end
  end

  def self.default_configuration
    @default_configuration ||= Blacklight::Configuration.new
  end

  def self.default_configuration= config
    @default_configuration = config
  end

The default configuration is Blacklight::Configuration.new, so that is where we need to go next, starting with where Blacklight::Configuration is loaded: blacklight/lib/blacklight.rb

require 'kaminari'
require 'deprecation'
require 'blacklight/utils'
require 'active_support/hash_with_indifferent_access'

module Blacklight
  autoload :AbstractRepository, 'blacklight/abstract_repository'
  autoload :Configuration, 'blacklight/configuration'
  autoload :Exceptions,  'blacklight/exceptions'
  autoload :Parameters,  'blacklight/parameters'
  autoload :Routes,      'blacklight/routes'
  autoload :RuntimeRegistry, 'blacklight/runtime_registry'
  autoload :SearchBuilder, 'blacklight/search_builder'
  autoload :SearchState, 'blacklight/search_state'
  autoload :Solr, 'blacklight/solr'

  extend Deprecation

  require 'blacklight/version'
  require 'blacklight/engine' if defined?(Rails)

OK, according to this line we are getting closer:

  autoload :Configuration, 'blacklight/configuration'

Next we look in that file, blacklight/lib/blacklight/configuration.rb.

And we found our answer!

module Blacklight
  ##
  # Blacklight::Configuration holds the configuration for a Blacklight::Controller, including
  # fields to display, facets to show, sort options, and search fields.
  class Configuration < OpenStructWithHashAccess
    # ...
    # Set up Blacklight::Configuration.default_values to contain
    # the basic, required Blacklight fields
    class << self
      # ...
    end

    # ...

    def repository_class
      super || Blacklight::Solr::Repository
    end

    # ...
    def search_builder_class
      super || locate_search_builder_class
    end

    def locate_search_builder_class
      ::SearchBuilder
    end

So now we know that the default repository_class is Blacklight::Solr::Repository and that the default search_builder_class is ::SearchBuilder (the application's own app/models/search_builder.rb).
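The `super || Default` idiom reads as "use the configured value if one was set, otherwise fall back to a default". In Blacklight the super call is answered by the OpenStruct-based parent class, which returns nil when nothing was set; in this hypothetical sketch a small SketchSettings base class plays that role, and the repository and builder classes are empty placeholders.

```ruby
# Stand-in for the OpenStruct-based parent: returns explicitly set
# values, or nil when a key was never configured
class SketchSettings
  def initialize(values = {})
    @values = values
  end

  def repository_class
    @values[:repository_class]
  end

  def search_builder_class
    @values[:search_builder_class]
  end
end

# Placeholder classes (hypothetical)
class DefaultRepository; end
class DefaultSearchBuilder; end
class CustomRepository; end

class SketchConfiguration < SketchSettings
  # Fall back to a default only when nothing was configured explicitly
  def repository_class
    super || DefaultRepository
  end

  def search_builder_class
    super || DefaultSearchBuilder
  end
end

p SketchConfiguration.new.repository_class
# => DefaultRepository
p SketchConfiguration.new({ repository_class: CustomRepository }).repository_class
# => CustomRepository
```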

We'll want to take a look at what the basic query.with(user_params) and repository.search(query) do.

First let's take a look at query.with:

blacklight/lib/blacklight/search_builder.rb

    ##
    # Set the parameters to pass through the processor chain
    def with(blacklight_params = {})
      params_will_change!
      @blacklight_params = blacklight_params.dup
      self
    end

So the basic with method simply stores a copy of whatever is passed in as @blacklight_params, flags the parameters as changed (params_will_change!), and returns self so that further calls can be chained. That's pretty straightforward. Anything fancier happens later, in the processor chain, or via the block that can be passed to search_results.
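Those three steps (copy the params, flag them as changed, return self) are what make chained calls such as search_builder.with(request_params).start(...).rows(...) possible. Here is a hypothetical WithSketch class illustrating them, with a made-up rows method to show chaining.

```ruby
# Hypothetical mini-builder mirroring the shape of SearchBuilder#with
class WithSketch
  attr_reader :blacklight_params

  def initialize
    @blacklight_params = {}
    @params_changed = false
  end

  # Copy the params, flag the change, and return self for chaining
  def with(blacklight_params = {})
    params_will_change!
    @blacklight_params = blacklight_params.dup
    self
  end

  # Hypothetical chainable setter in the spirit of .rows(...)
  def rows(count)
    params_will_change!
    @rows = count
    self
  end

  def params_changed?
    @params_changed
  end

  private

  def params_will_change!
    @params_changed = true
  end
end

user_params = { q: "maps" }
builder = WithSketch.new.with(user_params).rows(10)
user_params[:q] = "globes" # the caller mutates its own hash afterwards...
p builder.blacklight_params[:q] # => "maps" (the builder kept its own copy)
```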

Now let's take a quick look at the default repository's search method:

module Blacklight::Solr
  class Repository < Blacklight::AbstractRepository

    ##
    # Execute a search query against solr
    # @param [Hash] params solr query parameters
    def search params = {}
      send_and_receive blacklight_config.solr_path, params.reverse_merge(qt: blacklight_config.qt)
    end

    ##
    # Execute a solr query
    # @see [RSolr::Client#send_and_receive]
    # @overload find(solr_path, params)
    #   Execute a solr query at the given path with the parameters
    #   @param [String] solr path (defaults to blacklight_config.solr_path)
    #   @param [Hash] parameters for RSolr::Client#send_and_receive
    # @overload find(params)
    #   @param [Hash] parameters for RSolr::Client#send_and_receive
    # @return [Blacklight::Solr::Response] the solr response object
    def send_and_receive(path, solr_params = {})
      benchmark("Solr fetch", level: :debug) do
        key = blacklight_config.http_method == :post ? :data : :params
        res = connection.send_and_receive(path, {key=>solr_params.to_hash, method: blacklight_config.http_method})

        solr_response = blacklight_config.response_model.new(res, solr_params, document_model: blacklight_config.document_model, blacklight_config: blacklight_config)

There are several interesting lines here. First, let's talk about the obvious one:

        solr_response = blacklight_config.response_model.new(res, solr_params, document_model: blacklight_config.document_model, blacklight_config: blacklight_config)

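This line is interesting because it shows that the response model and the document model are both configurable (i.e. we can substitute our own wrapper classes via configuration). Their defaults follow the same super || default pattern as repository_class and search_builder_class: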

    def response_model
      super || Blacklight::Solr::Response
    end

    def document_model
      super || ::SolrDocument
    end
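To make the wrapper idea concrete, here is a hypothetical plain-Ruby sketch in which the response and document classes are looked up on a config object, so a custom response class with extra post-processing can be swapped in without touching the code that builds responses. BasicResponse, BasicDocument, TitledResponse, ModelConfig, and build_response are invented for illustration and are much simpler than Blacklight::Solr::Response and SolrDocument.

```ruby
# Hypothetical document model: read-only access to a field hash
class BasicDocument
  def initialize(fields)
    @fields = fields
  end

  def [](key)
    @fields[key]
  end
end

# Hypothetical default response model
class BasicResponse
  attr_reader :raw, :request_params

  def initialize(raw, request_params, document_model:)
    @raw = raw
    @request_params = request_params
    @document_model = document_model
  end

  # Wrap each raw document hash in the configured document class
  def documents
    raw.fetch("response", {}).fetch("docs", []).map { |fields| @document_model.new(fields) }
  end
end

# A custom response model: same constructor, plus extra post-processing
class TitledResponse < BasicResponse
  def titles
    documents.map { |doc| doc["title"].to_s.strip }
  end
end

ModelConfig = Struct.new(:response_model, :document_model)

# Mirrors the shape of the solr_response line: both classes come from config
def build_response(raw, request_params, config)
  config.response_model.new(raw, request_params, document_model: config.document_model)
end

config = ModelConfig.new(TitledResponse, BasicDocument)
raw = { "response" => { "docs" => [{ "title" => " Maps " }, { "title" => "Globes" }] } }
response = build_response(raw, { q: "m" }, config)
p response.titles
# => ["Maps", "Globes"]
```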

But the line right before the solr_response assignment is just as interesting, if less obviously so:

        res = connection.send_and_receive(path, {key=>solr_params.to_hash, method: blacklight_config.http_method})

Specifically, the solr_params.to_hash call is super important. Here solr_params is the search builder itself (repository.search receives the builder), so if we look up the definition of to_hash on the search builder, we find:

    def initialize(*options)
      # ...
      @processor_chain ||= default_processor_chain.dup
      # ...
    end

    def to_hash
      return @params unless params_need_update?
      @params = processed_parameters.
                  reverse_merge(@reverse_merged_params).
                  merge(@merged_params).
                  tap { self.clear_changes }
    end

    alias_method :query, :to_hash
    alias_method :to_h, :to_hash

    # ...
    def processed_parameters
      request.tap do |request_parameters|
        processor_chain.each do |method_name|
          send(method_name, request_parameters)
        end
      end
    end

The code above (edited for clarity) reveals the pattern at the heart of blacklight's configurable way of manipulating search parameters before the solr request is made. processed_parameters calls each method named in @processor_chain (initialized from default_processor_chain), passing it the request parameters to mutate in place, and to_hash (aliased as query and to_h) builds its result from processed_parameters, recomputing only when params_need_update? is true.
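Here is a stripped-down, hypothetical model of that pattern in plain Ruby: each symbol in the chain names a method that receives the solr parameter hash and mutates it, and the result is cached after the first call. ChainSketch and its three step methods are invented for illustration, and the real params_need_update? check is reduced here to a simple nil check.

```ruby
# Hypothetical model of the processor-chain pattern
class ChainSketch
  DEFAULT_PROCESSOR_CHAIN = %i[add_defaults add_query add_paging].freeze

  def initialize(blacklight_params)
    @blacklight_params = blacklight_params
    @processor_chain = DEFAULT_PROCESSOR_CHAIN.dup
    @params = nil
  end

  # Compute once, then reuse the cached hash
  def to_hash
    @params ||= processed_parameters
  end
  alias query to_hash
  alias to_h to_hash

  private

  # Run every step in order against one shared, mutable hash
  def processed_parameters
    {}.tap do |request_parameters|
      @processor_chain.each { |method_name| send(method_name, request_parameters) }
    end
  end

  def add_defaults(solr_params)
    solr_params[:rows] = 10
  end

  def add_query(solr_params)
    solr_params[:q] = @blacklight_params[:q] if @blacklight_params[:q]
  end

  # Order matters: this step relies on :rows set by add_defaults
  def add_paging(solr_params)
    page = @blacklight_params.fetch(:page, 1).to_i
    solr_params[:start] = (page - 1) * solr_params[:rows]
  end
end

p ChainSketch.new({ q: "maps", page: 3 }).to_hash
```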

At this point we have enough detail to derive a relatively accurate mental picture of the basic components of a generic blacklight search.

However, we still do not yet have a clear understanding of how blacklight_advanced_search fits into this picture.

Let's go back to our routes configuration file (config/routes.rb) to see if we can find some clues.

  mount BlacklightAdvancedSearch::Engine => "/"

This tells us that blacklight_advanced_search is an engine. According to the rails guides, an engine is a kind of plugin, essentially a miniature application that can be mounted inside our (host) application: http://guides.rubyonrails.org/engines.html

So how does this engine change our application? To get an idea of what it does, we should take a look at the generator code that is used to install it:

blacklight_advanced_search-6.3.1/lib/generators/blacklight_advanced_search/install_generator.rb

Of the changes it makes, the ones we are most interested in are the following:

    def inject_search_builder
      inject_into_file 'app/models/search_builder.rb', after: /include Blacklight::Solr::SearchBuilderBehavior.*$/ do
        "\n  include BlacklightAdvancedSearch::AdvancedSearchBuilder" \
        "\n  self.default_processor_chain += [:add_advanced_parse_q_to_solr, :add_advanced_search_to_solr]"
      end
    end

    def install_catalog_controller_mixin
      inject_into_class "app/controllers/catalog_controller.rb", "CatalogController" do
        "  include BlacklightAdvancedSearch::Controller\n"
      end
    end

    def configuration
      inject_into_file 'app/controllers/catalog_controller.rb', after: "configure_blacklight do |config|" do
        "\n    # default advanced config values" \
        "\n    config.advanced_search ||= Blacklight::OpenStructWithHashAccess.new" \
        "\n    # config.advanced_search[:qt] ||= 'advanced'" \
        "\n    config.advanced_search[:url_key] ||= 'advanced'" \
        "\n    config.advanced_search[:query_parser] ||= 'dismax'" \
        "\n    config.advanced_search[:form_solr_parameters] ||= {}\n"
      end

Above we can see that the generator adds two new methods, add_advanced_parse_q_to_solr and add_advanced_search_to_solr, to the default_processor_chain. Recall from our earlier discussion that configuring the default_processor_chain is one of the many ways to override blacklight's search behavior; the blacklight_advanced_search gem takes advantage of exactly that.
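The += line works because default_processor_chain is a class-level setting that a subclass can extend without altering its parent. In Blacklight, default_processor_chain is declared with ActiveSupport's class_attribute; the plain-Ruby approximation below uses a class-level instance variable that falls back to the superclass instead. BaseBuilder, AdvancedSteps, and AppSearchBuilder are hypothetical, and the step methods are empty placeholders, not the gem's real logic.

```ruby
# Hypothetical base builder with an inheritable, class-level chain
class BaseBuilder
  class << self
    attr_writer :default_processor_chain

    # Use this class's own chain if set, otherwise inherit the parent's
    def default_processor_chain
      return @default_processor_chain if @default_processor_chain

      superclass.respond_to?(:default_processor_chain) ? superclass.default_processor_chain : []
    end
  end

  self.default_processor_chain = %i[default_solr_parameters add_query_to_solr]

  # Each builder instance starts from a copy of its class's chain
  def processor_chain
    @processor_chain ||= self.class.default_processor_chain.dup
  end
end

# Stand-in for the mixin the generator includes; bodies are placeholders
module AdvancedSteps
  def add_advanced_parse_q_to_solr(solr_params); end
  def add_advanced_search_to_solr(solr_params); end
end

class AppSearchBuilder < BaseBuilder
  include AdvancedSteps
  # `+=` builds a new array, so BaseBuilder's chain is left untouched
  self.default_processor_chain += %i[add_advanced_parse_q_to_solr add_advanced_search_to_solr]
end

p AppSearchBuilder.new.processor_chain
p BaseBuilder.default_processor_chain
```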

The other changes add mixins and default configuration settings; those behave as we would expect, so I won't go into detail about them.
