@nz
Created April 8, 2012 01:52
Websolr Heroku Devcenter

Websolr is a managed search service provided by Onemorecloud and powered by Apache Solr. The Websolr add-on allows you to use the high performance functionality of Solr in your application today.

Install the Add-on

:::term
$ heroku addons:add websolr

Choosing a Solr Client

The Apache Solr search server presents an API, and there are a number of open source clients to choose from. We recommend Sunspot, although you may already be using another. We provide more general client configuration at the end of this document.

Sunspot for Ruby on Rails

As of this writing, the current release of Sunspot is version 1.3.0. Sunspot provides a Rails plugin as a gem, named sunspot_rails.

Installing Sunspot with Bundler

Rails 3 applications use Bundler by default. If you are developing a Rails 2.3 application, please review "Using Bundler with Rails 2.3" to ensure that your application is configured to use Bundler correctly.

Once you have set up your application to use Bundler, add the sunspot_rails gem to your Gemfile.

:::ruby
gem 'sunspot_rails', '~> 1.3.0'

Run bundle install to install Sunspot and its dependencies into your local environment.

Configure Sunspot

By default, Sunspot 1.3.0 supports the WEBSOLR_URL environment variable used by your Heroku application in production.

If you would like more fine-grained control over which Solr servers you are using in different environments, you may generate a Sunspot configuration file at config/sunspot.yml. In a Rails 2.3 application, run script/generate sunspot; in a Rails 3 application, run rails generate sunspot_rails:install.
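If you do maintain your own configuration, it helps to know which parts WEBSOLR_URL contains, since client settings often ask for a hostname, port, and path separately. The following is a minimal sketch in plain Ruby, not part of Sunspot. The example URL is a hypothetical placeholder, and `solr_connection_parts` is a name chosen here for illustration.

```ruby
require 'uri'

# Hypothetical placeholder; the real value is shown by `heroku config`.
EXAMPLE_WEBSOLR_URL = 'http://index.websolr.com/solr/0123456789a'

# Break a Solr URL into the pieces a client configuration typically asks for.
def solr_connection_parts(url)
  uri = URI.parse(url)
  {
    :scheme   => uri.scheme,
    :hostname => uri.host,
    # URI supplies the scheme's default port (80 for http) when the URL names none.
    :port     => uri.port,
    :path     => uri.path
  }
end

parts = solr_connection_parts(ENV['WEBSOLR_URL'] || EXAMPLE_WEBSOLR_URL)
puts parts.inspect
```

A URL with an explicit port, such as one for a local server at localhost:8983, yields that port instead of the default.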

Using Sunspot

With Sunspot you configure your models for searching and indexing using a Ruby DSL. By default, your records are automatically indexed when they are created and updated, and removed from the index when destroyed.

Indexing Models

Here is a simple example of using Sunspot's searchable block and DSL to configure an ActiveRecord model.

:::ruby
class Post < ActiveRecord::Base
  searchable do
    text    :title
    text    :body
    string  :permalink
    integer :category_id
    time    :published_at
  end
end

To learn more, refer to the Sunspot wiki.

Searching

To search the model in the above example, you may use something like the following:

:::ruby
@search = Post.search { keywords 'hello' }
@posts  = @search.results

(If your model already defines a search method, you may use the solr_search method instead, for which search is an alias.)

Sunspot exposes the full functionality of Solr. To learn more about searching your models, refer to the Sunspot wiki.

Sunspot Rake Tasks

Sunspot provides Rake tasks to start and stop a local Solr server for development and testing. In order to use these Rake tasks, add the following line to your application's Rakefile:

:::ruby
require 'sunspot/rails/tasks'

You may wish to familiarize yourself with the available tasks by running rake -T sunspot.

Running a local Solr server with Sunspot

To start and stop a local Solr server for development, run the following rake tasks:

:::term
rake sunspot:solr:start
rake sunspot:solr:stop

Re-indexing Data with Sunspot

If you are adding Websolr to an application with existing data in your development or production environment, you will need to reindex your data. Likewise, if you make changes to a model's searchable configuration, or change your index's configuration at the Websolr control panel, you will need to reindex for your changes to take effect.

In order to reindex your production data, you may run a command similar to the following from your application's directory:

:::term
heroku run rake sunspot:reindex

If you are indexing a large number of documents, or your models use a lot of memory, you may need to reindex in batches smaller than Sunspot's default of 50. We recommend starting small and gradually experimenting to find the best results. To reindex with a batch size of 10, use the following:

:::term
heroku run rake sunspot:reindex[10]

Refer to rake -T sunspot to see the usage for the reindex task.
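To make the effect of the batch size concrete, here is a small illustration in plain Ruby. It is not Sunspot's implementation; it only shows how a collection is split into groups of a given size, so that fewer records need to be held in memory at once. The `in_batches` helper and the stand-in records are hypothetical.

```ruby
# Stand-in for a collection of records; a real reindex loads models from the database.
records = (1..25).to_a

# Group records into batches of at most `batch_size` items each.
def in_batches(records, batch_size)
  records.each_slice(batch_size).to_a
end

batches = in_batches(records, 10)
batches.each_with_index do |batch, i|
  puts "batch #{i + 1}: #{batch.size} records"
end
```

With 25 records and a batch size of 10, the last batch holds the remaining 5 records.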

Updating Asynchronously with Heroku Workers

Queuing your updates to Solr is a perfect job for Heroku's Delayed Job workers. Sending updates to Solr asynchronously has the advantage of increasing your application's performance and robustness. Simply add the following lines to your model after the searchable block:

:::ruby
handle_asynchronously :solr_index
handle_asynchronously :remove_from_index

Resque users should consult this gist: https://gist.github.com/1282013

Haystack for Django

If your application is using Django, you can use the Haystack Solr client. Once you have set up your application as per their official getting started tutorial, you should modify your application's settings.py to use these settings:

:::python
import os

# Read the index URL that the Websolr add-on sets on Heroku.
HAYSTACK_URL = os.environ.get('WEBSOLR_URL', '')
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': HAYSTACK_URL,
    },
}

When you are ready to deploy to Heroku, use the following command to generate your Solr schema.xml, to be uploaded to your Websolr index:

:::term
./manage.py build_solr_schema > schema.xml

Copy the contents of the schema.xml file and open the Websolr addon dashboard:

:::term
heroku addons:open websolr

Select your index, and select the "Advanced" tab to paste in the contents of your schema.xml. Your index will take a minute or two to reconfigure itself, and then you can run the following command to reindex your data:

:::term
heroku run python myproject/manage.py rebuild_index

Using a Different Solr Client

There are other Solr clients, including the venerable but still popular acts_as_solr. If you are already using one of these clients and are not interested in switching your application to Sunspot, here are a few pointers for using Websolr in production.

Your index's URL is set in the WEBSOLR_URL environment variable. If your Solr client can be configured at runtime, we recommend creating an initializer file (such as config/initializers/websolr.rb in Rails) in which you instruct your client to connect to ENV['WEBSOLR_URL'] when present.

Alternatively, you may run heroku config from your application's directory to view the value for WEBSOLR_URL and manually hard-code the relevant configuration file for your particular Solr client.
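The initializer approach described above might be sketched as follows. This is an assumption-laden outline rather than code for any particular client: `solr_url` and `LOCAL_SOLR_URL` are hypothetical names, the local fallback address is a guess at a typical development setup, and the final step of handing the URL to your client depends on that client's own configuration API.

```ruby
# Hypothetical local fallback; adjust to wherever your development Solr server listens.
LOCAL_SOLR_URL = 'http://localhost:8983/solr'

# Prefer WEBSOLR_URL when it is set and non-empty, otherwise use the local fallback.
def solr_url(env = ENV)
  url = env['WEBSOLR_URL']
  (url.nil? || url.empty?) ? LOCAL_SOLR_URL : url
end

puts "Connecting to Solr at #{solr_url}"
# Pass solr_url to your Solr client's own configuration call here.
```

Reading the variable at boot keeps the production URL out of your repository, which is the main advantage over hard-coding the value.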

Configuring your index

When your index is first created, it will be automatically configured using the schema.xml for the latest version of Sunspot, which is a very flexible schema that can cover a lot of uses.

Websolr provides a control panel at http://websolr.com/ where you may make changes to your index, such as adding or removing different Solr features, selecting a different Solr client, providing your own schema.xml and so on.

Questions?

If you are experiencing a problem with installing or using the Websolr add-on, you may visit http://help.websolr.com/ or http://support.heroku.com/ for assistance. Please provide your index URL and, if possible, a reproduction of the error using curl.
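One way to prepare a curl reproduction is to print a command that queries your index directly. The sketch below assumes your index answers on Solr's standard /select request handler; `curl_reproduction` is a hypothetical helper and the fallback URL is a placeholder, not a real index.

```ruby
require 'uri'

# Build a curl command that sends a query to the index's /select handler.
def curl_reproduction(index_url, params = { 'q' => '*:*', 'rows' => 1 })
  query = URI.encode_www_form(params)
  "curl -i '#{index_url.chomp('/')}/select?#{query}'"
end

# Hypothetical placeholder URL, used only when WEBSOLR_URL is not set.
puts curl_reproduction(ENV['WEBSOLR_URL'] || 'http://index.websolr.com/solr/0123456789a')
```

Replace the default parameters with the query that triggers your error, then include the printed command and its output in your support request.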

Websolr is a popular service that receives many questions. We love to answer general questions about Solr integration, but need to prioritize support questions directly related to our service. If you have general questions about implementing various search features, you may first want to try the public forums for Solr or for your Solr client.

If you have suggestions for our docs, we welcome comments here: https://gist.github.com/2333627.

@rgarver

rgarver commented Aug 29, 2012

Under the "Configure Sunspot" section you mention that you can generate a yml file but don't explain if there are any modifications that need to be made to the default sunspot.yml to make it work nicely on heroku (eg: "hostname: <%= ENV["WEBSOLR_URL"] %>").

One thing I'm unclear on is if the URL would have the port built in like localhost:8983 or if it's just the URL part. If so do you also set a port environment variable. I haven't set this up on my account yet, and I'm sure much will be made clear once I start turning things on, but this is something that could be explained here as well. Hope this helps!

@jnga

jnga commented Mar 4, 2016

I am not sure, but the Django rebuild_index command "heroku run myproject/manage.py rebuild_index" may be missing 'python'. Perhaps it should be "heroku run python myproject/manage.py rebuild_index". The latter works for me, the former gives a Permission Denied error.
