
@rotated8
Last active September 25, 2018 17:00
Getting Started with Solr
.solr_wrapper
# Which version of Solr to download. 6.6.2 is known good with Hyrax.
version: 6.6.2
# Download from Princeton, because their mirror is more stable than Apache's, and we have permission.
mirror_url: 'http://lib-solr-mirror.princeton.edu/dist/'
checksum: 'http://lib-solr-mirror.princeton.edu/dist/lucene/solr/6.6.2/solr-6.6.2.zip.sha1'
ignore_checksum: 'true'
# Where to download things.
download_dir: '/var/tmp'
# Where to put Solr. NOTE: This is relative to where you run solr_wrapper!
instance_dir: 'solr-dev'
# Save things when Solr shuts down? Change to `false` if shutting Solr down means wiping out all your work.
persist: true
# What port should the Solr interface show up on? Default is 8983.
port: 8983

Getting Started

This quick guide will get you started with Solr. It is by no means comprehensive, and it should not replace reading the documentation.

Prereqs

  • Java (java -version must be 1.8.0 or higher.)
  • Ruby (Any version should work, newer is better.)

Installation

If you want to skip this, I have included a Vagrantfile which will set up the environment in an Ubuntu 16.04 box.

Install the solr_wrapper gem by running gem install solr_wrapper. You may need to put sudo in front of that.

Then, copy the .solr_wrapper file at the top of this gist into the folder you want to work in.

The .solr_wrapper file configures which version of Solr we use and how it runs. Right now, we are using version 6.6.2, Solr's UI shows up on port 8983, and data is not deleted between shutdowns.
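If you want a script to use the same settings, here is a minimal Ruby sketch that reads .solr_wrapper with the standard yaml library and works out where Solr should show up. The solr_settings helper is hypothetical (not part of solr_wrapper), and it assumes Solr listens on 127.0.0.1, as in the URLs used throughout this guide.

```ruby
require 'yaml'

# Hypothetical helper: reads the text of .solr_wrapper and returns the settings
# this guide relies on. Assumes Solr listens on 127.0.0.1.
def solr_settings(yaml_text)
  config = YAML.safe_load(yaml_text)
  port = config.fetch('port', 8983) # 8983 is Solr's default port
  {
    version: config['version'].to_s,
    instance_dir: config['instance_dir'],
    persist: config['persist'] == true,
    base_url: "http://127.0.0.1:#{port}/solr/"
  }
end

# Usage: settings = solr_settings(File.read('.solr_wrapper'))
```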

Running Solr

Start Solr by running solr_wrapper. You should see a download progress bar if this is your first time running it. When it prints a URL (like http://127.0.0.1:8983/solr/), Solr is running. It will continue to run in the foreground until you press Ctrl-C to stop it.
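Because solr_wrapper keeps the terminal busy, a second script sometimes needs to wait for Solr to come up. This is a hedged sketch using only Ruby's net/http; the helper names, the one-second delay and the attempt count are arbitrary choices for illustration, not anything solr_wrapper provides.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: true if the Solr URL answers with a success or redirect
# status, false if nothing is listening yet (or the request times out).
def solr_up?(url, timeout: 2)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             open_timeout: timeout, read_timeout: timeout) do |http|
    http.get(uri.request_uri)
  end
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue SystemCallError, IOError, SocketError, Net::OpenTimeout, Net::ReadTimeout
  false
end

# Polls solr_up? until it succeeds or the attempts run out.
def wait_for_solr(url, attempts: 30, delay: 1)
  attempts.times do
    return true if solr_up?(url)
    sleep delay
  end
  false
end

# Usage: wait_for_solr('http://127.0.0.1:8983/solr/') or abort('Solr did not start')
```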

If this is the first time starting Solr, you'll need to create a core. Leave solr_wrapper running, open a second terminal in the same folder, and create a core named 'dev' by running ./solr-dev/bin/solr create -c 'dev'. Trust me, this is easier than trying to create one through the UI. If you changed instance_dir from 'solr-dev' in the .solr_wrapper file, you'll need to change it in that command as well.
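If you script core creation, building the command from the same settings keeps it in step with .solr_wrapper. This is a minimal sketch: create_core_command is a hypothetical helper, and it assumes bin/solr create accepts -p for the port (run ./solr-dev/bin/solr create -help to confirm for your version), so the command also follows a changed port.

```ruby
# Hypothetical helper: builds the argument list for bin/solr create, relative to
# the folder you run solr_wrapper from. Assumes create's -p option sets the port.
def create_core_command(instance_dir, core, port: 8983)
  [File.join('.', instance_dir, 'bin', 'solr'), 'create', '-c', core, '-p', port.to_s]
end

# Usage (Solr must already be running): system(*create_core_command('solr-dev', 'dev'))
```

Passing the pieces to system as separate arguments runs bin/solr directly rather than through a shell, so the core name needs no extra quoting.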

Interacting with Solr

Now, we can open a browser and go to the URL Solr gave earlier (like http://127.0.0.1:8983/solr/). This will take you to the Solr Dashboard.

In the column on the left side, under the Solr logo, click on Core Admin. You will see any cores you have created here. This is a sanity check to ensure the core you created earlier exists.
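The same sanity check can be made without a browser through the CoreAdmin API's STATUS action. A sketch, assuming the default URL; as far as I know, Solr reports an empty object for a core name it does not know, which is what core_exists? looks for. Both helper names are hypothetical.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: CoreAdmin STATUS URL for a single core.
def core_status_url(base_url, core)
  query = URI.encode_www_form([['action', 'STATUS'], ['core', core], ['wt', 'json']])
  "#{base_url.chomp('/')}/admin/cores?#{query}"
end

# True if the STATUS response describes the core; an unknown core comes back
# as an empty object (or is missing entirely).
def core_exists?(status_json, core)
  status = JSON.parse(status_json).dig('status', core)
  !status.nil? && !status.empty?
end
```

To use it, fetch the URL with Net::HTTP.get(URI(core_status_url('http://127.0.0.1:8983/solr/', 'dev'))) and pass the body to core_exists?.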

Back on that left-hand column, there is a dropdown called Core Selector. Click on it, and pick your core. You should now see more options below the selector. Two in particular are useful to us: Documents and Query.

Documents

The Documents page is where you add (or remove) documents in Solr through the UI. You can change the Document Type to XML and copy a single doc from the add_docs.xml file below, or change it to Solr Command, copy the whole file into the Document(s) box, and click Submit Document. The latter is the same as running curl http://127.0.0.1:8983/solr/dev/update -H "Content-Type: text/xml" --data-binary @./add_docs.xml from the folder add_docs.xml is in.
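From Ruby, that curl command corresponds roughly to the following net/http sketch. The helper names are hypothetical; the update URL and the text/xml content type are taken straight from the curl command.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds the same POST curl sends, without sending it.
def build_update_request(update_url, xml)
  request = Net::HTTP::Post.new(URI(update_url))
  request.content_type = 'text/xml'
  request.body = xml
  request
end

# Sends the request to Solr and returns the Net::HTTPResponse.
def post_update(update_url, xml)
  uri = URI(update_url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_update_request(update_url, xml))
  end
end

# Usage: post_update('http://127.0.0.1:8983/solr/dev/update', File.read('add_docs.xml'))
```

Splitting out build_update_request makes it easy to inspect what would be sent before pointing it at a running Solr.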

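The add_docs.xml file has three documents in it, which we will search for in the Query section below. Since overwrite is set to true (both by default in the form and in the file), a document whose id is already in Solr replaces the old copy, so adding documents is idempotent and can be rerun without adverse effects. The commitWithin='5000' attribute tells Solr to commit within 5 seconds, so new documents may take a few seconds to show up in searches.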
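To load more than the three sample documents, it can be easier to generate the XML than to write it by hand. A sketch with a hypothetical add_xml helper that mirrors the layout of add_docs.xml: an array value becomes repeated fields (like the two keywords on the test page), and the overwrite and commitWithin attributes match the file.

```ruby
# Escapes the characters that matter in XML text and single-quoted attributes.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub("'", '&apos;')
end

# Hypothetical helper: turns an array of hashes into an <add> command.
def add_xml(docs, commit_within_ms: 5000)
  lines = ["<add overwrite='true' commitWithin='#{commit_within_ms}'>"]
  docs.each do |doc|
    lines << '<doc>'
    doc.each do |name, value|
      # Array(value) lets one field name appear several times (multi-valued).
      Array(value).each do |v|
        lines << "<field name='#{xml_escape(name)}'>#{xml_escape(v)}</field>"
      end
    end
    lines << '</doc>'
  end
  lines << '</add>'
  lines.join("\n")
end
```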

To remove documents, change the Document Type to Solr Command, copy the contents of remove_docs.xml into the Document(s) box, and click Submit Document. The curl equivalent is curl http://127.0.0.1:8983/solr/dev/update -H "Content-Type: text/xml" --data-binary @./remove_docs.xml. That will delete the test page doc and any doc with HealthSciences in the library field.
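A delete command can be generated the same way. This sketch's hypothetical delete_xml helper reproduces remove_docs.xml: each id becomes an <id> element (delete by unique key), and each query becomes a <query> element (delete everything that matches).

```ruby
# Escapes the characters that matter in XML text.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub("'", '&apos;')
end

# Hypothetical helper: builds a <delete> command like remove_docs.xml.
def delete_xml(ids: [], queries: [])
  lines = ['<delete>']
  ids.each { |id| lines << "<id>#{xml_escape(id)}</id>" }
  queries.each { |q| lines << "<query>#{xml_escape(q)}</query>" }
  lines << '</delete>'
  lines.join("\n")
end
```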

Query

This is where you can build a search. The settings that matter most:

  • q: the search itself. Putting a string here will search all fields for that string. You can restrict a search to a specific field by putting the field name and a colon before the search term, like 'title:Test'.
  • fq: filters the documents the search will run against. For example, setting fq to 'library:General' will only return results whose library field matches General, in addition to matching whatever the q term is.
  • start and rows: paginate the results.
  • wt: sets the format the results will be returned in.
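The Query form builds a URL against the core's /select handler. As a sketch, the same URL can be assembled in Ruby; select_url is a hypothetical helper, and its defaults of start=0 and rows=10 match Solr's own defaults.

```ruby
require 'uri'

# Hypothetical helper: builds a /select URL from the same settings as the Query form.
def select_url(base_url, core, q:, fq: nil, start: 0, rows: 10, wt: 'json')
  params = [['q', q]]
  params << ['fq', fq] if fq # fq is optional
  params.concat([['start', start], ['rows', rows], ['wt', wt]])
  "#{base_url.chomp('/')}/#{core}/select?#{URI.encode_www_form(params)}"
end

# Usage: select_url('http://127.0.0.1:8983/solr/', 'dev', q: 'title:Test', fq: 'library:General')
```

URI.encode_www_form percent-encodes the colons (title%3ATest), which Solr decodes back to title:Test.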

I absolutely love that the URL equivalent of what you create from the form is shown above the results.
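Setting wt to json makes the results at that URL easy to read from a script. A sketch with a hypothetical summarize_results helper; it relies on the response/numFound/docs layout of Solr's JSON responses. Depending on the schema your core was created with, fields like title may come back as arrays.

```ruby
require 'json'

# Hypothetical helper: pulls the hit count and document ids out of a wt=json
# select response. fetch raises KeyError if the body is not a search result
# (for example, an error response).
def summarize_results(json_text)
  response = JSON.parse(json_text).fetch('response')
  [response.fetch('numFound'), response.fetch('docs').map { |doc| doc['id'] }]
end
```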

Troubleshooting

The easiest way to recover from errors is to stop Solr, delete the solr-dev folder, and start again from Running Solr above. This will delete all the data you had in Solr, though.
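If you reset often, the same step can be scripted. This is only a sketch: reset_solr is a hypothetical helper, FileUtils.rm_rf deletes without asking, and the check that refuses absolute paths or paths containing .. is an extra precaution of this sketch, not something solr_wrapper does. Stop solr_wrapper before running it.

```ruby
require 'fileutils'

# Hypothetical helper: deletes the Solr instance folder (and all indexed data)
# so the next solr_wrapper run starts from scratch. Stop solr_wrapper first.
def reset_solr(instance_dir, base: Dir.pwd)
  # Refuse empty names, absolute paths, and paths that climb out of base.
  if instance_dir.to_s.empty? || instance_dir.start_with?('/') || instance_dir.split('/').include?('..')
    raise ArgumentError, "refusing to delete #{instance_dir.inspect}"
  end
  target = File.join(base, instance_dir)
  FileUtils.rm_rf(target)
  !File.exist?(target)
end

# Usage: reset_solr('solr-dev')
```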

add_docs.xml
<add overwrite='true' commitWithin='5000'>
<doc>
<field name='id'>https://library.emory.edu/index.html</field>
<field name='title'>Emory Libraries Homepage</field>
<field name='keywords'>Hours</field>
<field name='library'>General</field>
<field name='last_update'>2018-08-31</field>
</doc>
<doc>
<field name='id'>https://library.emory.edu/path/to/test_page.html</field>
<field name='title'>Test Page</field>
<field name='keywords'>Test</field>
<field name='keywords'>Page</field>
<field name='library'>General</field>
<field name='last_update'>2018-01-31</field>
</doc>
<doc>
<field name='id'>https://library.emory.edu/health/index.html</field>
<field name='title'>Health Sciences Library Homepage</field>
<field name='keywords'>Hours</field>
<field name='library'>HealthSciences</field>
<field name='last_update'>2017-07-01</field>
</doc>
</add>
remove_docs.xml
<delete>
<id>https://library.emory.edu/path/to/test_page.html</id>
<query>library:HealthSciences</query>
</delete>
Vagrantfile
$provisioner = <<-SCRIPT
set -o errexit -o nounset -o verbose
sudo apt-get update
sudo apt-get install -y ruby openjdk-8-jre
sudo gem install solr_wrapper
cat >~/.solr_wrapper <<'FILE'
# Which version of Solr to download. 6.6.2 is known good with Hyrax.
version: 6.6.2
# Download from Princeton, because their mirror is more stable than Apache's, and we have permission.
mirror_url: 'http://lib-solr-mirror.princeton.edu/dist/'
checksum: 'http://lib-solr-mirror.princeton.edu/dist/lucene/solr/6.6.2/solr-6.6.2.zip.sha1'
# Where to download things.
download_dir: '/var/tmp'
# Where to put Solr. NOTE: This is relative to where you run solr_wrapper!
instance_dir: 'solr-dev'
# Save things when Solr shuts down? Change to `false` if shutting Solr down means wiping out all your work.
persist: true
# What port should the Solr interface show up on? Default is 8983.
port: 8983
FILE
cat >~/add_docs.xml <<FILE
<add overwrite='true' commitWithin='5000'>
<doc>
<field name='id'>https://library.emory.edu/index.html</field>
<field name='title'>Emory Libraries Homepage</field>
<field name='keywords'>Hours</field>
<field name='library'>General</field>
<field name='last_update'>2018-08-31</field>
</doc>
<doc>
<field name='id'>https://library.emory.edu/path/to/test_page.html</field>
<field name='title'>Test Page</field>
<field name='keywords'>Test</field>
<field name='keywords'>Page</field>
<field name='library'>General</field>
<field name='last_update'>2018-01-31</field>
</doc>
<doc>
<field name='id'>https://library.emory.edu/health/index.html</field>
<field name='title'>Health Sciences Library Homepage</field>
<field name='keywords'>Hours</field>
<field name='library'>HealthSciences</field>
<field name='last_update'>2017-07-01</field>
</doc>
</add>
FILE
cat >~/remove_docs.xml <<FILE
<delete>
<id>https://library.emory.edu/path/to/test_page.html</id>
<query>library:HealthSciences</query>
</delete>
FILE
SCRIPT
Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-16.04"
  config.vm.network "forwarded_port", guest: 8983, host_ip: "127.0.0.1", host: 8983, auto_correct: true
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "2048"
  end
  config.vm.provision "shell", privileged: false, inline: $provisioner
end