@thelabdude
Last active September 21, 2017 15:09
Notes for running Solr on Alluxio

Here are some tips for getting started with Alluxio as the filesystem for Solr indexes. I've tested these steps with Alluxio 1.5.0 and Solr 6.6.0. They should also work with other versions, although version-specific details such as the client JAR file name will differ.

SOLR_TIP=<root directory where Solr is installed on your server>
ALLUXIO_HOME=<root directory where Alluxio is installed on your server>

Create an alluxio configset directory (server/solr/configsets/alluxio/conf, which is uploaded to ZooKeeper in the next step) and make the following changes to its solrconfig.xml:

   <directoryFactory name="DirectoryFactory"
-                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
+                    class="solr.HdfsDirectoryFactory"/>

...

-    <lockType>${solr.lock.type:native}</lockType>
+    <lockType>hdfs</lockType>
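If you prefer to script these two edits, here is a minimal sketch. It assumes you start from Solr 6.x's basic_configs configset and that its solrconfig.xml still contains the two ${...} defaults shown above. The helper name make_alluxio_configset is hypothetical; it copies the source configset and rewrites both values with sed.

```shell
# Hypothetical helper: make_alluxio_configset SRC_CONF_DIR DEST_CONF_DIR
# Assumes SRC_CONF_DIR/solrconfig.xml contains the stock
# ${solr.directoryFactory:solr.NRTCachingDirectoryFactory} and
# ${solr.lock.type:native} defaults.
make_alluxio_configset() {
  src="$1"
  dest="$2"
  mkdir -p "$dest" || return 1
  # Copy the whole configset (schema, stopwords, etc.), not just solrconfig.xml.
  cp -R "$src"/. "$dest"/ || return 1
  conf="$dest/solrconfig.xml"
  # Swap in HdfsDirectoryFactory and the hdfs lock type. Write to a temp
  # file and move it back, because "sed -i" differs between GNU and BSD sed.
  sed -e 's|\${solr.directoryFactory:solr.NRTCachingDirectoryFactory}|solr.HdfsDirectoryFactory|' \
      -e 's|\${solr.lock.type:native}|hdfs|' \
      "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Usage (from $SOLR_TIP):
#   make_alluxio_configset server/solr/configsets/basic_configs/conf \
#                          server/solr/configsets/alluxio/conf
```

The source configset is left untouched, so other collections can keep using it with local storage.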

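NOTE: I prefer this approach to setting -Dsolr.directoryFactory=solr.HdfsDirectoryFactory because it lets you keep collections that do not use Alluxio on the same Solr nodes. If you want every collection to use Alluxio, you can instead add -Dsolr.directoryFactory=solr.HdfsDirectoryFactory and -Dsolr.lock.type=hdfs to SOLR_OPTS in bin/solr.in.sh.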
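For the node-wide alternative, the SOLR_OPTS additions in bin/solr.in.sh might look like this sketch. The property names come from the ${solr.directoryFactory:...} and ${solr.lock.type:...} substitutions in the stock solrconfig.xml, so only collections whose configs keep those defaults are affected.

```shell
# Sketch for bin/solr.in.sh: node-wide alternative to a dedicated configset.
# These system properties fill the ${solr.directoryFactory:...} and
# ${solr.lock.type:...} placeholders in any solrconfig.xml that still uses them.
SOLR_OPTS="$SOLR_OPTS -Dsolr.directoryFactory=solr.HdfsDirectoryFactory"
SOLR_OPTS="$SOLR_OPTS -Dsolr.lock.type=hdfs"
```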

Upload the alluxio configset to Solr's ZooKeeper:

cd $SOLR_TIP
bin/solr zk upconfig -n alluxio -d server/solr/configsets/alluxio/conf
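Before uploading, a quick check can confirm that the configset actually contains both changes. The check_alluxio_configset function below is a hypothetical helper and only greps the text; it does not validate the XML.

```shell
# Hypothetical preflight check: check_alluxio_configset CONF_DIR
# Returns non-zero (with a message on stderr) if solrconfig.xml is missing
# or does not contain the HdfsDirectoryFactory and hdfs lock type settings.
check_alluxio_configset() {
  conf="$1/solrconfig.xml"
  if [ ! -f "$conf" ]; then
    echo "missing $conf" >&2
    return 1
  fi
  grep -q 'class="solr.HdfsDirectoryFactory"' "$conf" ||
    { echo "directoryFactory is not solr.HdfsDirectoryFactory" >&2; return 1; }
  grep -q '<lockType>hdfs</lockType>' "$conf" ||
    { echo "lockType is not hdfs" >&2; return 1; }
}

# Usage (from $SOLR_TIP):
#   check_alluxio_configset server/solr/configsets/alluxio/conf &&
#     bin/solr zk upconfig -n alluxio -d server/solr/configsets/alluxio/conf
```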

Create $SOLR_TIP/hadoop-conf/core-site.xml with the following contents:

<configuration>
<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
  <description>The Alluxio FileSystem (Hadoop 1.x and 2.x)</description>
</property>
<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
</property>
<property>
  <name>fs.alluxio.impl.disable.cache</name>
  <value>true</value>
</property>
<property>
  <name>alluxio.user.file.writetype.default</name>
  <value>CACHE_THROUGH</value>
</property>
<property>
  <name>alluxio.user.file.cache.partially.read.block</name>
  <value>false</value>
</property>
</configuration>
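A similar sanity check for core-site.xml is sketched below with a hypothetical check_core_site function. It reports any of the five properties above that are missing from the file, but it does not check their values.

```shell
# Hypothetical helper: check_core_site FILE
# Prints each expected property name that is absent from FILE and returns
# non-zero if any are missing. Plain fixed-string grep, not an XML parser.
check_core_site() {
  file="$1"
  missing=0
  for prop in fs.alluxio.impl \
              fs.AbstractFileSystem.alluxio.impl \
              fs.alluxio.impl.disable.cache \
              alluxio.user.file.writetype.default \
              alluxio.user.file.cache.partially.read.block; do
    if ! grep -qF "<name>$prop</name>" "$file"; then
      echo "missing property: $prop" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_core_site "$SOLR_TIP/hadoop-conf/core-site.xml"
```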

NOTE: alluxio.user.file.cache.partially.read.block is set to false to work around https://alluxio.atlassian.net/browse/ALLUXIO-2995 in Alluxio 1.5.0.

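Set start-up options in $SOLR_TIP/bin/solr.in.sh, replacing master with the hostname of your Alluxio master (19998 is Alluxio's default master port) and PATH_TO_SOLR with the absolute path of $SOLR_TIP, where you created hadoop-conf: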

SOLR_OPTS="$SOLR_OPTS -Dsolr.hdfs.home=alluxio://master:19998/solr -Dsolr.hdfs.confdir=PATH_TO_SOLR/hadoop-conf"
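Because master and PATH_TO_SOLR are placeholders, a small hypothetical helper can print the filled-in options for a given Alluxio master host and Solr install directory. It assumes the default master port 19998 and the /solr index directory from the line above.

```shell
# Hypothetical helper: alluxio_solr_opts MASTER_HOST SOLR_INSTALL_DIR
# Prints the -D options for solr.hdfs.home and solr.hdfs.confdir.
# Port 19998 and the /solr directory in Alluxio mirror the example above.
alluxio_solr_opts() {
  master_host="$1"
  solr_tip="$2"
  printf '%s' "-Dsolr.hdfs.home=alluxio://${master_host}:19998/solr -Dsolr.hdfs.confdir=${solr_tip}/hadoop-conf"
}

# Usage: print the line to paste into bin/solr.in.sh (/opt/solr is an example path)
#   echo "SOLR_OPTS=\"\$SOLR_OPTS $(alluxio_solr_opts master /opt/solr)\""
```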

Add the Alluxio client JAR to Solr's classpath by copying it into the Solr webapp's lib directory:

cp $ALLUXIO_HOME/core/client/runtime/target/alluxio-core-client-runtime-1.5.0-jar-with-dependencies.jar $SOLR_TIP/server/solr-webapp/webapp/WEB-INF/lib/
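If the JAR path or version differs from the one above, the plain cp fails with an easy-to-miss error. A hypothetical install_alluxio_client helper can check that both the JAR and Solr's lib directory exist first; the path layout is copied from the cp command above, and the version argument defaults to 1.5.0.

```shell
# Hypothetical helper: install_alluxio_client ALLUXIO_HOME SOLR_TIP [VERSION]
# Copies the Alluxio client JAR (path layout taken from the cp command above)
# into Solr's webapp lib directory after checking that both exist.
install_alluxio_client() {
  alluxio_home="$1"
  solr_tip="$2"
  version="${3:-1.5.0}"
  jar="$alluxio_home/core/client/runtime/target/alluxio-core-client-runtime-$version-jar-with-dependencies.jar"
  lib="$solr_tip/server/solr-webapp/webapp/WEB-INF/lib"
  if [ ! -f "$jar" ]; then
    echo "client JAR not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$lib" ]; then
    echo "Solr lib directory not found: $lib" >&2
    return 1
  fi
  cp "$jar" "$lib/"
}

# Usage: install_alluxio_client "$ALLUXIO_HOME" "$SOLR_TIP" 1.5.0
```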

Restart Solr so it picks up the new start-up options and the client JAR:

bin/solr restart

Create a collection that uses the alluxio configset:

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=alluxio1&numShards=1&replicationFactor=1&collection.configName=alluxio&property.ulogDir=solr/alluxio1-tlog"
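To create further Alluxio-backed collections, the same request can be parameterized by collection name. This sketch uses a hypothetical collection_create_url helper that keeps the alluxio configset, one shard, one replica, and a solr/NAME-tlog update log directory, mirroring the alluxio1 example.

```shell
# Hypothetical helper: collection_create_url NAME [SOLR_BASE_URL]
# Builds the Collections API CREATE URL from the example above for any
# collection name; the base URL defaults to http://localhost:8983/solr.
collection_create_url() {
  name="$1"
  solr_url="${2:-http://localhost:8983/solr}"
  printf '%s' "${solr_url}/admin/collections?action=CREATE&name=${name}&numShards=1&replicationFactor=1&collection.configName=alluxio&property.ulogDir=solr/${name}-tlog"
}

# Usage:
#   curl "$(collection_create_url alluxio2)"
```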