@rockstarartist
Forked from kalharbi/zookeeper-solr-cloud.md
Last active January 5, 2021 11:14
Setting up an external Zookeeper Solr Cluster

Setting up an external Zookeeper Solr Cluster on 3 hosts with Ambari's Zookeeper

This is a step-by-step guide to creating a cluster of three Solr nodes running in cloud mode (SolrCloud), coordinated by an external ZooKeeper ensemble. The instructions should work both on a local test cluster (three virtual hosts) and on a remote cluster where each server runs on its own physical machine. They were tested with Solr 6.2.1 and ZooKeeper 3.4.6.

We assume that the ZooKeeper servers run on hosts named zserver1, zserver2, and zserver3.

Installing Solr and Zookeeper

  • Download and extract Solr on each machine:
    • curl -O http://mirror.metrocast.net/apache/lucene/solr/6.2.1/solr-6.2.1.tgz
    • mkdir /opt/solr
    • tar -zxvf solr-6.2.1.tgz -C /opt/solr --strip-components=1
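
Because of --strip-components=1, the archive's top-level solr-6.2.1/ directory is dropped and bin/solr should end up directly under /opt/solr. A minimal sketch of a sanity check to run on each host follows; check_solr_install is a hypothetical helper name, not something shipped with Solr.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): succeeds only if DIR looks like an
# extracted Solr distribution, i.e. DIR/bin/solr exists and is executable.
check_solr_install() {
    dir=${1:-/opt/solr}
    if [ -x "$dir/bin/solr" ]; then
        echo "ok: $dir/bin/solr found"
    else
        echo "missing: $dir/bin/solr (check the tar -C target and --strip-components)" >&2
        return 1
    fi
}

# Usage on each host:
#   check_solr_install /opt/solr
```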

Setup Zookeeper for Solr

  • Following Solr installation best practice, create a dedicated /solr znode so that Solr's data does not clutter the ZooKeeper root:
    • su - zookeeper
    • cd /usr/hdp/current/zookeeper-client/bin/
    • ./zkCli.sh -server zserver1:2181,zserver2:2181,zserver3:2181
    • In the ZooKeeper shell, create the znode: create /solr []
    • Confirm that the znode now exists: ls /solr
    • Exit the ZooKeeper shell: quit
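
The value passed to -server is a comma-separated list of host:port pairs with no spaces. The same list, with /solr appended once after the last host, is what Solr's -z option takes in the next section. As a sketch, the string can be built from the host names rather than typed by hand; zk_connect_string is a hypothetical helper, and port 2181 and the host names are the ones assumed in this guide.

```shell
#!/bin/sh
# Hypothetical helper: join host names into a ZooKeeper connection string.
# Usage: zk_connect_string PORT CHROOT HOST...
# CHROOT may be empty; when set, it is appended once, after the last host.
zk_connect_string() {
    port=$1
    chroot=$2
    shift 2
    result=""
    for host in "$@"; do
        # Separate entries with a bare comma (no spaces).
        result="${result:+$result,}$host:$port"
    done
    printf '%s%s\n' "$result" "$chroot"
}

# For zkCli.sh -server (no chroot):
zk_connect_string 2181 "" zserver1 zserver2 zserver3
# prints zserver1:2181,zserver2:2181,zserver3:2181

# For Solr's -z option (with the /solr chroot):
zk_connect_string 2181 /solr zserver1 zserver2 zserver3
# prints zserver1:2181,zserver2:2181,zserver3:2181/solr
```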

Configuring Solr

  • cd /opt/solr
  • Start one Solr instance on each of the three hosts, pointing it at the ZooKeeper ensemble:
# Run the same command on each host.
# Note the /solr suffix after the LAST ZooKeeper address only: it is a chroot that makes
# Solr store all of its data under the /solr znode instead of the ZooKeeper root.
# The host list must not contain spaces, or the shell splits it into separate arguments.
$ ./bin/solr start -c -p 8983 -z zserver1:2181,zserver2:2181,zserver3:2181/solr
  • Upload the collection configuration to ZooKeeper. Use the zkcli.sh script that ships with Solr, not the zkCli.sh script from the ZooKeeper installation.
$ ./server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost zserver1:2181/solr \
  -confdir ./server/solr/configsets/data_driven_schema_configs/conf/ \
  -confname my-config
  • Create a Solr collection using the uploaded configuration.

    curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=my-collection&numShards=2&replicationFactor=1&collection.configName=my-config'
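
The CREATE request is an ordinary URL whose query string carries the collection name, shard count, replication factor, and configuration name. When several collections are needed, it can help to assemble that URL from parameters. The following is only a sketch: create_collection_url is a hypothetical helper, and the base URL and names are the ones used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print the Collections API CREATE URL.
# Usage: create_collection_url BASE NAME NUM_SHARDS REPLICATION_FACTOR CONFIG
create_collection_url() {
    printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=%s&collection.configName=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Print the URL for a two-shard, single-replica collection.
create_collection_url http://localhost:8983/solr my-collection 2 1 my-config
```

To send the request, wrap the call in quotes so the shell does not interpret the & characters: curl "$(create_collection_url http://localhost:8983/solr my-collection 2 1 my-config)".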

Notes:

  • To create several collections with different schemas, repeat the last two steps for each one, uploading each configuration under its own -confname. Collections created with the same collection.configName share one configuration in ZooKeeper, so they all end up with a single schema.
  • In Solr, maxShardsPerNode defaults to 1, meaning each node may host at most one replica of a collection. This setup has 3 nodes, so a collection can have at most 3 replicas in total. For example, numShards=2 with replicationFactor=2 requires four replicas spread across three nodes, which exceeds that limit, so the CREATE request fails with errors. To place more than one replica on a node, pass a higher maxShardsPerNode in the CREATE request.
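
The limit in the last note comes down to simple arithmetic: a collection needs numShards × replicationFactor replicas, and the cluster allows at most nodes × maxShardsPerNode. A minimal sketch of that check, assuming this is the only constraint that matters here; collection_fits is a hypothetical helper, not a Solr command.

```shell
#!/bin/sh
# Hypothetical helper: does a planned collection fit on the cluster?
# Usage: collection_fits NODES MAX_SHARDS_PER_NODE NUM_SHARDS REPLICATION_FACTOR
collection_fits() {
    needed=$(( $3 * $4 ))    # replicas the collection requires
    allowed=$(( $1 * $2 ))   # replicas the cluster may host
    if [ "$needed" -le "$allowed" ]; then
        echo "fits: needs $needed, allowed $allowed"
    else
        echo "too large: needs $needed, allowed $allowed"
        return 1
    fi
}

collection_fits 3 1 2 1            # the collection created in this guide
# prints fits: needs 2, allowed 3
collection_fits 3 1 2 2 || true    # the example from the note
# prints too large: needs 4, allowed 3
```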
@prynhart
Awesome - this is very helpful. (Much more helpful than the official page, as it turns out: https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble) Thanks very much.