solr.md from HariSekhon/Knowledge-Base repo: https://github.com/HariSekhon/Knowlege-Base

Solr

Solr is the classic open-source search system. Perhaps too classic (old).

Latest Documentation

See Elasticsearch too.

Solr UI

Uses an embedded Jetty webapp on port 8983

http://<host>:8983/solr/#/
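
To quickly check from a shell that Solr is up and answering on that port, hit the system info API (a minimal sketch, assuming the default port and no authentication):

curl "http://<host>:8983/solr/admin/info/system?wt=json"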

Solr CLI

If you administer Solr a lot, you may find this CLI makes your day-to-day work significantly easier; it can also use environment variables to shorten your commands.

HariSekhon/DevOps-Perl-tools

git clone https://github.com/HariSekhon/DevOps-Perl-tools.git perl-tools
cd perl-tools
make  # installs CPAN dependencies on all major Linux and Mac systems
./solr_cli.pl --help

Commercial Offerings

LucidWorks

Start SolrCloud Commands

On first node:

Notice the /solr suffix on the zkHost value: it chroots SolrCloud to the /solr path inside ZooKeeper, otherwise Solr will write its znodes all over the top level.

java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr \
  -DnumShards=2 \
  -Dbootstrap_confdir=$SOLR_HOME/example-solrcloud1/solr/collection1/conf \
  -Dcollection.configName=myconf \
  -jar start.jar

On subsequent nodes:

java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr -jar start.jar
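
If the /solr chroot path doesn't exist in ZooKeeper yet, it can be created up front with Solr's zkcli script (a sketch; the script's location varies by Solr version, e.g. under the cloud-scripts directory):

./zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 -cmd makepath /solr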

Solr / SolrCloud Docker Images

HariSekhon/Dockerfiles - Solr

HariSekhon/Dockerfiles - SolrCloud

HariSekhon/Dockerfiles - SolrCloud Dev

SolrCloud

Clustered Solr using ZooKeeper for coordination of nodes and shards.

Expect to do shard management, recoveries and ZooKeeper contents investigation.

Strongly recommended that you use Elasticsearch instead of SolrCloud.

The problem with SolrCloud is that clustering was tacked on to classic Solr as an afterthought and it shows, compared to Elasticsearch where this was a primary consideration.

Shard management is difficult in SolrCloud and it's common to have shard outages and shard loss requiring re-indexing from an external data store.

You will need the Solr CLI above. There is even a --request-core-recovery switch that uses an API endpoint that wasn't documented at the time of writing.

  • Leader uses the last 100 updates from its transaction log to sync replicas

  • if replicas fall too far behind leader then full replication of segments is needed instead

  • CDCR - Cross DataCenter Replication

    • https://sematext.com/blog/2016/04/20/solr-6-datacenter-replication/
    • 6.0+
    • stores unlimited update log
    • 'replicator' configured on source collection sends batch updates to target collection(s)
    • shard leader receives indexing command, processes + replicates to local replicas + writes to update log for CDCR (synchronously)
    • async 'replicator' checks update log, if new creates batch + sends to target collection
    • data received by target collection leaders is replicated locally at the other DC using standard SolrCloud replication
    • 'replicator' batches for max scalability
    • update log only captures new indexed docs
    • indexing existing collection requires shutting down source cluster, copying source leader data dirs to target leader data dirs
    • solr.CdcrRequestHandler, insert update request processor chain in UpdateHandler (see link above for details)
    • Limitations:
      • Active - Passive
      • not bi-directional, so no indexing if the source cluster goes down (or perhaps indexing could continue, but with no replication back to the original primary)
      • shards must be manually migrated (Elasticsearch auto-migrates shards)
    • start CDCR http://.../<collection>/cdcr?action=START
    • stop CDCR http://.../<collection>/cdcr?action=STOP
    • enable buffering CDCR http://.../<collection>/cdcr?action=ENABLEBUFFER
    • disable buffering CDCR http://.../<collection>/cdcr?action=DISABLEBUFFER
  • Parallel SQL (see the curl sketch after this list):

    • 6.0+
    • SQL Handler
    • /solr/<collection>/select?q=<traditional_query>
    • /solr/<collection>/sql
    • request body stmt=<sql_query>
    • request body must be urlencoded
    • Solr JDBC driver provided with SolrJ
    • GROUP BY
    • aggregates count / sum / min / max / avg
    • no JOIN
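
A minimal Parallel SQL request sketch (assuming a collection named mycollection with a field called field_a; the stmt parameter must be URL-encoded, which curl's --data-urlencode handles):

curl --data-urlencode 'stmt=SELECT field_a, count(*) FROM mycollection GROUP BY field_a' \
  "http://<host>:8983/solr/mycollection/sql"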

NRT DR cross-site recovery: take down the entire cluster, point 1 node per shard at the ZooKeeper at the other DC => let replication catch up => take those nodes down, reconfigure back to the local ZooKeeper and start again

Local

zkRun uses embedded zookeeper (just for testing):

java -DzkRun -DnumShards=2 -Dbootstrap_confdir=$SOLR_HOME/example-solrcloud1/solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
java -Djetty.port=8984 -DzkHost=localhost:9983 -jar start.jar

Shortcut:

solr -e cloud -noprompt
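
Once the local cloud example is up, a test collection can be created against it via the Collections API (a sketch, assuming a configset named myconf has already been uploaded to ZooKeeper):

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=1&collection.configName=myconf"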

info:

solr -i
solr stop -c -all

Routing using Murmur hash:

id=<shard>!<id>
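
For example, with the default compositeId router all documents sharing the same prefix before the ! hash to the same shard (a sketch, assuming a reasonably recent Solr and a collection named collection1):

curl -H 'Content-Type: application/json' \
  "http://localhost:8983/solr/collection1/update?commit=true" \
  -d '[{"id": "customerA!doc1"}, {"id": "customerA!doc2"}]'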

Hadoop MapReduce Indexer to SolrCloud

cd $SOLR_HOME

For dry-run to get libs locally, reuse in -libjars for distributed job:

export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$(ls dist/*.jar \
  contrib/map-reduce/lib/*.jar \
  dist/solrj-lib/*.jar \
  contrib/morphline-core/lib/*.jar \
  contrib/morphlines-cell/lib/*.jar \
  contrib/extraction/lib/*.jar \
  example/solr-webapp/webapp/WEB-INF/lib/*.jar |
  tr '\n' ':' |
  sed 's/:$//'
)"
hadoop jar dist/solr-map-reduce-*.jar org.apache.solr.hadoop.MapReduceIndexerTool --libjars ...
hadoop jar dist/solr-map-reduce-*.jar \
  -libjars $(sed 's/:/,/g' <<< "$HADOOP_CLASSPATH") \
  --mappers 12 \
  --morphline-file myFile.conf \
  --morphline-id morphline1 \
  --zk-host $ZOOKEEPERS \
  --collection $SOLR_COLLECTION \
  --go-live --go-live-threads 6 \
  --output-dir hdfs://nameservice1/tmp/blah \
  --verbose \
  --dry-run \
  hdfs://nameservice1/data
  • When the index is not on HDFS (online indexing performance sucks on HDFS anyway), --output-dir hdfs://nameservice1/ causes: org.apache.solr.common.SolrException: Directory: org.apache.lucene.store.MMapDirectory. but hdfs lock factory can only be used with HdfsDirectory

  • MRIndexer writes tmp2/full-import-list.txt to the output dir from only 1 mapper; this causes FileNotFoundException in the other mappers since they can't see the file when using file:/// - this rules out local index creation

  • URI error was due to specifying MapReduceIndexer path as first arg, must be hdfs:///data/...

MapReduce Indexer Tool

java -cp dist/*:contrib/map-reduce/lib/*:$(hadoop classpath) org.apache.solr.hadoop.MapReduceIndexerTool

HDFS Find Tool

java -cp dist/*:contrib/map-reduce/lib/*:$(hadoop classpath) org.apache.solr.hadoop.HdfsFindTool --help
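
Example invocation sketch - HdfsFindTool mimics GNU find syntax; the exact flags below (-find, -type, -name) are from memory, so check --help first:

java -cp dist/*:contrib/map-reduce/lib/*:$(hadoop classpath) org.apache.solr.hadoop.HdfsFindTool \
  -find hdfs://nameservice1/data -type f -name '*.avro'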

Morphlines

of note:

  • readMultiLine
  • grok
  • generateUUID
  • convertTimestamp
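
A minimal morphline config sketch tying a few of these together, written to the myFile.conf referenced by --morphline-file above (field names and the collection/zkHost values are placeholders):

cat > myFile.conf <<'EOF'
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      # join continuation lines (those starting with whitespace) onto the previous record
      { readMultiLine { regex : "^\\s+", what : previous, charset : UTF-8 } }
      # give each record a unique id field for Solr
      { generateUUID { field : id } }
      # send the record to SolrCloud
      { loadSolr { solrLocator : { collection : collection1, zkHost : "zk1:2181,zk2:2181,zk3:2181/solr" } } }
    ]
  }
]
EOF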

Monitoring

HariSekhon/Nagios-Plugins check_solr* scripts.

Some more things to monitor:

  • Number of queries per second
  • Solr Write (check_solr_write.pl)
  • Average response time (check_solr_query.pl returns QTime query response time)
  • Number of updates
  • Cache hit ratios
  • Replication status
  • Synthetic queries
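
Cache hit ratios and other per-core stats can be pulled from the MBeans endpoint (a sketch, assuming a core name and the default port):

curl "http://<host>:8983/solr/<core>/admin/mbeans?stats=true&cat=CACHE&wt=json"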

Troubleshooting

No Leader

Shards with no filled-in circle in the admin UI Cloud graph = no leader

https://solr.apache.org/guide/8_7/shard-management.html#forceleader

Forcing a leader election can lead to data loss:

curl "http://$HOST:8983/solr/admin/collections?action=FORCELEADER&collection=$COLLECTION&shard=$SHARD"

Local Solr Restart

Old script restart_local_solr.sh example:

#!/bin/bash
set -x;
pgrep -f start.jar | xargs --no-run-if-empty kill;  # pgrep prints PIDs directly, no grep/awk needed
count=0;
while ps -ef|grep start.ja[r]; do
    let count+=1;
    sleep 3;
    [ $count -ge 5 ] && break;
done;
pgrep -f start.jar | xargs --no-run-if-empty kill -9;  # force-kill anything still left
rm -fv /data*/solr/*/index/write.lock
sleep 1;
cd /opt/solr/hdp; java -Xmx30g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -DzkHost=$SOLR_ZOOKEEPER -jar start.jar &
exit 0

Core Not Coming Up

This file needs to exist in the data directory for the core to come up - it is only created by the solr script if the directory doesn't already exist:

touch /var/solr/data/<core>/core.properties
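
An empty core.properties is enough for core discovery (the core name defaults to the directory name), but it can also carry explicit settings - a sketch with commonly used properties (values are placeholders):

cat > /var/solr/data/<core>/core.properties <<EOF
name=<core>
config=solrconfig.xml
schema=schema.xml
dataDir=data
EOF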

CorruptIndexException

This is a bug - disable the HDFS write cache to work around it, or switch back to using local disk (faster anyway):

org.apache.solr.common.SolrException; org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual header \d+ vs expected header \d+

org.apache.solr.common.SolrException: Index locked for write

After restarting Solr instances, some cores don't load and you see this exception.

This happens when a Solr instance is killed, leaving <dataDir>/index/write.lock files behind, which prevents those cores from loading on restart:

org.apache.solr.common.SolrException: Index locked for write for core Blah_shard3_replica2

FIX:

  • stop Solr
  • rm /data*/solr/*/index/write.lock
  • start Solr

Cores not coming back online after restart

Trigger recovery manually (not currently documented, but I've coded it into solr_cli.pl):

/solr/admin/cores?action=REQUESTRECOVERY&core=<name>
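
For example, against a core named as in the lock error above (a sketch, assuming the default port):

curl "http://$HOST:8983/solr/admin/cores?action=REQUESTRECOVERY&core=Blah_shard3_replica2"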

ClusterStatus / OverseerStatus 400 Bad Request error - unknown action

CLUSTERSTATUS / OVERSEERSTATUS returns 400 Bad Request "error": { "msg": "Unknown action: CLUSTERSTATUS" }

This was due to a version mismatch visible in the logs: Solr 4.10.3 vs 4.7.2 on the rest of the cluster (the older version doesn't know these Collections API actions).
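
For reference, the call that triggers it (a sketch, assuming the default port):

curl "http://$HOST:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"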

Misc

  • dfs.replication setting not respected SOLR-6305 and SOLR-6528
  • autoAddReplicas, added in 4.10, didn't work when tested in 4.10.3
  • "Missing authority in path URI" error when using hdfs:/tmp => needs the NameNode / nameservice part, which is the "authority" => hdfs://nameservice1/tmp
Partial port from private Knowledge Base page 2013+