@eribeiro
Last active January 29, 2020 22:51
Retrieves all the documents from a Solr core/collection
#!/bin/bash
## Usage:
## $ chmod +x fetcher.sh
## $ ./fetcher.sh <output file>
SOLR_URL="http://localhost:8983/solr"
COLLECTION="teste"
ROWS=10
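FL="*,score" # No spaces around the comma
CURSORMARK="*" # "*" asks Solr to start a new cursor
Q="*:*"
SORT="id%20desc" # URL-encoded "id desc"; a cursor needs a sort that includes the uniqueKey field (assumed here to be "id")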
NEXT_CURSORMARK=
FILENAME="$1"
: > "$FILENAME" ## truncate the file if it already exists
echo "[" >> "$FILENAME" ## open bracket so the file content is valid JSON (a list of lists of records)
counter=0
while true; do
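## nextCursorMark values can contain characters such as '+', '/' and '=', so curl URL-encodes the cursorMark parameter
url="$SOLR_URL/$COLLECTION/select?q=$Q&rows=$ROWS&fl=$FL&sort=$SORT"
resp=$(curl -s -G "$url" --data-urlencode "cursorMark=$CURSORMARK")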
## jq '.' <<< "$resp"
NEXT_CURSORMARK=$(jq -r '.nextCursorMark' <<< "$resp") ## -r prints the raw string, without the surrounding quotes
docs=$(jq '.response.docs' <<< "$resp")
num_docs=$(jq 'length' <<< "$docs")
## echo "$docs"
counter=$((counter + num_docs))
echo "$docs" >> "$FILENAME" ## quoted so the shell neither globs nor re-splits the JSON
if [[ "$CURSORMARK" == "$NEXT_CURSORMARK" ]]; then
echo "]" >> "$FILENAME" ## the cursor did not advance, so this was the last page; close the outer list
# echo "Num docs: $counter"
echo "Finished."
exit
else
echo "," >> "$FILENAME" ## separate this page from the next one so the file stays valid JSON
fi
CURSORMARK=$NEXT_CURSORMARK
# sleep 1 ## optional, sleep a bit before fetching the next page
done;
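
The script writes one JSON array per page, so the output file is a list of lists of records. If a single flat list is more convenient, something like the following sketch may help. The function name `flatten_pages` is hypothetical (it is not part of the script above), and the sketch assumes `jq` is installed and that the output file is small enough for `jq` to load in one go.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script:
# concatenates the per-page arrays of a fetcher.sh output file into a single array.
flatten_pages() {
  local file="$1"
  # 'add' concatenates the inner arrays; '// []' covers an empty outer list.
  jq -c 'add // []' "$file"
}
```

For example, `flatten_pages output.json > all_docs.json` would write every record into one array, preserving the order in which the pages were fetched.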