@gibrown
Created February 15, 2017 16:48
Populate an Elasticsearch index from bash and json files
#!/bin/bash
# To prep a file for this script:
# - take a list of docs orig.json with one json doc per line
# - run: split -l 1000 orig.json orig-split
export ESINDEX="$1" #ES index name
export ESTYPE="$2" #ES document type name
JSONFILE="$3" #JSON file path name. One doc per line.
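export HOST="http://localhost:9200" # ES base URL
# Count the docs (one per line); awk strips any padding that wc adds
DOCS=$(wc -l < "$JSONFILE" | awk '{print $1}')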
echo "Indexing $DOCS $ESTYPE documents to $ESINDEX in 5 sec"
sleep 5
echo "Prepping bulk data"
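mkdir -p tmp-bulk # make sure the working directory exists
rm -f tmp-bulk/bulk* # clean up chunks left over from a previous run
# Prefix every doc with a bulk "index" action line, then split into 3000-line chunks (1500 docs each)
awk '{print "{\"index\":{}}"; print}' "$JSONFILE" | split -a 4 -l 3000 - tmp-bulk/bulk-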
echo "Indexing..."
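# We assume losing some data is acceptable, so we set consistency=one to speed this up.
# (Elasticsearch 5.0 and later replaced this parameter with wait_for_active_shards.)
# Responses are discarded; one "." is printed per chunk sent.
ls tmp-bulk/bulk* | xargs -I 'FILE' sh -c 'curl --silent -XPOST "${HOST:-http://localhost:9200}/$ESINDEX/$ESTYPE/_bulk?consistency=one" --data-binary @FILE -o /dev/null; echo ".";'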
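To make the "Prepping bulk data" step concrete, here is a minimal sketch of what the awk command emits for a small input. The function name `to_bulk` and the temporary sample file are hypothetical, used only for illustration; the `{"index":{}}` action line is the same one the script writes.

```bash
#!/bin/bash
# Hypothetical helper: prefix every JSON doc (one per line) with a bulk
# "index" action line, as the awk step in the script above does.
to_bulk() {
  awk '{print "{\"index\":{}}"; print}' "$1"
}

# Illustrative input: a temporary file holding two docs (an assumption).
SAMPLE=$(mktemp)
printf '%s\n' '{"title":"first"}' '{"title":"second"}' > "$SAMPLE"

to_bulk "$SAMPLE"
# {"index":{}}
# {"title":"first"}
# {"index":{}}
# {"title":"second"}
```

Because every document becomes two lines, an even chunk size in `split -l` keeps each action line in the same chunk as its document.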
#!/bin/bash
INDEX="$1" #ES index name
JSONFILE="$2" # Path to the JSON file containing the index settings
HOST="http://localhost:9200"
echo "Creating index $INDEX"
# Send the settings file as the body of the create-index request
curl -XPUT "$HOST/$INDEX" --data-binary "@$JSONFILE"
echo "Done"
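The index-creation script expects a JSON file describing the index. As a minimal sketch, the snippet below writes such a file; the file name `settings.json` and the shard and replica counts are assumptions chosen for illustration, not values from the original gist.

```bash
#!/bin/bash
# Write an illustrative settings file (name and values are assumptions).
SETTINGS_FILE="settings.json"
cat > "$SETTINGS_FILE" <<'EOF'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
EOF
echo "Wrote $SETTINGS_FILE"
```

The resulting file could then be passed as the second argument to the index-creation script, after the index name.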