@mejackreed
Last active December 24, 2016 17:05
Cleaning up GeoNames for Solr
# Reduce the columns: keep geonameid, name, latitude, and longitude (fields 1, 2, 5, 6)
cut -f1-2,5-6 allCountries.txt > allCountries_red.txt
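# Optional sanity check, sketched here as an assumption rather than part of
# the original recipe: confirm that every row of the reduced file has exactly
# four tab-separated fields before going further. check_fields is a
# hypothetical helper name.

```shell
# check_fields FILE N (hypothetical helper): succeed only if every line of a
# tab-delimited FILE has exactly N fields.
check_fields() {
  awk -F'\t' -v n="$2" 'NF != n { bad = 1; exit } END { exit bad + 0 }' "$1"
}
```

# Usage: check_fields allCountries_red.txt 4 || echo "unexpected column count" >&2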
# Add a header row
# The header must be tab-delimited to match the data, so the tabs are written explicitly
printf 'id\ttitle_s\tlat\tlng\n' | cat - allCountries_red.txt > allCountries_head.txt
# Add a WKT point column (wkt_rpt). Requires csvpys: https://github.com/cypreess/csvkit/blob/master/docs/scripts/csvpys.rst
csvpys --tab -s wkt_rpt "'POINT(' + ch['lng'] + ' ' + ch['lat'] + ')'" allCountries_head.txt > allCountries_wkt.txt
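# If csvpys is not available, the same column can be sketched with awk. This
# is an alternative under stated assumptions, not the method used above:
# add_wkt is a hypothetical helper, and it expects the tab-delimited header
# id, title_s, lat, lng. Its output stays tab-delimited, so a following
# csvcut would need the -t flag.

```shell
# add_wkt FILE (hypothetical helper): append a wkt_rpt column to a
# tab-delimited FILE whose header row is id, title_s, lat, lng.
add_wkt() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    NR == 1 { print $0, "wkt_rpt"; next }   # extend the header row
    { print $0, "POINT(" $4 " " $3 ")" }    # WKT points are written as lng lat
  ' "$1"
}
```

# Usage: add_wkt allCountries_head.txt > allCountries_wkt.tsv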
# Only keep the columns we need: id, title_s, and wkt_rpt
csvcut -c 1,2,5 allCountries_wkt.txt > allCountries_wkt_cut.txt
# Convert to json
csvjson -i 2 allCountries_wkt_cut.txt > allCountries.json
# Index into Solr (replace [corename] with the name of your core)
curl 'http://localhost:8983/solr/[corename]/update?commit=true' --data-binary @allCountries.json -H 'Content-type:application/json'