Nick Zadrozny nz

@nz
nz / Delete all documents in a Solr index using curl.md
Last active February 12, 2024 10:55
Delete all documents in a Solr index using curl
# http://wiki.apache.org/solr/FAQ#How_can_I_delete_all_documents_from_my_index.3F
# http://wiki.apache.org/solr/UpdateXmlMessages#Updating_a_Data_Record_via_curl

curl "http://index.websolr.com/solr/a0b1c2d3/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'

I'm amused at the traction this little gist is getting on Google! I would be remiss not to point out that six+ years later I'm still helping thousands of companies on a daily basis with their search index management, by providing managed Solr as a service over at Websolr, and hosted Elasticsearch at Bonsai. Check us out if you'd like an expert helping hand at Solr and Elasticsearch hosting, ops and support!

@nz
nz / elasticsearch-term-frequency.sh
Created December 15, 2014 16:22
In Elasticsearch, how do I get a list of the top 10 most frequently occurring values of a field?
#!/bin/sh
test_document="{
\"text\": \"HEAR the sledges with the bells, / Silver bells! / What a world of merriment their melody foretells! / How they tinkle, tinkle, tinkle, / In the icy air of night! / While the stars, that oversprinkle / All the heavens, seem to twinkle / With a crystalline delight; / Keeping time, time, time, / In a sort of Runic rhyme, / To the tintinnabulation that so musically wells / From the bells, bells, bells, bells, / Bells, bells, bells— / From the jingling and the tinkling of the bells.\"
}"
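# -I sends a real HEAD request; "-X HEAD" only renames the method, so curl can hang waiting for a body.
if curl -fsI localhost:9200/top-terms >/dev/null; then
  echo "Clear the old test index"
  curl -X DELETE localhost:9200/top-terms; echo
fi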
@nz
nz / oauth lite.md
Last active October 3, 2023 07:46
Lightweight HMAC token auth over HTTP Basic Auth

HMAC over Basic Auth

This is a pattern I use fairly frequently for administrative APIs. It's a sort of OAuth lite for non-public APIs that produces good quality tokens. Once you build it a few times, it's not any harder than using arbitrary basic auth in your APIs.

The client and the app share a secret, which is never transmitted across the wire. The client uses this secret to create an HMAC digest of a payload consisting of the current time and a random nonce value. The nonce is provided as the Basic Authorization user, and the resulting HMAC digest is provided as the Basic Authorization password.

A similar process is followed on the server side. The server uses the supplied nonce, its own time, and its own copy of the shared secret. It may want to check against several tokens across a small window of times to account for clock drift.
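
The exchange described above can be sketched in Ruby roughly as follows. This is a minimal illustration and not the author's implementation: the 30-second time window, the `"window:nonce"` payload layout, SHA-256 as the digest, and every method and constant name here are assumptions made for the example.

```ruby
require 'openssl'
require 'securerandom'

# Assumption: timestamps are truncated to 30-second windows so that the
# client and server can agree on a time value without exchanging it.
WINDOW_SECONDS = 30

# HMAC digest over the time window and the nonce (hypothetical payload layout).
def hmac_token(secret, nonce, window)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), secret, "#{window}:#{nonce}")
end

# Client side: returns [user, password] for HTTP Basic Auth.
# The nonce is the user; the digest is the password.
def client_credentials(secret, now = Time.now)
  nonce = SecureRandom.hex(16)
  [nonce, hmac_token(secret, nonce, now.to_i / WINDOW_SECONDS)]
end

# Compare two strings without returning early on the first differing byte.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Server side: recompute the digest from the supplied nonce, the server's own
# clock, and its copy of the secret. Neighbouring windows are also checked
# to tolerate clock drift.
def valid_credentials?(secret, nonce, digest, now = Time.now, drift = 1)
  current = now.to_i / WINDOW_SECONDS
  (-drift..drift).any? do |offset|
    secure_compare(hmac_token(secret, nonce, current + offset), digest)
  end
end
```

A client would pass the returned pair to whatever HTTP library it uses, for example `request.basic_auth(user, password)` with Net::HTTP. Note that this sketch does not remember nonces it has already seen, so a captured token could be replayed until its window expires.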

  • Using HMAC means the secret is never transmitted across the wire. Theoretically these are safe across plaintext connections, but you're using TLS anyway, right?
  • The i
@nz
nz / post csv to solr with curl.sh
Created October 21, 2010 22:27
An example of how to POST CSV data to Solr with curl
curl http://index.websolr.com/solr/yourindex/update/csv --data-binary @mydata.csv -H 'Content-type:text/plain; charset=utf-8'
@nz
nz / sunspot_resque.rb
Last active July 10, 2023 21:35
Sunspot with Resque
# app/models/post.rb
class Post
  searchable :auto_index => false, :auto_remove => false do
    text :title
    text :body
  end
  after_commit :resque_solr_update, :if => :persisted?
@nz
nz / net-http-spy.md
Last active December 13, 2022 14:45

Debugging with net-http-spy

Ruby applications that make HTTP calls to external services such as Websolr or Bonsai Elasticsearch can use the net-http-spy gem to intercept and log the calls made to that service. This logs all communication between your application and the external service, which helps when you need to troubleshoot an issue.

Gemfile
gem 'net-http-spy'
@nz
nz / http.rb
Last active October 27, 2021 11:09
Wrapper to RestClient for RESTful JSON APIs, like ElasticSearch. TODO: rebuild with Faraday.
# A light wrapper around RestClient which centralizes a few things for us:
#
# - Light accessors for the method option.
# - JSON by default. If we need others in the future, maybe submodularize.
# - Plug Rails.logger into RestClient.
# - Rather strict one-second open/data timeout. Perhaps to be tweaked.
require 'rest_client'
require 'yajl'
require 'sunspot_ext/resque_session_proxy'
require 'sunspot_ext/resque_index_job'
Sunspot.session = Sunspot::ResqueSessionProxy.new(Sunspot.session)

I really want to love Nix!

The concepts and the architecture are compelling. They resonate strongly with many of my own values, which are based on 20 years of programming experience and a solid decade of large-scale operational engineering. (I manage large fleets of Solr and Elasticsearch search engines.)

The small amount of play I've had with Nix and the medium amount of reading I've done are encouraging. I can get some packages installed. I can start a toy nix-shell with some language or other present. I can read a Nix derivation and pretty much follow along with what's happening, although I am far from fluent in writing the Nix language.

But right now I'm hitting a wall when it comes to a more complex real-world use case.

  1. Create a pure and isolated development environment for a Rails app, using Postgres.
  2. Create a pure and isolated development environment for a simple Crystal app.
@nz
nz / csv-usage-example.rb
Last active September 3, 2019 17:50
Dynamic time-based batch sizing
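require 'csv'
require 'elasticsearch'

# Importer is not defined in this excerpt; it is assumed to be defined elsewhere in the gist.
elasticsearch_url = ENV.fetch('ELASTICSEARCH_URL', 'http://localhost:9200')
elasticsearch = Elasticsearch::Client.new(url: elasticsearch_url, trace: true)
importer = Importer.new
# Each batch of actions is sent to Elasticsearch in a single bulk request.
importer.batch_handler = lambda do |actions|
  elasticsearch.bulk(body: actions)
end
importer.start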
csv = CSV.new(File.open('data/books.csv', 'r'), headers: true)