Chris. acerb

  • Brussels, BE
dzuelke /
Created October 22, 2010 22:07 — forked from innerfence/
Convert .mkv video to iPad compatible .m4v without re-encoding
# mkv2m4v inputfile.mkv
# Given an MKV container with H.264 video & AC3 or DTS audio, converts
# quickly to an iPad-compatible MP4 container without re-encoding the
# video (so it must already be in an iPad-compatible resolution); the
# audio is downmixed to stereo with Dynamic Range Compression.
ME=$(basename "$0")
karmi /
Created April 8, 2011 17:15
Field notes gathered during installing and configuring ElasticSearch for
Website Search: Field Notes

These are field notes gathered while installing the website search facility for the ElasticSearch website.

You may reuse them to put a similar system in place.

The following assumes:

nherment /
Created February 29, 2012 10:42
Backup and restore an Elastic search index (shamelessly copied from
# Here we back up our indexes. This script should run at around 6pm, after logstash
# rotates to a new ES index and there's no new data coming in to the old one. We grab the metadata,
# compress the data files, create a restore script, and push it all up to S3.
TODAY=$(date +"%Y.%m.%d")
INDEXNAME="logstash-$TODAY" # this had better match the index name in ES
BACKUPCMD="/usr/local/backupTools/s3cmd --config=/usr/local/backupTools/s3cfg put"
YEARMONTH=$(date +"%Y-%m")
karussell / elasticsearch-import-data
Last active October 30, 2023 16:14
ElasticSearch from SQL DB
Why is there no such DataImportHandler thing in ElasticSearch? Uhm, well ... but because:
1. You should really consider your own scripts
(be it jvm based, perl, ruby, php, nodejs/javascript)
to feed ElasticSearch via bulk indexing:
2. There are two projects doing it already:
* (keeps DB in synch with ES or solr!)
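As a rough illustration of the "own script" route in point 1, here is a minimal Python sketch that reads rows from a SQL table and builds a newline-delimited payload in the shape the Elasticsearch bulk API expects (one action line followed by one document line per row). It uses an in-memory SQLite database as a stand-in for the real DB; the function name `rows_to_bulk`, the `articles` table, and the index name are hypothetical, and older Elasticsearch versions may also require a `_type` in the action line. Sending the payload to the cluster is left out.

```python
import json
import sqlite3


def rows_to_bulk(conn, table, index, id_column="id"):
    """Build a bulk payload (NDJSON) from every row of `table`.

    `table` and `id_column` are interpolated into the SQL, so they must
    come from trusted configuration, never from user input.
    """
    conn.row_factory = sqlite3.Row  # rows become mapping-like objects
    lines = []
    query = f"SELECT * FROM {table} ORDER BY {id_column}"
    for row in conn.execute(query):
        doc = dict(row)
        # The primary key becomes the document id; the rest is the source.
        action = {"index": {"_index": index, "_id": str(doc.pop(id_column))}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # A bulk body must end with a newline; an empty table yields an empty body.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical sample data standing in for the real SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO articles VALUES (?, ?)", [(1, "Hello"), (2, "World")])
payload = rows_to_bulk(conn, "articles", "articles")
print(payload, end="")
```

The resulting string would be sent as the body of a bulk request; for large tables, splitting the rows into batches of a few thousand keeps each request small.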
clintongormley / gist:3888120
Created October 14, 2012 09:44
Upgrading a running elasticsearch cluster

Yesterday I upgraded our running elasticsearch cluster on a site which serves a few million search requests a day, with zero downtime. I've been asked to describe the process, hence this blogpost.

To make it more complicated, the cluster was running elasticsearch version 0.17.8 (released 6 Oct 2011) and I upgraded it to the latest 0.19.10. There have been 21 releases between those two versions, with a lot of functional changes, so I needed to be ready to roll back if necessary.

Our setup:

  • elasticsearch

We run elasticsearch on two biggish boxes: 16 cores plus 32GB of RAM each. All indices have 1 replica, so all data is stored on both boxes (about 45GB of data). The primary data for our main indices is also stored in our database. We have a few other indices whose data is stored only in elasticsearch, but these are updated only once a day. Finally, we store our sessions in elasticsearch, but active sessions are cached in memcached.

cd ~
sudo apt-get update
sudo apt-get install openjdk-7-jre-headless -y
# Download the compiled elasticsearch rather than the source.
wget -O elasticsearch.tar.gz
tar -xf elasticsearch.tar.gz
rm elasticsearch.tar.gz
sudo mv elasticsearch-* elasticsearch
sudo mv elasticsearch /usr/local/share
Vagrant.configure("2") do |config|
  config.vm.box = "precise64"
  config.vm.box_url = ""
  config.vm.network :private_network, ip: ""
  config.vm.synced_folder "./", "/vagrant", id: "vagrant-root"
padajo /
Last active April 25, 2023 13:34 — forked from acolyer/
Internet Scale Services Checklist

A checklist for designing and developing internet scale services, inspired by James Hamilton's 2007 paper "On Designing and Deploying Internet-Scale Services."

An update by Paul Johnston for a Serverless Architecture scenario. It assumes a Function as a Service (FaaS) solution akin to AWS Lambda + API Gateway + DynamoDB (c. 2016) as the basis for deployment, rather than the cloud-based virtual server approach on which the original paper was based. The FaaS solution implies that each function is separately scalable and that the database is inherently partitioned (assuming it is designed and built well).

If you agree or disagree, please fork and share with me on Twitter @pauldjohnston.

rvanbruggen / 1-importing_from_google_sheet.cql
Last active August 8, 2017 18:38
Importing and querying the web of Belgian public companies and their CEOs/chairmen
//Importing from the Google Spreadsheet
//import the Person nodes
load csv with headers from
"" as persons
create (n:Node:Person)
set n = persons;
//import the Company nodes
load csv with headers from
"" as companies
shathor /
Last active May 20, 2023 03:32
Performs a fuzzy substring search to find and return a portion of a string that matches with another, considering a max. allowed distance between the two. It is a slightly modified Levenshtein distance. E.g. Looking for 'abcd' in 'xyzabydxyz' and a maximum distance of 1 will return 'abyd'. Useful when trying to fuzzy search in a sentence.
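The idea in the description can be sketched in a few lines of Python. This is not the gist's Java code: it is a minimal sketch assuming the standard approximate substring variant of Levenshtein distance, where the first row of the table is all zeros so a match may start anywhere in the text. The function name `fuzzy_substring` is hypothetical, and each cell also carries the start index of its best match so the matched portion can be returned.

```python
def fuzzy_substring(needle, haystack, max_dist):
    """Return the part of `haystack` closest to `needle`, or None if the
    best edit distance exceeds `max_dist`."""
    n = len(haystack)
    # Each cell is (distance, start index of the matched substring).
    # Row 0: an empty needle matches the empty substring at any position.
    prev = [(0, j) for j in range(n + 1)]
    for i, ch in enumerate(needle, start=1):
        # Column 0: the first i needle characters against an empty substring.
        cur = [(i, 0)]
        for j in range(1, n + 1):
            sub_cost = 0 if ch == haystack[j - 1] else 1
            diag = (prev[j - 1][0] + sub_cost, prev[j - 1][1])  # match/substitute
            up = (prev[j][0] + 1, prev[j][1])                   # skip needle char
            left = (cur[j - 1][0] + 1, cur[j - 1][1])           # extra haystack char
            # min() keeps the first candidate on ties, so the diagonal wins.
            cur.append(min(diag, up, left, key=lambda cell: cell[0]))
        prev = cur
    # The best match ends wherever the last row is smallest.
    best_end = min(range(n + 1), key=lambda j: prev[j][0])
    dist, start = prev[best_end]
    return haystack[start:best_end] if dist <= max_dist else None


print(fuzzy_substring("abcd", "xyzabydxyz", 1))  # abyd
```

With a maximum distance of 0 the same call returns `None`, since the closest portion ('abyd') is one substitution away from 'abcd'.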
package ch.sgwerder.gist;
import org.junit.jupiter.api.Assertions; // only for testing, see below
import org.junit.jupiter.api.Test; // only for testing, see below
* Copyright (C) 2017 Simon Gwerder.
* <p/>