Chris. acerb

## mkv2m4v.sh
#!/bin/bash
#
# mkv2m4v inputfile.mkv
#
# Given an MKV container with H.264 video & AC3 or DTS audio, converts
# quickly to an iPad-compatible MP4 container without re-encoding the
# video (so it must already be in an iPad-compatible resolution); the
# audio is downmixed to stereo with Dynamic Range Compression.
#
ME=$(basename "$0")
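
The header comment above describes a remux: the video stream is copied as-is and only the audio is re-encoded. The script body is not shown here, so the following is only a minimal sketch of that idea, assuming `ffmpeg` is installed. The function names `m4v_name` and `remux` are hypothetical, and the Dynamic Range Compression step is not reproduced.

```shell
#!/bin/bash
# m4v_name INPUT: prints INPUT with its .mkv extension replaced by .m4v.
m4v_name() {
  printf '%s\n' "${1%.mkv}.m4v"
}

# remux INPUT: copies the H.264 video stream unchanged and re-encodes
# only the audio, downmixed to two channels, into an MP4 container.
# (Assumption: ffmpeg is the tool; the original script may use another.)
remux() {
  local in=$1
  ffmpeg -i "$in" -c:v copy -c:a aac -ac 2 -f mp4 "$(m4v_name "$in")"
}
```

Calling `remux movie.mkv` would write `movie.m4v` next to the input. Because the video is stream-copied, the run time is dominated by the audio re-encode.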

## ElasticSearch.org.Website.Search.FieldNotes.markdown

      
karmi / ElasticSearch.org.Website.Search.FieldNotes.markdown
Created April 8, 2011 17:15

Field notes gathered while installing and configuring ElasticSearch for http://elasticsearch.org

ElasticSearch.org Website Search: Field Notes

These are field notes gathered during installation of the website search facility for the
ElasticSearch website.
You may reuse them to put a similar system in place.
The following assumes:

  
## backup.sh
#!/bin/bash
# herein we back up our indexes! this script should run at like 6pm or something, after logstash
# rotates to a new ES index and there's no new data coming in to the old one. we grab metadata,
# compress the data files, create a restore script, and push it all up to S3.
TODAY=$(date +"%Y.%m.%d")
INDEXNAME="logstash-$TODAY" # this had better match the index name in ES
INDEXDIR="/usr/local/elasticsearch/data/logstash/nodes/0/indices/"
BACKUPCMD="/usr/local/backupTools/s3cmd --config=/usr/local/backupTools/s3cfg put"
BACKUPDIR="/mnt/es-backups/"
YEARMONTH=$(date +"%Y-%m")
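
The rest of the script is not shown here. As a rough sketch of the "compress the data files" step only, the function below archives one index directory into the backup directory. The name `archive_index` and its argument order are assumptions for illustration, not the original author's code.

```shell
#!/bin/bash
# archive_index INDEXDIR INDEXNAME BACKUPDIR
# Compresses INDEXDIR/INDEXNAME into BACKUPDIR/INDEXNAME.tar.gz
# and prints the path of the archive it wrote.
archive_index() {
  local indexdir=$1 indexname=$2 backupdir=$3
  # Strip a trailing slash so the path has no double slash.
  local archive="${backupdir%/}/${indexname}.tar.gz"
  mkdir -p "$backupdir" || return 1
  # -C keeps paths inside the archive relative to the indices directory.
  tar -czf "$archive" -C "$indexdir" "$indexname" || return 1
  printf '%s\n' "$archive"
}
```

With the variables defined above, `archive_index "$INDEXDIR" "$INDEXNAME" "$BACKUPDIR"` would produce the archive path, which could then be passed to `$BACKUPCMD` together with an `s3://` destination of your choosing.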

## elasticsearch-import-data
Why is there no DataImportHandler equivalent in ElasticSearch? Uhm, well ... because:

1. You should really consider your own scripts
(be it jvm based, perl, ruby, php, nodejs/javascript)
to feed ElasticSearch via bulk indexing:
http://www.elasticsearch.org/guide/reference/java-api/bulk.html

2. There are two projects doing it already:
 * http://code.google.com/p/sql-to-nosql-importer/
 * https://github.com/Aconex/scrutineer (keeps DB in sync with ES or Solr!)
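
As a rough illustration of point 1, the sketch below turns tab-separated `id<TAB>json` lines into the newline-delimited body the bulk endpoint expects: an action line followed by the document source. The function name `to_bulk`, the index and type names, and the `localhost:9200` URL in the usage note are assumptions for illustration.

```shell
#!/bin/bash
# to_bulk INDEX TYPE
# Reads "id<TAB>json-document" lines on stdin and writes a bulk request
# body: one action line, then the document source, per input line.
to_bulk() {
  local index=$1 type=$2 id doc
  while IFS=$'\t' read -r id doc; do
    printf '{"index":{"_index":"%s","_type":"%s","_id":"%s"}}\n' "$index" "$type" "$id"
    printf '%s\n' "$doc"
  done
}
```

A possible invocation is `to_bulk articles article < docs.tsv | curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @-`. Newer Elasticsearch releases drop `_type` and require a `Content-Type: application/x-ndjson` header, so adjust accordingly.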

## gist:3888120

      
clintongormley / gist:3888120
Created October 14, 2012 09:44

Upgrading a running elasticsearch cluster

Yesterday I upgraded our running elasticsearch cluster on a site which serves a few million search requests a day, with zero downtime. I've been asked to describe the process, hence this blog post.
To make it more complicated, the cluster was running elasticsearch version 0.17.8 (released 6 Oct 2011) and I upgraded it to the latest 0.19.10. There have been 21 releases between those two versions, with a lot of functional changes, so I needed to be ready to roll back if necessary.
Our setup:


elasticsearch

We run elasticsearch on two biggish boxes: 16 cores plus 32GB of RAM. All indices have 1 replica, so all data is stored on both boxes (about 45GB of data). The primary data for our main indices is also stored in our database. We have a few other indices whose data is stored only in elasticsearch, but they are updated only once a day. Finally, we store our sessions in elasticsearch, but active sessions are cached in memcached.

  
## es.sh
cd ~
sudo apt-get update
sudo apt-get install openjdk-7-jre-headless -y

# Download the compiled elasticsearch rather than the source.
wget http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.2.tar.gz -O elasticsearch.tar.gz
tar -xf elasticsearch.tar.gz
rm elasticsearch.tar.gz
sudo mv elasticsearch-* elasticsearch
sudo mv elasticsearch /usr/local/share

## Vagrantfile
Vagrant.configure("2") do |config|

  config.vm.box = "precise64"
  config.vm.box_url = "http://files.vagrantup.com/precise64.box"

  config.vm.network :private_network, ip: "192.168.33.101"

  config.vm.synced_folder "./", "/vagrant", id: "vagrant-root"

end

## service-checklist.md

      
padajo / service-checklist.md
Last active April 25, 2023 13:34 (forked from acolyer/service-checklist.md)

Internet Scale Services Checklist

A checklist for designing and developing internet scale services, inspired by James Hamilton's 2007 paper "On Designing and Deploying Internet-Scale Services."

http://mvdirona.com/jrh/talksandpapers/jamesrh_lisa.pdf

An update by Paul Johnston (paul@roundaboutlabs.com) for a serverless architecture scenario. It assumes a Function as a Service (FaaS) solution akin to AWS Lambda + API Gateway + DynamoDB (c. 2016) as the basis for deployment, rather than the cloud-based virtual server approach on which the original paper was based. The FaaS solution implies that each function is separately scalable and that the database is inherently partitioned (assuming it is designed and built well).
If you agree or disagree, please fork and share with me on Twitter @pauldjohnston.

  
## 1-importing_from_google_sheet.cql
//Importing from the Google Spreadsheet
//import the Person nodes
load csv with headers from
"https://docs.google.com/spreadsheets/d/1_X628w_2Lx8ZAIPQQUAGhoDTuf31MRxY821E5D3u2Nc/export?format=csv&id=1_X628w_2Lx8ZAIPQQUAGhoDTuf31MRxY821E5D3u2Nc&gid=0" as persons
create (n:Node:Person)
set n = persons;

//import the Company nodes
load csv with headers from
"https://docs.google.com/spreadsheets/d/1_X628w_2Lx8ZAIPQQUAGhoDTuf31MRxY821E5D3u2Nc/export?format=csv&id=1_X628w_2Lx8ZAIPQQUAGhoDTuf31MRxY821E5D3u2Nc&gid=2040965723" as companies

## FuzzySubstringSearch.java
package ch.sgwerder.gist;

import org.junit.jupiter.api.Assertions; // only for testing, see below
import org.junit.jupiter.api.Test; // only for testing, see below

import java.util.stream.IntStream;

/**
 * Copyright (C) 2017 Simon Gwerder.
 * <p/>