These are field notes gathered during the installation of a website search facility for the ElasticSearch website.
You may reuse them to put a similar system in place.
The following assumes:
#!/bin/bash
#
# mkv2m4v inputfile.mkv
#
# Given an MKV container with H.264 video & AC3 or DTS audio, converts
# quickly to an iPad-compatible MP4 container without re-encoding the
# video (so it must already be in an iPad-compatible resolution); the
# audio is downmixed to stereo with Dynamic Range Compression.
#
ME=$(basename "$0")
#!/bin/bash
# Back up our indexes. This script should run around 6pm, after logstash
# rotates to a new ES index and there's no new data coming in to the old one. We grab the metadata,
# compress the data files, create a restore script, and push it all up to S3.
TODAY=$(date +"%Y.%m.%d")
INDEXNAME="logstash-$TODAY" # this had better match the index name in ES
INDEXDIR="/usr/local/elasticsearch/data/logstash/nodes/0/indices/"
BACKUPCMD="/usr/local/backupTools/s3cmd --config=/usr/local/backupTools/s3cfg put"
BACKUPDIR="/mnt/es-backups/"
YEARMONTH=$(date +"%Y-%m")
Why is there nothing like Solr's DataImportHandler in ElasticSearch? Well, because:
1. You should really consider writing your own scripts
(be it JVM-based, Perl, Ruby, PHP, Node.js/JavaScript)
to feed ElasticSearch via bulk indexing:
http://www.elasticsearch.org/guide/reference/java-api/bulk.html
2. There are two projects doing it already:
* http://code.google.com/p/sql-to-nosql-importer/
* https://github.com/Aconex/scrutineer (keeps the DB in sync with ES or Solr!)
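As a rough illustration of the "own script" route in point 1, here is a minimal Python sketch that turns rows (for example, ones fetched from a database) into a newline-delimited body for the bulk endpoint. The function name `build_bulk_body`, the index name `people`, and the `id` field are assumptions made for this example, not part of any of the projects above. It only builds the request body and sends nothing; older ElasticSearch releases also expect a `_type` entry in each action line, so adjust the metadata to your version.

```python
import json


def build_bulk_body(index_name, rows, id_field="id"):
    """Build a newline-delimited bulk body: one action line, then one source line, per row.

    index_name and id_field are illustrative assumptions; older ElasticSearch
    versions additionally expect "_type" in the action metadata.
    """
    lines = []
    for row in rows:
        # Action line: tells the bulk endpoint to index the document under this id.
        action = {"index": {"_index": index_name, "_id": str(row[id_field])}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # The bulk format requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]
    print(build_bulk_body("people", sample_rows), end="")
```

The resulting string would be sent as the body of an HTTP POST to the cluster's `_bulk` endpoint with whatever HTTP client your script already uses.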
Yesterday I upgraded our running elasticsearch cluster on a site which serves a few million search requests a day, with zero downtime. I've been asked to describe the process, hence this blogpost.
To make it more complicated, the cluster was running elasticsearch version 0.17.8 (released 6 Oct 2011) and I upgraded it to the latest 0.19.10. There have been 21 releases between those two versions, with a lot of functional changes, so I needed to be ready to roll back if necessary.
We run elasticsearch on two biggish boxes, each with 16 cores and 32GB of RAM. All indices have 1 replica, so all data is stored on both boxes (about 45GB of data). The primary data for our main indices is also stored in our database. We have a few other indices whose data is stored only in elasticsearch, but these are updated only once a day. Finally, we store our sessions in elasticsearch, but active sessions are cached in memcached.
cd ~
sudo apt-get update
sudo apt-get install openjdk-7-jre-headless -y
# Download the compiled elasticsearch rather than the source.
wget http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.2.tar.gz -O elasticsearch.tar.gz
tar -xf elasticsearch.tar.gz
rm elasticsearch.tar.gz
sudo mv elasticsearch-* elasticsearch
sudo mv elasticsearch /usr/local/share
Vagrant.configure("2") do |config|
  config.vm.box = "precise64"
  config.vm.box_url = "http://files.vagrantup.com/precise64.box"
  config.vm.network :private_network, ip: "192.168.33.101"
  config.vm.synced_folder "./", "/vagrant", id: "vagrant-root"
end
A checklist for designing and developing internet-scale services, inspired by James Hamilton's 2007 paper "On Designing and Deploying Internet-Scale Services."
An update by Paul Johnston (paul@roundaboutlabs.com), for a Serverless Architecture scenario. This assumes something akin to AWS Lambda + API Gateway + DynamoDB (c. 2016) Function as a Service (FaaS) solution as the basis for deployment rather than a cloud-based virtual server approach which the original paper was based upon. The FaaS solution implies each function is separately scalable and the database is inherently partitioned (assuming designed/built well).
If you agree/disagree, please fork and share with me on twitter @pauldjohnston.
//Importing from the Google Spreadsheet
//import the Person nodes
load csv with headers from
"https://docs.google.com/spreadsheets/d/1_X628w_2Lx8ZAIPQQUAGhoDTuf31MRxY821E5D3u2Nc/export?format=csv&id=1_X628w_2Lx8ZAIPQQUAGhoDTuf31MRxY821E5D3u2Nc&gid=0" as persons
create (n:Node:Person)
set n = persons;
//import the Company nodes
load csv with headers from
"https://docs.google.com/spreadsheets/d/1_X628w_2Lx8ZAIPQQUAGhoDTuf31MRxY821E5D3u2Nc/export?format=csv&id=1_X628w_2Lx8ZAIPQQUAGhoDTuf31MRxY821E5D3u2Nc&gid=2040965723" as companies
package ch.sgwerder.gist;
import org.junit.jupiter.api.Assertions; // only for testing, see below
import org.junit.jupiter.api.Test; // only for testing, see below
import java.util.stream.IntStream;
/**
 * Copyright (C) 2017 Simon Gwerder.
 * <p/>