Elie A. eliasah

## es.sh
    cd ~
    sudo apt-get update
    sudo apt-get install openjdk-7-jre-headless -y

    ### Check http://www.elasticsearch.org/download/ for the latest version of ElasticSearch and replace the wget link below

    # NEW WAY / EASY WAY
    # wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.0.deb
    # sudo dpkg -i elasticsearch-1.1.0.deb

## commands.sh
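    # Compute a single MD5 checksum over every file in the current directory tree,
    # skipping paths that start with ./.git (uses the macOS `md5` tool; Linux ships `md5sum` instead)
    find . -type f | grep -v "^./.git" | xargs md5 | md5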

## fr.sh
    #!/bin/bash

    ES='http://localhost:9200'
    ESIDX='test3'
    ESTYPE='test'

    curl -XDELETE $ES/$ESIDX

    curl -XPUT $ES/$ESIDX/ -d '{
        "settings" : {

## install_scala_sbt.sh
    #!/bin/sh
    # This script installs Scala 2.10.3 with SBT 0.13 on Ubuntu 12.04

    wget http://www.scala-lang.org/files/archive/scala-2.10.3.tgz
    tar zxf scala-2.10.3.tgz
    sudo mv scala-2.10.3 /usr/local/share/scala

    sudo ln -s /usr/local/share/scala/bin/scala /usr/bin/scala
    sudo ln -s /usr/local/share/scala/bin/scalac /usr/bin/scalac
    sudo ln -s /usr/local/share/scala/bin/fsc /usr/bin/fsc
    sudo ln -s /usr/local/share/scala/bin/scaladoc /usr/bin/scaladoc

## add-puppetlabs-repo.sh
    wget http://apt.puppetlabs.com/puppetlabs-release-precise.deb
    sudo dpkg -i puppetlabs-release-precise.deb
    sudo apt-get update

## kibana3-es14-connection-fail-error
Running Kibana 3 against ElasticSearch 1.4 shows a **Connection Failed** screen. The error text says to set `http.cors.allow-origin`, but it omits the equally important `http.cors.enabled: true`.

Working config:

    $ grep cors elasticsearch-1.4.0.Beta1/config/elasticsearch.yml
    http.cors.allow-origin: "/.*/"
    http.cors.enabled: true

* [Elasticsearch HTTP module reference](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-http.html)
* [Mailing-list thread: Kibana upgrade trouble](http://elasticsearch-users.115913.n3.nabble.com/Kibana-upgrade-trouble-nor-4-0BETA1-neither-3-11-work-now-td4064625.html)

## functions.js
    // derived from http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm

    function map() {
        emit(1, // Or put a GROUP BY key here
             {sum: this.value, // the field you want stats for
              min: this.value,
              max: this.value,
              count: 1,
              diff: 0, // M2,n:  sum((val-mean)^2)
        });
    }
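The snippet stops after `map()`. Assume it runs as a MongoDB mapReduce job, as the `emit`/`this.value` style suggests. Under that assumption, the matching reduce and finalize steps can be sketched with the pairwise merge from the linked Wikipedia section. For partials A and B, delta = mean_B - mean_A, and the merged M2 = M2_A + M2_B + delta^2 * n_A * n_B / n.

This is a sketch, not the original gist's code. `toStats` and the output fields `mean`, `variance` and `stddev` are hypothetical names. It is written as plain JavaScript so it can be tested under Node outside MongoDB.

```javascript
// Hypothetical helper: the value map() emits for a single document.
function toStats(value) {
  return { sum: value, min: value, max: value, count: 1, diff: 0 };
}

// Merge two partial results with the pairwise ("parallel") update:
// M2 = M2_a + M2_b + delta^2 * n_a * n_b / n, where delta = mean_b - mean_a.
function combine(a, b) {
  const count = a.count + b.count;
  const delta = b.sum / b.count - a.sum / a.count;
  return {
    sum: a.sum + b.sum,
    min: Math.min(a.min, b.min),
    max: Math.max(a.max, b.max),
    count: count,
    diff: a.diff + b.diff + (delta * delta * a.count * b.count) / count,
  };
}

// reduce(key, values): returns the same shape map() emits, because MongoDB
// may call reduce again on its own earlier output for the same key.
function reduce(key, values) {
  return values.reduce(combine);
}

// finalize(key, value): derive summary statistics from the merged state.
function finalize(key, value) {
  value.mean = value.sum / value.count;
  value.variance = value.diff / value.count; // population variance
  value.stddev = Math.sqrt(value.variance);
  return value;
}
```

Because `combine` is associative, the partials can be merged in any grouping and still give the same mean and variance, up to floating-point rounding. For the sample variance, divide `diff` by `count - 1` instead.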

## elasticsearch-analysis-french-stopwords
    # delete the test index (and all its data)
    curl -XDELETE localhost:9200/test

    # create an index and define specific French stopwords
    curl -XPUT localhost:9200/test -d '{
        "settings" : {
            "index" : {
                "analysis" : {
                    "analyzer" : {
                        "french" : {

## LDA_SparkDocs
    /*
    This example uses Scala.  Please see the MLlib documentation for a Java example.

    Try running this code in the Spark shell.  It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.

    This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
    Spark: http://spark.apache.org/
    */

    import scala.collection.mutable

## Setup.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
**Nutch 2.3 crawler + HBase 0.94 + Elasticsearch 1.4.2** (forked from xrstf/setup.md; last active August 29, 2015 14:20)

### Info

This guide sets up a non-clustered Nutch crawler that stores its data in HBase. It does not cover setting up Hadoop and related components, only the bare minimum needed to crawl and index websites on a single machine.
### Terms

* **Nutch**: the crawler (fetches and parses websites)
* **HBase**: the storage backend that holds Nutch's crawl data (a database from the Hadoop ecosystem)