Francesco Simoneschi (francescosimoneschi)

@francescosimoneschi
francescosimoneschi / multiple_input.json
Created September 6, 2013 15:58
Input format for importing multiple files
{
"urls":[
"s3n://playhaven-segmentation/production2/2013/08/08/15/syslog1-north.device.log.gz",
"s3n://playhaven-segmentation/production2/2013/08/08/16/syslog1-north.device.log.gz"
]
}
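Each URL in this input file encodes an hourly partition in its path. The sketch below lists the (date, hour) pairs a job would read from such a file, assuming the s3n://<bucket>/<prefix>/YYYY/MM/DD/HH/<file> layout seen in the two example URLs; hourly_partitions is a hypothetical helper name.

```python
import json
import re
from typing import List, Tuple

# Assumed layout (taken from the example URLs): s3n://<bucket>/<prefix>/YYYY/MM/DD/HH/<file>
_PARTITION = re.compile(r"^s3n://[^/]+/.*/(\d{4})/(\d{2})/(\d{2})/(\d{2})/[^/]+$")

def hourly_partitions(doc: str) -> List[Tuple[str, str]]:
    """Return (YYYY-MM-DD, HH) for every URL in a {"urls": [...]} document."""
    partitions = []
    for url in json.loads(doc)["urls"]:
        match = _PARTITION.match(url)
        if match is None:
            # Fail loudly instead of silently skipping an unexpected path.
            raise ValueError(f"unexpected input path: {url}")
        year, month, day, hour = match.groups()
        partitions.append((f"{year}-{month}-{day}", hour))
    return partitions
```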
francescosimoneschi / gist:6453960
Created September 5, 2013 18:16
/etc/cassandra/conf/cassandra.yaml
# Cassandra storage config YAML
# NOTE:
# See http://wiki.apache.org/cassandra/StorageConfiguration for
# full explanations of configuration directives
# /NOTE
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'Test Cluster'
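Since cluster_name keeps machines of one logical cluster from joining another, every node meant to be in the same cluster needs the same value. A minimal sketch for comparing it across nodes, assuming the text of each node's cassandra.yaml is already in hand; it only reads a top-level `cluster_name:` line and is not a YAML parser. read_cluster_name and same_cluster are hypothetical names.

```python
from typing import Iterable

def read_cluster_name(yaml_text: str) -> str:
    """Return the top-level cluster_name value with surrounding quotes removed."""
    for line in yaml_text.splitlines():
        # Only unindented lines are top-level keys; comments start with '#'.
        if line.startswith("cluster_name:"):
            return line.split(":", 1)[1].strip().strip("'\"")
    raise ValueError("cluster_name is not set")

def same_cluster(yaml_texts: Iterable[str]) -> bool:
    """True when every given config declares one and the same cluster_name."""
    names = {read_cluster_name(text) for text in yaml_texts}
    return len(names) == 1
```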
francescosimoneschi / gist:6430096
Created September 3, 2013 21:53
file-reqs.txt
hadoop/segment-api-jobs/org/playhaven/segmentapi/hadoop/jobs/SegmentExpressionWrapper.py
src/segment_api/model/segment_expression.py
francescosimoneschi / gist:6395218
Created August 30, 2013 23:25
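How to initialize and start a Hadoop cluster
# From the master node, as hduser
# Format HDFS (only on a new cluster: this erases the existing HDFS metadata)
hadoop namenode -format
# Start HDFS: the NameNode, the SecondaryNameNode and the DataNodes listed in conf/slaves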
/usr/sbin/start-dfs.sh
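The same two steps can be scripted. This sketch (to be run as hduser on the master) puts the destructive format step behind an explicit flag and defaults to a dry run that only prints the commands; init_and_start is a hypothetical name, and the script path is the one used above.

```python
import subprocess
from typing import List

def init_and_start(format_hdfs: bool = False, dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) the HDFS init/start commands in order."""
    commands: List[List[str]] = []
    if format_hdfs:
        # Wipes existing HDFS metadata: only request this for a brand-new cluster.
        commands.append(["hadoop", "namenode", "-format"])
    commands.append(["/usr/sbin/start-dfs.sh"])
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sequence if a step exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```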
francescosimoneschi / gist:6392606
Created August 30, 2013 17:59
file: hadoop/conf/hdfs-site.xml
dfs.replication: N=1 for a single node, N=2 for two nodes, N=3 for three or more nodes
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>NUMBER_OF_REPLICA_BLOCKS</value>
<description>Default block replication.
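The replication rule in this gist's description amounts to min(number of nodes, 3). A small sketch that computes it and renders the matching <property> element to substitute for NUMBER_OF_REPLICA_BLOCKS; replication_factor and replication_property are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

def replication_factor(num_nodes: int) -> int:
    """N=1 for one node, N=2 for two nodes, N=3 for three or more."""
    if num_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return min(num_nodes, 3)

def replication_property(num_nodes: int) -> str:
    """Render the dfs.replication <property> element for hdfs-site.xml."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.replication"
    ET.SubElement(prop, "value").text = str(replication_factor(num_nodes))
    return ET.tostring(prop, encoding="unicode")
```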
francescosimoneschi / gist:6392516
Created August 30, 2013 17:51
file: hadoop/conf/mapred-site.xml
Parameters:
- mapred.job.tracker: must be localhost (for a single-node cluster) or the host of the master node
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
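A sketch of producing mapred-site.xml for either case: localhost for a single-node cluster, otherwise the master's host, both on port 54311 as in the example value master:54311. job_tracker_address and render_site_xml are hypothetical helpers, and the output omits the stylesheet line and descriptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

JOBTRACKER_PORT = 54311  # port taken from the example value master:54311

def job_tracker_address(master_host: Optional[str] = None) -> str:
    # No master host given means a single-node cluster, so use localhost.
    return f"{master_host or 'localhost'}:{JOBTRACKER_PORT}"

def render_site_xml(properties: Dict[str, str]) -> str:
    """Build a <configuration> document with one <property> per entry."""
    config = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(config, encoding="unicode")

print(render_site_xml({"mapred.job.tracker": job_tracker_address("master")}))
```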
francescosimoneschi / gist:6392491
Created August 30, 2013 17:49
file: hadoop/conf/core-site.xml
Parameters:
- hadoop.tmp.dir: the local path under which HDFS stores its data. It should point to a large volume. Make sure the hadoop user can access this folder, e.g. sudo chown hduser:hadoop
- fs.default.name: must be localhost (for single-node mode) or the master node.
- fs.s3n.awsAccessKeyId: S3 ac…
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/large_volume/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
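The description asks that hadoop.tmp.dir be accessible to the hadoop user. A minimal pre-flight check, meant to be run as hduser so that os.access reflects that user's permissions; tmp_dir_ready is a hypothetical name and the path is the example value above.

```python
import os

def tmp_dir_ready(path: str) -> bool:
    """True if `path` is an existing directory the current user can list, read and write."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)

if __name__ == "__main__":
    path = "/large_volume/hadoop/tmp"  # hadoop.tmp.dir from the example core-site.xml
    if not tmp_dir_ready(path):
        print(f"{path} is missing or not accessible; create it and chown it to hduser:hadoop")
```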
francescosimoneschi / gist:6392411
Created August 30, 2013 17:40
[master node only] file: hadoop/conf/slaves
slave1 to slaveN are the IP addresses or hostnames of the slave nodes
localhost
slave1
slave2
slaveN
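The file holds one host per line, and because it lists localhost, the master also runs a DataNode (and a TaskTracker) alongside the slaves. A minimal sketch of reading the host list, assuming blank lines and # comments should be ignored; read_hosts is a hypothetical name.

```python
from typing import List

def read_hosts(text: str) -> List[str]:
    """Return the hosts in a conf/slaves-style file, one per non-empty line."""
    hosts = []
    for line in text.splitlines():
        # Drop anything after '#' and surrounding whitespace.
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts
```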
francescosimoneschi / Node configuration
Created August 30, 2013 17:38
[master node only] file: hadoop/conf/masters (the host that runs the SecondaryNameNode)
localhost
Aug 22 22:00:00 api16-north metrics[29694]: [1377205200] {"game_id": "123456", "device_id": "78910", "preload": "1", "device_token": "dd0fe6d4521239d649773b8935fb4cc62e707d25"}
Aug 22 20:00:00 api16-north metrics[29694]: [1377205200] {"game_id": "123456", "device_id": "78910", "preload": "1", "device_token": "dd0fe6d4521239d649773b8935fb4cc62e707d25"}
Aug 22 19:00:00 api16-north metrics[29694]: [1377205200] {"game_id": "123456", "device_id": "78910", "testfield": "testvalue"}
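These sample lines combine a syslog prefix, a bracketed Unix timestamp and a JSON payload whose fields vary from line to line. A sketch of splitting one line into its parts, assuming the "<Mon> <day> <HH:MM:SS> <host> <process>[<pid>]: [<epoch>] <JSON>" shape seen here; parse_metrics_line is a hypothetical name. The payload is kept as a nested dict so its keys cannot collide with the syslog fields, and both the syslog time and the epoch are kept because they disagree in these samples.

```python
import json
import re
from typing import Any, Dict

# Assumed line shape, taken from the sample lines.
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<process>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"\[(?P<epoch>\d+)\] (?P<payload>\{.*\})$"
)

def parse_metrics_line(line: str) -> Dict[str, Any]:
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    return {
        "syslog_time": f"{match['month']} {int(match['day'])} {match['time']}",
        "host": match["host"],
        "process": match["process"],
        "pid": int(match["pid"]),
        "epoch": int(match["epoch"]),
        "payload": json.loads(match["payload"]),  # free-form event fields
    }
```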