A running example of the code from:
- http://marcio.io/2015/07/handling-1-million-requests-per-minute-with-golang
- http://nesv.github.io/golang/2014/02/25/worker-queues-in-go.html
Small refactorings made to original code:
<!-- a1 (agent) -->
<configuration>
  <property>
    <name>flume.master.servers</name>
    <value>$master_IP</value>
    <description>This is the address for the config servers status server (http)</description>
  </property>
  <property>
    <name>flume.collector.event.host</name>
#################################################################################
# Import modules
#################################################################################
import os
import time
import sys
import socket
import string
"""
The MIT License (MIT)
Copyright (c) 2011 Numan Sachwani
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
# Configuration file for runtime kernel parameters.
# See sysctl.conf(5) for more information.
# See also http://www.nateware.com/linux-network-tuning-for-2013.html for
# an explanation about some of these parameters, and instructions for
# a few other tweaks outside this file.
# Protection from SYN flood attack.
net.ipv4.tcp_syncookies = 1
package main
import (
	"fmt"
	"reflect"
)
// Name of the struct tag used in examples
const tagName = "validate"
#on cluster
thrift /spark/sbin/start-thriftserver.sh --master yarn-client
#ssh tunnel, direct 10000 to unused 8157
ssh -i ~/caserta-1.pem -N -L 8157:ec2-54-221-27-21.compute-1.amazonaws.com:10000 hadoop@ec2-54-221-27-21.compute-1.amazonaws.com
#see this for JDBC config on client http://blogs.aws.amazon.com/bigdata/post/TxT7CJ0E7CRX88/Using-Amazon-EMR-with-SQL-Workbench-and-other-BI-Tools
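Before pointing a JDBC client at the tunnel, it can help to confirm that the local end (port 8157 in the ssh command above) is accepting connections. The following is a minimal sketch in Python; `port_is_open` is a hypothetical helper name, not part of Spark, ssh, or any tool mentioned here. A successful connect only shows that something is listening locally, not that the Thrift server on the far side is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration: it only checks that the
    port accepts a connection, nothing about the service behind it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

With the tunnel running, `port_is_open("127.0.0.1", 8157)` would be the call to try; 8157 is simply the local port chosen in the ssh command.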
## Configure eth0
#
# vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
NM_CONTROLLED="yes"
ONBOOT=yes
HWADDR=A4:BA:DB:37:F1:04
TYPE=Ethernet
BOOTPROTO=static
Hive in CDH is horribly, painfully slow. Cloudera ships Hive 1.1, which is moderately modern, but it is very badly configured out of the box and patched with custom code from Cloudera. With a bit of effort, we managed to improve Hive performance considerably. We really shouldn't have to do this, but Cloudera is actively working against supporting a performant Hive.
First, building Tez was fairly straightforward. Following the instructions at https://github.com/apache/tez/blob/master/docs/src/site/markdown/install.md, the only change was to use the version string "2.6.0" for the build. I believe that was the default. Don't use the CDH string; it won't work.
At the bottom of the installation instructions, there is a note that to use the local Hadoop jars (rather than those packaged with Tez) you must unpack the jars in HDFS rather than using the tarball. In this case, unpack the tez-minimal tarball and upload the contents to /apps/tez-0.7.0 (or whatever you prefer). Don't fo
#!/usr/bin/env python
"""
Very simple HTTP server in python.
Usage::
./dummy-web-server.py [<port>]
Send a GET request::
curl http://localhost
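The preview above is cut off before any server code appears. As a rough idea of what a very simple HTTP server of this kind could look like, here is a minimal sketch using Python 3's standard `http.server` module. The names `DummyHandler`, `make_server`, and `fetch_once`, and the response body, are assumptions made for illustration and are not taken from the original script.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DummyHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a fixed HTML body."""

    def do_GET(self):
        body = b"<html><body><h1>hi!</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging to keep the demo quiet.
        pass


def make_server(port=0):
    """Bind to localhost; port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), DummyHandler)


def fetch_once():
    """Start the server in a background thread, send one GET, shut down."""
    server = make_server()
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        conn.request("GET", "/")
        resp = conn.getresponse()
        status, body = resp.status, resp.read()
        conn.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()
        thread.join()
```

To serve until interrupted instead of answering a single request, calling `make_server(8000).serve_forever()` would be one option; port 8000 is an arbitrary choice here.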