Pengyu Chen codeflitting

#!/usr/bin/env python3
#coding: utf-8
import smtplib, argparse, os
from email.mime.text import MIMEText
from email.header import Header
class Config:
    def __init__(self):
        fn = os.path.join(os.path.abspath(os.path.split(__file__)[0]), "mail.conf")
codeflitting / CDHTez.md
Created June 29, 2016 03:02 — forked from epiphani/CDHTez.md
Getting Tez enabled on CDH5.4+

So Hive in CDH is horribly, painfully slow. Cloudera ships Hive 1.1, which is actually moderately modern. It is, however, very badly configured out of the box and patched with custom code from Cloudera. With a bit of effort, we managed to improve Hive performance considerably. We really shouldn't have to do this, but Cloudera is actively working against supporting a performant Hive.

First, building Tez was fairly straightforward. Using the instructions at https://github.com/apache/tez/blob/master/docs/src/site/markdown/install.md, the only change was to use the Hadoop version string "2.6.0" for the build. I believe that was the default. Don't use the CDH version string; it won't work.
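
As a rough sketch of the build step, the snippet below assembles the Maven command line. It assumes the Hadoop version is passed as the `hadoop.version` Maven property on the command line rather than edited in `pom.xml`; the helper name `tez_build_command` is hypothetical and not part of Tez.

```python
#!/usr/bin/env python3
# Sketch only: builds the Maven invocation for compiling Tez.
# Assumption: the Hadoop version is overridden with -Dhadoop.version
# instead of editing the top-level pom.xml by hand.


def tez_build_command(hadoop_version="2.6.0"):
    """Return the Maven command as an argument list (hypothetical helper)."""
    return [
        "mvn", "clean", "package",
        "-DskipTests=true",
        "-Dmaven.javadoc.skip=true",
        # Plain Apache version string, not the CDH-flavoured one.
        "-Dhadoop.version=" + hadoop_version,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it
    # from the root of the Tez source tree.
    print(" ".join(tez_build_command()))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; paste the output into a shell inside the Tez checkout.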

At the bottom of the installation instructions, there's a note that to use the local Hadoop jars (rather than those packaged with Tez) you must unpack the jars in HDFS rather than using the tarball. In this case, unpack the tez-minimal tarball and upload the contents to /apps/tez-0.7.0 (or whatever you prefer). Don't fo
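
The unpack-and-upload step described above might look roughly like the following sketch. The tarball file name `tez-0.7.0-minimal.tar.gz` and the helper `tez_upload_commands` are assumptions for illustration; adjust them to whatever your build produced. The commands are only printed, not executed.

```python
#!/usr/bin/env python3
# Sketch only: lists the shell commands for unpacking the tez-minimal
# tarball locally and uploading the unpacked contents to HDFS.
import posixpath


def tez_upload_commands(tarball, hdfs_dir="/apps/tez-0.7.0"):
    """Return the commands as argument lists (hypothetical helper)."""
    parent, name = posixpath.split(hdfs_dir)
    return [
        # Unpack into a local directory named after the HDFS target.
        ["mkdir", "-p", name],
        ["tar", "-xzf", tarball, "-C", name],
        # Create the parent in HDFS, then copy the directory under it,
        # which yields hdfs_dir (this fails if hdfs_dir already exists).
        ["hdfs", "dfs", "-mkdir", "-p", parent],
        ["hdfs", "dfs", "-put", name, parent],
    ]


if __name__ == "__main__":
    # Assumed tarball name; check tez-dist/target for the real one.
    for cmd in tez_upload_commands("tez-0.7.0-minimal.tar.gz"):
        print(" ".join(cmd))
```

Uploading the unpacked directory, rather than the tarball itself, is the point of the step: the jars end up as individual files under the HDFS path.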

// Adding a file directly to index (stage area), without using working tree.
//
// $ go build index-add.go && yes | mv index-add ~/code/workspace/bin
// $ mkdir -p /tmp/sample && cd /tmp/sample
// $ index-add
package main
import (
	"bytes"
	"fmt"
package main
import (
	"flag"
	"fmt"
	"log"
	"os"

	"bazil.org/fuse"
	"bazil.org/fuse/fs"