Skip to content

Instantly share code, notes, and snippets.

@laserson
laserson / gist:1d1185b412b41057810b
Last active August 29, 2015 14:02
Running custom Spark build on a YARN cluster (for PySpark)

Building Spark for PySpark use on top of YARN

Build Spark on local machine (only if using PySpark; otherwise, remote machine works) (http://spark.apache.org/docs/latest/building-with-maven.html)

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

Copy the assembly/target/scala-2.10/...jar to the corresponding directory on the cluster node and also into a location in HDFS.

@lastland
lastland / BeyesianAvg.py
Created August 11, 2012 07:14
尝试用这篇post: http://www.matrix67.com/blog/archives/5044 中的方法实现的一个自动中文抽词算法的Python程序
# -*- coding=utf-8 -*-
import collections
# Usage:
# 我的做法是把WordsDetector.py里的结果输出到文件,
# 然后把文件名放到下面的names列表中,运行本程序。
names = ['name0',
'name1',
'name2',