@qxj
qxj / README.md
Last active May 20, 2017 17:23
Build a Spark program with Maven, with the help of IDEA.

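  1. Create a Maven project; maven-archetype-quickstart can be used as the project template.
  2. Right-click the project and choose Add Framework Support > scala, then edit pom.xml to add the required dependencies.
  3. Add a scala directory under the main directory to hold the Scala code, such as SubmitExample.scala below.
  4. In IDEA, right-click and choose Run... to check that the whole project runs correctly.
  5. To actually submit the job to a YARN cluster, remove the .setMaster("local") call and submit the job with spark-submit.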
from __future__ import division
from __future__ import print_function
"""
------------ Follow The Regularized Leader - Proximal ------------
FTRL-P is an online classification algorithm that combines both L1 and L2
norms, particularly suited for large data sets with extremely high dimensionality.
This implementation follows the algorithm by H. B. McMahan et al. It minimizes
@qxj
qxj / auc.py
Created January 11, 2017 15:39
"""
Calculate AUC of a sample
Reference:
http://binf.gmu.edu/mmasso/ROC101.pdf
"""
import matplotlib.pyplot as plt
# each element is a tuple(positive, score)
sample = [
(True, 0.63),
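The preview above is cut off, so the author's own calculation is not visible here. As a minimal sketch only, AUC for a list of `(positive, score)` tuples can be computed as the fraction of positive/negative pairs in which the positive item receives the higher score, with ties counted as half. The function name `pairwise_auc` and every value in `toy_sample` other than `(True, 0.63)` are assumptions made up for illustration.

```python
def pairwise_auc(sample):
    """Return AUC as the share of (positive, negative) pairs ranked correctly.

    `sample` is a list of (is_positive, score) tuples. Ties count as 0.5.
    This is the rank-based (Mann-Whitney) view of AUC, not necessarily
    the method used in the original gist.
    """
    pos_scores = [score for is_pos, score in sample if is_pos]
    neg_scores = [score for is_pos, score in sample if not is_pos]
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative item")

    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical data: only the first tuple appears in the original preview.
toy_sample = [(True, 0.63), (True, 0.9), (False, 0.4), (False, 0.7)]
toy_auc = pairwise_auc(toy_sample)
```

The double loop is quadratic in the sample size, which is fine for a handful of points; sorting by score and accumulating ranks would be the usual choice for larger samples.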
@qxj
qxj / AUC.ipynb
Created January 11, 2017 15:36 — forked from CalvinTChi/AUC.ipynb
@qxj
qxj / 00-MultipleOutputs
Created January 10, 2017 07:12 — forked from airawat/00-MultipleOutputs
MultipleOutputs sample program - A program that demonstrates how to generate an output file for each key
********************************
Gist
********************************
Motivation
-----------
The typical MapReduce job creates files with the prefix "part-", followed by "m" or "r" depending
on whether it is a map or a reduce output, and then the part number. There are scenarios where we
may want to create separate files based on other criteria, such as data keys and/or values. Enter
the "MultipleOutputs" functionality.
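To make the naming scheme described above concrete, here is a small plain-Python sketch. It does not use the Hadoop API. The helper names `part_file_name` and `group_by_key`, and the idea of using the record key as the file-name prefix, are assumptions chosen for illustration.

```python
from collections import defaultdict


def part_file_name(task_type, partition, prefix="part"):
    """Build a name such as part-r-00000 for a map ('m') or reduce ('r') output."""
    if task_type not in ("m", "r"):
        raise ValueError("task_type must be 'm' or 'r'")
    return "{}-{}-{:05d}".format(prefix, task_type, partition)


def group_by_key(records):
    """Collect values per key, mimicking 'one output per key' in memory."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return dict(groups)


# Hypothetical records: (key, value) pairs.
records = [("sales", "alice"), ("hr", "bob"), ("sales", "carol")]
grouped = group_by_key(records)

# One illustrative output name per key, using the key as the prefix.
output_names = {key: part_file_name("r", 0, prefix=key) for key in grouped}
```

With the default prefix, the helper reproduces the usual "part-" name; passing the key as the prefix shows how separate per-key files could be told apart.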
'''
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2004 Sam Hocevar <sam@hocevar.net>
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
@qxj
qxj / Makefile
Last active December 4, 2016 11:55 — forked from lmullen/Makefile
PDF slides and handouts using Pandoc and Beamer
SLIDES := $(patsubst %.md,%.slides.pdf,$(wildcard *.md))
HANDOUTS := $(patsubst %.md,%.handout.pdf,$(wildcard *.md))
all : $(SLIDES)
%.slides.pdf : %.md
	pandoc --latex-engine=xelatex --toc $^ -t beamer --slide-level 2 -o $@
%.handout.pdf : %.md
	pandoc $^ -t beamer --slide-level 2 -V handout -o $@
@qxj
qxj / autoexpect
Last active December 17, 2016 09:37
Use an expect script to log in automatically to the jump host and production machines
#!/usr/bin/expect
# Name: autoexpect - generate an Expect script from watching a session
#
# Description:
#
# Given a program name, autoexpect will run that program. Otherwise
# autoexpect will start a shell. Interact as desired. When done, exit
# the program or shell. Autoexpect will create a script that reproduces
# your interactions. By default, the script is named script.exp.
# See the man page for more info.
@qxj
qxj / fm_lr.py
Created September 6, 2016 06:30 — forked from kalaidin/fm_lr.py
Logistic regression + Factorization machines + SGD
import numpy as np
from math import exp, log
"""
SGD for logistic loss + factorization machines
The code follows this paper:
[1] http://www.ics.uci.edu/~smyth/courses/cs277/papers/factorization_machines_with_libFM.pdf
"""
def sigmoid(x):
@qxj
qxj / lxc_demo.c
Created July 14, 2016 07:07
Linux namespace learning demo, to better understand docker tech. http://yuedu.baidu.com/ebook/d817967416fc700abb68fca1
/* LXC demo:
* 1. UTS, isolate hostname
* 2. IPC, pipe
* 3. PID, chroot process tree
* 4. NS, mount /proc to make `top` work well
* 5. NET, veth from OpenVZ
*/
#define _GNU_SOURCE