Feng Wang (Felix) hezila
@hezila
hezila / mysql_5.6&5.7.cnf
Created December 8, 2015 08:53
MySQL 5.6 & 5.7 optimal configuration file template
[client]
user=david
password=88888888
[mysqld]
########basic settings########
server-id = 11
port = 3306
user = mysql
bind_address = 10.166.224.32
hezila / group_sum.py
Last active September 18, 2015 15:12 — forked from greatghoul/group_sum.py
SqlAlchemy Group By
#!/usr/bin/env python
#-*- coding: utf-8 -*-
from sqlalchemy import create_engine, Column, Integer, String, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
Base = declarative_base()
class StudentInfo(Base):
hezila / kddcup2015_rank.R
Last active August 29, 2015 14:22
KDD Cup 2015 Rank
# from http://qiita.com/Keiku/items/80656e99ca5e71dc1f43
library(rvest)
library(stringr)
library(dplyr)
library(tidyr)
library(reshape2)
library(magrittr)
# read_html() replaces html(), which rvest deprecated in 0.3.0
submission_rank <- read_html("https://www.kddcup2015.com/submission-rank.html")
hezila / xboost.R
Created June 1, 2015 08:33
xboost.R
library(xgboost)
library(Matrix)
# load data
data = read.delim("data/sample.tsv", sep="\t")
data$v6 = NULL
# create data for k-fold cross validation
cv = function(d, k) {
  n = sample(nrow(d), nrow(d))
import graphlab as gl
import math
import random
train = gl.SFrame.read_csv('data/train.csv')
test = gl.SFrame.read_csv('data/test.csv')
del train['id']
def make_submission(m, test, filename):
    preds = m.predict_topk(test, output_type='probability', k=9)
hezila / setup.md
Last active August 29, 2015 14:20 — forked from xrstf/setup.md
Nutch 2.3 + ElasticSearch 1.4 + HBase 0.94 Setup

Info

This guide sets up a non-clustered Nutch crawler that stores its data in HBase. It does not cover setting up Hadoop and related components, only the bare minimum needed to crawl and index websites on a single machine.

Terms

  • Nutch - the crawler (fetches and parses websites)
  • HBase - filesystem storage for Nutch (Hadoop component, basically)
hezila / optics.py
Last active August 29, 2015 14:02 — forked from ryangomba/optics.py
import math
import json
################################################################################
# POINT
################################################################################
class Point:
    def __init__(self, latitude, longitude):
hezila / split_files.sh
Created March 16, 2013 08:55
Split multiple files into sub-folders
# from http://ubuntuforums.org/showthread.php?t=976447
let fileCount=200
let dirNum=1
for f in *
do
    # skip directories; quote "$f" so names containing spaces are handled
    [ -d "$f" ] && continue
    [ $fileCount -eq 200 ] && {
        dir=$(printf "%03d" $dirNum)
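
The preview of split_files.sh is cut off above, so the rest of the script is not shown here. As a rough illustration of the idea described in the gist title (moving files into numbered sub-folders in batches), here is a minimal Python sketch. It is not the author's script: the function name `split_into_subfolders` and its parameters are hypothetical, and it assumes the batch size of 200 and the zero-padded three-digit folder names visible in the shell fragment.

```python
import shutil
from pathlib import Path


def split_into_subfolders(src: Path, batch_size: int = 200) -> list:
    """Move the regular files in `src` into numbered sub-folders (001, 002, ...).

    Hypothetical helper for illustration only. Existing directories are
    skipped, mirroring the `[ -d ... ] && continue` check in the shell
    fragment. Assumes no regular file is already named like a target folder.
    """
    # Snapshot the file list first so newly created folders are never visited.
    files = sorted(p for p in src.iterdir() if p.is_file())
    created = []
    for index, path in enumerate(files):
        dir_num = index // batch_size + 1      # 1-based folder counter
        target = src / f"{dir_num:03d}"        # same padding as printf "%03d"
        if target not in created:
            target.mkdir(exist_ok=True)
            created.append(target)
        shutil.move(str(path), str(target / path.name))
    return created
```

Calling `split_into_subfolders(Path("."))` would process the current directory, which is what the `for f in *` loop in the shell fragment iterates over.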