sky_limit mrvege

-0.017612 -14.053064 0.0
-1.395634 -4.662541 1.0
-0.752157 -6.53862 0.0
-1.322371 -7.152853 0.0
0.423363 -11.054677 0.0
0.406704 -7.067335 1.0
0.667394 -12.741452 0.0
-2.46015 -6.866805 1.0
0.569411 -9.548755 0.0
-0.026632 -10.427743 0.0
#!/usr/bin/env python
# encoding: utf-8
from numpy import *
import matplotlib.pyplot as plt
def logistic(wTx):
    return 1.0/(1.0 + exp(-wTx))
def file2matrix(filename,delimiter):
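A minimal sketch of how the `logistic` function might be applied to rows like the sample data above. It assumes each row holds two features followed by a 0/1 label, which the preview does not confirm. The bias column and the all-zero weight vector `w` are hypothetical and only illustrate the call; this is not the body of the truncated `file2matrix`.

```python
import numpy as np


def logistic(wTx):
    """Sigmoid: maps a real-valued score to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-wTx))


# Three of the sample rows: assumed layout is feature 1, feature 2, label.
rows = np.array([
    [-0.017612, -14.053064, 0.0],
    [-1.395634, -4.662541, 1.0],
    [0.406704, -7.067335, 1.0],
])

# Prepend a column of ones so the first weight acts as a bias term (assumption).
X = np.hstack([np.ones((rows.shape[0], 1)), rows[:, :2]])
y = rows[:, 2]

# Hypothetical starting weights; a real model would learn these from the data.
w = np.zeros(X.shape[1])
probs = logistic(X @ w)
```

With all-zero weights every score is 0, so every predicted probability is 0.5 before any training takes place.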
@mrvege
mrvege / gist:6be2823f21af6dc66381
Created January 27, 2016 05:55 — forked from entaroadun/gist:1653794
Recommendation and Ratings Public Data Sets For Machine Learning

Movies Recommendation:

Music Recommendation:

@mrvege
mrvege / pd_boxplot.py
Created January 13, 2016 09:12
pandas boxplot
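import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# IPython/Jupyter magic; remove this line when running as a plain script
%matplotlib inline
fig, ax = plt.subplots()
df = pd.read_csv("author_coauthor_number.csv")
df.boxplot(ax=ax)
plt.savefig("coauthor.png", dpi=400)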
@mrvege
mrvege / bisecting.scala
Created December 29, 2015 07:45 — forked from freeman-lab/bisecting.scala
Bisecting k-means for hierarchical clustering in Spark
/**
 * bisecting <master> <input> <nNodes> <subIterations>
 *
 * divisive hierarchical clustering using bisecting k-means
 * assumes input is a text file, each row is a data point
 * given as numbers separated by spaces
 *
 */
import org.apache.spark.SparkContext
@mrvege
mrvege / NAStatCounter.scala
Created December 14, 2015 12:20
Uses a NAStatCounter to prefilter the NaN values in a Spark dataset. To load the class in spark-shell, use a command like this: :load [path to your scala file]
/**
 * Created by cmy on 12/14/15.
 */
import org.apache.spark.util.StatCounter
class NAStatCounter extends Serializable {
  val stats: StatCounter = new StatCounter()
  var missing: Long = 0
  def add(x: Double): NAStatCounter = {
@mrvege
mrvege / 美团-天平.md
Created December 14, 2015 02:06
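Meituan: balance scale

#We use an equal-arm balance to weigh objects. If the objects to be weighed have integer masses from 1 to 40 grams, what is the minimum number of weights needed to weigh all of them?

  1. 3
  2. 4
  3. 5
  4. 6
  5. 7

##answer: The answer is four.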
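The answer of four can be checked by brute force. The sketch below assumes weights may be placed on either pan, and uses the powers of three 1, 3, 9 and 27 as one set that works; the gist itself does not name the weights. Each weight is either on the object's pan (-1), unused (0), or on the opposite pan (+1). The function name `measurable` is made up for this illustration.

```python
from itertools import product


def measurable(weights):
    """Return the set of positive integer masses the weights can balance."""
    totals = set()
    # Every weight is on the object's pan (-1), unused (0), or opposite (+1).
    for signs in product((-1, 0, 1), repeat=len(weights)):
        total = sum(s * w for s, w in zip(signs, weights))
        if total > 0:
            totals.add(total)
    return totals


four = measurable([1, 3, 9, 27])
covers_1_to_40 = four == set(range(1, 41))

# Three weights allow 3**3 = 27 placements, so at most (27 - 1) // 2 = 13
# distinct positive masses, which is too few for the range 1..40.
three_limit = (3 ** 3 - 1) // 2
```

Here `covers_1_to_40` evaluates to `True`, and the bound of 13 shows that three weights can never be enough, so four is the minimum.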

@mrvege
mrvege / ipython_pyspark
Created November 26, 2015 07:46
running pyspark in ipython
IPYTHON=ipython IPYTHON_OPTS=qtconsole ./bin/pyspark
@mrvege
mrvege / filterPunct.py
Last active July 10, 2017 07:58
filter punct in English or Chinese
#!/usr/bin/env python
# encoding: utf-8
__author__ = 'dm'
punct = set(u''':!),.:;?]}¢'"、。〉》」』】〕〗〞︰︱︳﹐、﹒
﹔﹕﹖﹗﹚﹜﹞!),.:;?|}︴︶︸︺︼︾﹀﹂﹄﹏、~¢
々‖•·ˇˉ―--′’”([{£¥'"‵〈《「『【〔〖([{£¥〝︵︷︹︻
︽︿﹁﹃﹙﹛﹝({“‘-—_…''')
# works on both str and unicode
filterpunt = lambda s : ''.join(filter(lambda x : x not in punct , s))
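A short usage sketch of the same filtering idea, written for Python 3. The set `punct_subset` is a small hand-picked subset of the characters above, and the function name `filter_punct` is hypothetical; both are used here only so the example is self-contained.

```python
# A small illustrative subset of ASCII and full-width punctuation marks.
punct_subset = set(",.!?;:" + ",。!?;:、")


def filter_punct(s):
    """Return s with every character found in punct_subset removed."""
    return "".join(ch for ch in s if ch not in punct_subset)


english = filter_punct("Hello, world!")
chinese = filter_punct("你好,世界!")
```

The character-by-character membership test is what lets one function handle English and Chinese punctuation alike, as long as the relevant marks are in the set.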