项超 cloudaice

  • Bytedance
  • Hangzhou China
@cloudaice
cloudaice / BeautifulSoup.py
Created January 18, 2013 07:58
A Python package for parsing HTML
"""Beautiful Soup
Elixir and Tonic
"The Screen-Scraper's Friend"
http://www.crummy.com/software/BeautifulSoup/
Beautiful Soup parses a (possibly invalid) XML or HTML document into a
tree representation. It provides methods and Pythonic idioms that make
it easy to navigate, search, and modify the tree.
A well-formed XML/HTML document yields a well-formed data
@cloudaice
cloudaice / download_pic.py
Created January 18, 2013 07:55
download_picture
#-*-coding:utf-8-*-
from BeautifulSoup import BeautifulSoup
import requests
from PIL import Image
from StringIO import StringIO
r = requests.get('http://www.xiaomi.com')
assert(r.status_code == 200)
soup = BeautifulSoup(r.text)
urls = soup.findAll('img')
@cloudaice
cloudaice / redis-hash
Created January 6, 2013 06:11
Hashing for Redis in Python
import md5
class HashRing(object):
    def __init__(self, nodes=None, replicas=3):
        """Manages a hash ring.
        `nodes` is a list of objects that have a proper __str__ representation.
        `replicas` indicates how many virtual points should be used pr. node,
        replicas are required to improve the distribution.
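The preview above is cut off before the ring logic. As a rough illustration of what the docstring describes, here is a minimal consistent hash ring with virtual points. It is a sketch, not the gist's code: the class name `SimpleHashRing`, the `"node:index"` naming of virtual points, and the use of `hashlib.md5` (in place of the Python 2 `md5` module) are all assumptions.

```python
import bisect
import hashlib


class SimpleHashRing:
    """Illustrative consistent hash ring (hypothetical, not the gist's code)."""

    def __init__(self, nodes=None, replicas=3):
        self.replicas = replicas
        self._ring = {}          # hash position -> node
        self._sorted_keys = []   # sorted hash positions on the ring
        for node in nodes or []:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # MD5 is used only to spread keys around the ring, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each node gets `replicas` virtual points to even out the distribution.
        for i in range(self.replicas):
            position = self._hash("%s:%d" % (node, i))
            self._ring[position] = node
            bisect.insort(self._sorted_keys, position)

    def remove_node(self, node):
        for i in range(self.replicas):
            position = self._hash("%s:%d" % (node, i))
            del self._ring[position]
            self._sorted_keys.remove(position)

    def get_node(self, key):
        # Walk clockwise to the first virtual point at or after the key's hash.
        if not self._sorted_keys:
            return None
        position = self._hash(key)
        index = bisect.bisect(self._sorted_keys, position) % len(self._sorted_keys)
        return self._ring[self._sorted_keys[index]]


ring = SimpleHashRing(["redis-1", "redis-2", "redis-3"])
print(ring.get_node("user:42"))
```

Removing a node only remaps the keys that lived on that node's virtual points; every other key keeps its server, which is the reason for using a ring instead of `hash(key) % len(nodes)`.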
@cloudaice
cloudaice / fabfile.py
Created December 6, 2012 03:02
fabric file
from __future__ import with_statement
import os
from django.core import management
# We have to re-name this to avoid clashes with fabric.api.settings.
import ohbooklist.conf.local.settings as django_settings
management.setup_environ(django_settings)
from fabric.api import *
@cloudaice
cloudaice / BinaryStream.py
Created October 19, 2012 13:33
Reading byte-stream data from Java
# -*-coding: utf-8 -*-
from struct import *
class BinaryStream:
    def __init__(self, base_stream):
        self.base_stream = base_stream
        self.offset = 0
    def readBytes(self, length):
        string, = unpack_from(str(length) + 's', self.base_stream[self.offset:])
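The preview stops inside `readBytes`. The following sketch shows the general idea of walking a byte buffer with `struct` and an offset. It assumes the data was written by Java's `java.io.DataOutputStream` (big-endian integers and doubles, strings prefixed with a 2-byte length); the gist does not state this, and the class name `BigEndianReader` and its method names are hypothetical.

```python
import struct


class BigEndianReader:
    """Illustrative reader for big-endian data (hypothetical, not the gist's class)."""

    def __init__(self, data):
        self.data = data
        self.offset = 0

    def _unpack(self, fmt):
        # Decode one value at the current offset, then advance past it.
        (value,) = struct.unpack_from(fmt, self.data, self.offset)
        self.offset += struct.calcsize(fmt)
        return value

    def read_int(self):
        return self._unpack(">i")   # 4-byte signed integer

    def read_double(self):
        return self._unpack(">d")   # 8-byte IEEE 754 double

    def read_bytes(self, length):
        chunk = self.data[self.offset:self.offset + length]
        if len(chunk) != length:
            raise EOFError("not enough bytes left in the buffer")
        self.offset += length
        return chunk

    def read_utf(self):
        # 2-byte unsigned length followed by the encoded string.
        # Java uses "modified UTF-8"; plain UTF-8 decoding matches it except
        # for NUL and characters outside the Basic Multilingual Plane.
        length = self._unpack(">H")
        return self.read_bytes(length).decode("utf-8")


# Build a sample payload the way DataOutputStream would lay it out.
payload = struct.pack(">i", 42) + struct.pack(">H", 5) + b"hello" + struct.pack(">d", 1.5)
reader = BigEndianReader(payload)
print(reader.read_int(), reader.read_utf(), reader.read_double())
```

Passing the offset to `struct.unpack_from` avoids copying the rest of the buffer on every read, which slicing with `self.base_stream[self.offset:]` would do.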
@cloudaice
cloudaice / pagerank.py
Created October 17, 2012 03:09
PageRank algorithm
# Copyright (c) 2010 Pedro Matiello <pmatiello@gmail.com>
# Juarez Bochi <jbochi@gmail.com>
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
@cloudaice
cloudaice / pagerank
Created October 16, 2012 01:56
about pagerank
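I recently started learning Hadoop. I had assumed the course would focus on how to administer a Hadoop cluster, but in the opening stage the instructor, to help us better understand Hadoop's MapReduce mechanism, had us first implement Google's PageRank algorithm ourselves. I originally planned to implement it in Java, since before long I will need to deploy Java code on a Hadoop cluster for data analysis. But I have not written a single line of Java since graduating, so I really could not manage it, especially because PageRank is essentially iterated matrix-vector computation, which in Java would certainly require two-dimensional arrays, something I never learned well at school. After much consideration I decided to implement it in Python: I taught myself some Python in the first half of the year, and I knew of a third-party Python module called python-graph that makes graph-theory programming much easier. I set up the Python environment on a Linode VPS. The relevant modules were installed as follows: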
[root@chenjunlu ~]# yum install graphviz*
[root@chenjunlu ~]# yum install vsftpd
[root@chenjunlu ~]# wget http://python-graph.googlecode.com/files/python-graph-core-1.8.2.tar.gz
[root@chenjunlu ~]# tar -zxvf python-graph-core-1.8.2.tar.gz
[root@chenjunlu ~]# cd python-graph-core-1.8.2
[root@chenjunlu ~]# python setup.py install
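The matrix-and-vector iteration mentioned above can be sketched in plain Python without python-graph. This is only an illustration of the usual power-iteration formulation, not the author's implementation: the function name `pagerank`, the dict-of-lists graph format, the damping factor of 0.85, and the even redistribution of rank from pages without outgoing links are all assumptions.

```python
def pagerank(graph, damping=0.85, max_iterations=100, tolerance=1e-8):
    """Power-iteration PageRank over {page: [linked pages]}.

    Assumes every link target also appears as a key in `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(max_iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(ranks[node] for node in nodes if not graph[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}

        # Each page passes an equal share of its rank to every page it links to.
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * ranks[node] / len(targets)
                for target in targets:
                    new_ranks[target] += share

        change = sum(abs(new_ranks[node] - ranks[node]) for node in nodes)
        ranks = new_ranks
        if change < tolerance:
            break
    return ranks


example_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for page, rank in sorted(pagerank(example_graph).items()):
    print(page, round(rank, 4))
```

Storing the links as adjacency lists sidesteps the two-dimensional array entirely; each pass over the lists does the same work as one multiplication of the transition matrix by the rank vector.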
@cloudaice
cloudaice / numpy.py
Created October 16, 2012 01:53
A PageRank-related algorithm: Algorithm in 126 Lines
# PageRank algorithm
# By Peter Bengtsson
# http://www.peterbe.com/
# mail@peterbe.com
#
# Requires the numarray module
# http://www.stsci.edu/resources/software_hardware/numarray
from numarray import *
import numarray.linear_algebra as la
@cloudaice
cloudaice / how to use diff
Created February 26, 2012 12:41
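Using the diff text-comparison command on Linux
I. The diff text-file comparison command
1> What diff does
On Linux, the diff command compares two text files line by line and lists their differences. It examines the given files systematically and displays every line that differs between them; the files do not need to be sorted beforehand.
2> Syntax
diff [options] file1 file2
This command tells the user which lines of file1 and file2 must be changed to make the two files identical. If "-" is given in place of file1 or file2, it denotes standard input. If file1 or file2 is a directory, diff compares the other file against the file of the same name in that directory.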
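To make the line-by-line comparison concrete, the same idea can be tried from Python with the standard-library `difflib` module. This is a stand-in for illustration, not the diff command itself; the helper name `line_diff` and the sample lines are made up, and the output follows the unified format that `diff -u` prints.

```python
import difflib

# Sample contents standing in for file1 and file2 (hypothetical data).
old_lines = ["apple\n", "banana\n", "cherry\n"]
new_lines = ["apple\n", "blueberry\n", "cherry\n"]


def line_diff(a, b, a_name="file1", b_name="file2"):
    """Return the unified-format differences between two lists of lines."""
    return list(difflib.unified_diff(a, b, fromfile=a_name, tofile=b_name))


diff_lines = line_diff(old_lines, new_lines)
print("".join(diff_lines), end="")
```

Lines starting with `-` exist only in file1 and lines starting with `+` exist only in file2; identical inputs produce no output at all, just as diff prints nothing for identical files.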
var check = require('validator').check,
sanitize = require('validator').sanitize
//Validate
check('test@email.com').len(6, 64).isEmail(); //Methods are chainable
check('abc').isInt(); //Throws 'Invalid integer'
check('abc', 'Please enter a number').isInt(); //Throws 'Please enter a number'
check('abcdefghijklmnopzrtsuvqxyz').is(/^[a-z]+$/);