# PageRank algorithm
# By Peter Bengtsson
# http://www.peterbe.com/
# mail@peterbe.com
#
# Requires the numarray module
# http://www.stsci.edu/resources/software_hardware/numarray
from numarray import *
import numarray.linear_algebra as la
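I recently started learning Hadoop. I had assumed the course would focus mostly on managing Hadoop clusters, but early on, to help us understand Hadoop's MapReduce mechanism, the instructor asked us to implement Google's PageRank algorithm ourselves. I originally planned to use Java, since before long I will need to deploy Java code on a Hadoop cluster for data analysis. But I have not written a line of Java since graduating, so I could not manage it, especially because PageRank is essentially iterated matrix-vector computation, which in Java would certainly mean two-dimensional arrays, something I never learned well in school. After thinking it over, I decided to use Python: I taught myself some Python in the first half of the year, and I knew of a third-party module called python-graph that makes graph-theory programming much easier. I set up the Python environment on a Linode VPS. The relevant modules were installed as follows: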
[root@chenjunlu ~]# yum install graphviz*
[root@chenjunlu ~]# yum install vsftpd
[root@chenjunlu ~]# wget http://python-graph.googlecode.com/files/python-graph-core-1.8.2.tar.gz
[root@chenjunlu ~]# tar -zxvf python-graph-core-1.8.2.tar.gz
[root@chenjunlu ~]# cd python-graph-core-1.8.2
[root@chenjunlu ~]# python setup.py install
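
The post describes PageRank as iterated matrix-vector computation. A minimal sketch of that iteration is shown here in plain Python, without python-graph or numarray. The function name `pagerank`, the adjacency-dict input format, and the default parameters (`damping=0.85`, `tol`, `max_iter`) are assumptions made for illustration; they are not taken from the post or from any library.

```python
def pagerank(links, damping=0.85, tol=1e-8, max_iter=100):
    """Power-iteration PageRank sketch.

    `links` (hypothetical input format) maps each node to the list of
    nodes it links to. Returns a dict mapping node -> rank.
    """
    # Collect every node that appears as a source or as a target.
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    if n == 0:
        return {}

    # Start from a uniform distribution.
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread evenly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}

        # Each node passes an equal share of its rank along each out-link.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share

        # Stop once the total change between iterations is small.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break

    return rank
```

Calling `pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})` returns one score per node, and the scores sum to approximately 1. In a MapReduce setting, the inner loop over out-links roughly corresponds to the map step and the per-node summation to the reduce step.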
# Copyright (c) 2010 Pedro Matiello <pmatiello@gmail.com>
#                    Juarez Bochi <jbochi@gmail.com>
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# -*-coding: utf-8 -*-
from struct import *
class BinaryStream:
    def __init__(self,base_stream):
        self.base_stream = base_stream
        self.offset = 0
    def readBytes(self,length):
        string, = unpack_from(str(length) + 's',self.base_stream[self.offset:])
import md5
class HashRing(object):
    def __init__(self, nodes=None, replicas=3):
        """Manages a hash ring.
        `nodes` is a list of objects that have a proper __str__ representation.
        `replicas` indicates how many virtual points should be used per node,
        replicas are required to improve the distribution.
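
The `HashRing` preview above is cut off after its docstring. As a rough illustration of the idea it describes (several virtual points per node on a ring), here is a self-contained sketch. It is not the original implementation: the class name `SimpleHashRing`, the `"node:i"` key format for virtual points, and the use of `hashlib.md5` and `bisect` are all assumptions.

```python
import bisect
import hashlib


class SimpleHashRing:
    """Consistent-hashing sketch with virtual points (hypothetical class)."""

    def __init__(self, nodes=None, replicas=3):
        self.replicas = replicas
        self._keys = []   # sorted ring positions
        self._ring = {}   # ring position -> node
        for node in nodes or []:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Map a string to a large integer position on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Place `replicas` virtual points for this node on the ring.
        for i in range(self.replicas):
            pos = self._hash("%s:%d" % (node, i))
            if pos not in self._ring:
                self._ring[pos] = node
                bisect.insort(self._keys, pos)

    def remove_node(self, node):
        # Remove only the virtual points that belong to this node.
        for i in range(self.replicas):
            pos = self._hash("%s:%d" % (node, i))
            if self._ring.get(pos) == node:
                del self._ring[pos]
                self._keys.remove(pos)

    def get_node(self, key):
        # Return the node owning the first point clockwise from the key.
        if not self._keys:
            return None
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]
```

With `ring = SimpleHashRing(["a", "b", "c"])`, repeated calls to `ring.get_node("my_key")` return the same node, and removing a different node leaves that mapping unchanged. Only keys owned by the removed node move, which is the property that makes consistent hashing useful.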
#-*-coding:utf-8-*-
from BeautifulSoup import BeautifulSoup
import requests
from PIL import Image
from StringIO import StringIO
r = requests.get('http://www.xiaomi.com')
assert(r.status_code == 200)
soup = BeautifulSoup(r.text)
urls = soup.findAll('img')
"""Beautiful Soup
Elixir and Tonic
"The Screen-Scraper's Friend"
http://www.crummy.com/software/BeautifulSoup/
Beautiful Soup parses a (possibly invalid) XML or HTML document into a
tree representation. It provides methods and Pythonic idioms that make
it easy to navigate, search, and modify the tree.
A well-formed XML/HTML document yields a well-formed data
import tornado.ioloop
import tornado.web
import tornado.escape
import tornado.options
import tornado.httputil
import jinja2
import pyjade.compiler
import coffeescript
import markdown