@yhfyhf
yhfyhf / Queue.py
Created March 5, 2016 19:54
Blocking Queue
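import threading

class Queue(object):
    """Bounded blocking queue guarded by a single lock."""
    def __init__(self, max_size):
        self.max_size = max_size
        self.mutex = threading.Lock()
        # Both conditions share self.mutex, so waiting on either one
        # releases the same lock while the thread is blocked.
        self.is_full = threading.Condition(self.mutex)
        self.is_empty = threading.Condition(self.mutex)
        self._queue = []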
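The preview stops after __init__, so the put/get half is not shown. A minimal sketch of how two conditions over one lock usually drive a bounded blocking queue follows. The class name BlockingQueue and the methods put/get are assumptions for illustration, not the gist's hidden code; the conditions are named not_full/not_empty here to describe what a waiter is waiting for. Python's standard queue.Queue(maxsize) already provides this behavior.

```python
import threading
from collections import deque


class BlockingQueue(object):
    """Hypothetical bounded blocking queue (illustrative sketch)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.mutex = threading.Lock()
        # Two conditions sharing one lock: producers wait on not_full,
        # consumers wait on not_empty.
        self.not_full = threading.Condition(self.mutex)
        self.not_empty = threading.Condition(self.mutex)
        self._queue = deque()  # O(1) append and popleft

    def put(self, item):
        with self.not_full:  # acquires self.mutex
            # Loop rather than `if`: re-check the predicate after every wakeup.
            while len(self._queue) >= self.max_size:
                self.not_full.wait()  # releases self.mutex while blocked
            self._queue.append(item)
            self.not_empty.notify()  # wake one waiting consumer

    def get(self):
        with self.not_empty:  # acquires the same self.mutex
            while not self._queue:
                self.not_empty.wait()
            item = self._queue.popleft()
            self.not_full.notify()  # wake one waiting producer
            return item
```

With a single producer and a single consumer, items come out in FIFO order even when the producer blocks on a full queue.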
class ThreadUrl(threading.Thread):
    def __init__(self, queue, visited, lock):
        super(ThreadUrl, self).__init__()
        self.queue = queue
        self.visited = visited
        self.lock = lock

    def run(self):
        while True:
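The ThreadUrl preview stops at `while True:`. One plausible worker loop for a crawler thread that shares a URL queue, a visited set, and a lock is sketched here. It assumes the standard-library queue.Queue (for get, put, task_done, and join) and a hypothetical fetch_links(url) callable that returns outgoing links; neither is confirmed by the preview, and this is not the gist's hidden code.

```python
import queue
import threading


class ThreadUrl(threading.Thread):
    """Illustrative crawler worker; fetch_links is a hypothetical callable."""

    def __init__(self, queue, visited, lock, fetch_links):
        # Daemon thread so a process waiting only on queue.join() can exit.
        super(ThreadUrl, self).__init__(daemon=True)
        self.queue = queue
        self.visited = visited
        self.lock = lock
        self.fetch_links = fetch_links

    def run(self):
        while True:
            url = self.queue.get()
            try:
                # The shared visited set is only read/written under the lock.
                with self.lock:
                    if url in self.visited:
                        continue  # the finally clause still calls task_done()
                    self.visited.add(url)
                for link in self.fetch_links(url):
                    self.queue.put(link)
            finally:
                self.queue.task_done()
```

New links are enqueued before task_done() is called for the current URL, so queue.join() cannot return while work is still being discovered.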
# coding: utf-8
# version 1.0.3
# #![Spark Logo](http://spark-mooc.github.io/web-assets/images/ta_Spark-logo-small.png) + ![Python Logo](http://spark-mooc.github.io/web-assets/images/python-logo-master-v3-TM-flattened_small.png)
# # **Text Analysis and Entity Resolution**
# #### Entity resolution is a common yet difficult problem in data cleaning and integration. This lab will demonstrate how we can use Apache Spark to apply powerful and scalable text analysis techniques and perform entity resolution across two datasets of commercial products.
# #### Entity Resolution, or "[Record linkage][wiki]", is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with another that describe the same entity. Other terms with the same meaning include "entity disambiguation/linking", "duplicate detection", "deduplication", "record matching", "(reference) reconciliation", "object identification", "data/information integration", and "conf
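The idea of matching product records across two datasets by text similarity can be illustrated on a single machine. The sketch below uses bag-of-words cosine similarity, which is one common measure for this; it is not the lab's Spark pipeline, and the helper names (tokenize, cosine_similarity, best_matches) and the 0.5 threshold are hypothetical choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity of the token-count vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def best_matches(records_a, records_b, threshold=0.5):
    """For each id in records_a (id -> description), return the most similar
    id in records_b as (id_a, id_b, score) when score >= threshold."""
    matches = []
    for id_a, text_a in records_a.items():
        best = max(
            ((id_b, cosine_similarity(text_a, text_b))
             for id_b, text_b in records_b.items()),
            key=lambda pair: pair[1],
            default=None,
        )
        if best is not None and best[1] >= threshold:
            matches.append((id_a, best[0], best[1]))
    return matches
```

Comparing every pair is quadratic in the dataset sizes, which is the kind of cost a distributed engine such as Spark is used to absorb.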
yhfyhf / curry.py
Last active August 29, 2015 14:25 — forked from JulienPalard/curry.py
#!/usr/bin/env python
def curry(func):
    """
    Decorator to curry a function, typical usage:
    >>> @curry
    ... def foo(a, b, c):
    ...     return a + b + c
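The preview shows only the docstring of curry. A minimal sketch of such a decorator, assuming it counts the function's required positional parameters and keeps returning partial applications until enough arguments have been supplied, could look like this. It is not the forked gist's implementation, and it deliberately ignores keyword arguments.

```python
import functools
import inspect


def curry(func):
    """Curry func over its required positional parameters (sketch)."""
    params = inspect.signature(func).parameters.values()
    arity = sum(
        1 for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                      inspect.Parameter.POSITIONAL_OR_KEYWORD)
        and p.default is inspect.Parameter.empty
    )

    @functools.wraps(func)
    def curried(*args):
        # Enough arguments collected: call the original function.
        if len(args) >= arity:
            return func(*args)
        # Otherwise return a function that remembers the arguments so far.
        return lambda *more: curried(*(args + more))

    return curried
```

Usage: with `foo` decorated as in the docstring, `foo(1)(2)(3)`, `foo(1, 2)(3)`, and `foo(1, 2, 3)` all call the original function with a, b, c = 1, 2, 3.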
yhfyhf / ListNode.py
Last active August 29, 2015 14:24
Modified version of LeetCode's ListNode, friendlier for debugging.
"""
# build a ListNode chain by unpacking a list
>>> l = ListNode(*range(10))
>>> l
<ListNode [0]>
>>> print(l)
<ListNode [0]>: 0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9
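The doctest above implies a constructor that accepts several values and a repr/str pair that shows the head and then the whole chain. A minimal sketch that reproduces exactly that output follows; the internals (iterative construction, the val/next field names borrowed from LeetCode's ListNode) are assumptions, not the gist's code.

```python
class ListNode(object):
    """Singly linked list node; ListNode(1, 2, 3) builds 1 -> 2 -> 3."""

    def __init__(self, *values):
        if not values:
            raise ValueError("ListNode needs at least one value")
        self.val = values[0]
        self.next = None
        # Build the rest of the chain iteratively (no recursion-depth limit).
        tail = self
        for v in values[1:]:
            node = ListNode(v)
            tail.next = node
            tail = node

    def __repr__(self):
        # Short form: only the head value.
        return "<ListNode [%r]>" % (self.val,)

    def __str__(self):
        # Long form: head repr followed by every value in the chain.
        vals = []
        node = self
        while node is not None:
            vals.append(str(node.val))
            node = node.next
        return "%r: %s" % (self, " -> ".join(vals))
```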
yhfyhf / fn_list.py
Created June 22, 2015 12:10
List that supports basic functional-programming operations.
# encoding: utf-8
'''
>>> l = fn_list([3, 2, 1])
>>> l.len()
3
>>> l.map(lambda x: x + 1).map(lambda x: x * 2)
[8, 6, 4]
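The doctest shows method-style len and chainable map that still print like a plain list. One way to get that behavior is to subclass list; the sketch below assumes that design and adds hypothetical filter and reduce methods in the same style. The gist's actual method set is not visible in the preview.

```python
from functools import reduce as _reduce


class fn_list(list):
    """list subclass with chainable functional-style methods (sketch)."""

    def len(self):
        # Inside a method body, `len` resolves to the builtin, not this method.
        return len(self)

    def map(self, func):
        return fn_list(func(x) for x in self)

    def filter(self, pred):
        return fn_list(x for x in self if pred(x))

    def reduce(self, func, *initial):
        # Optional initial value is passed straight through to functools.reduce.
        return _reduce(func, self, *initial)
```

Because each method returns a new fn_list, calls chain, and the inherited list repr prints results such as `[8, 6, 4]`.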
yhfyhf / kNN.py
Created January 30, 2015 08:31
Use the kNN algorithm to recognize digits.
# encoding: utf-8
"""
Use the kNN algorithm to recognize digits.
Download files here: http://download.csdn.net/detail/zouxy09/6610571
├── digits
│   ├── testDigits
│   └── trainingDigits
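The core of k-nearest-neighbors classification is: measure the distance from the query to every training vector, take the k closest, and return the majority label. A minimal NumPy sketch follows. The file-loading helpers assume each digit file is rows of '0'/'1' characters and that the label is the filename prefix before "_" (for example "7_45.txt"); that layout is an assumption about the download, not something the preview confirms. All function names are hypothetical.

```python
from collections import Counter

import numpy as np


def img_to_vector(path):
    # ASSUMPTION: the file is rows of '0'/'1' characters; flatten to one vector.
    with open(path) as f:
        rows = [line.strip() for line in f if line.strip()]
    return np.array([int(ch) for row in rows for ch in row], dtype=float)


def label_from_filename(name):
    # ASSUMPTION: names look like "7_45.txt", label = digit before "_".
    return int(name.split("_")[0])


def knn_classify(x, train_X, train_y, k=3):
    """Majority label among the k training rows nearest to x (Euclidean)."""
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))  # one distance per row
    nearest = np.argsort(dists)[:k]                     # indices of k smallest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

An odd k avoids ties in two-class votes; with ten digit classes ties are still possible, and Counter.most_common then keeps the label first seen among the nearest rows.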
yhfyhf / kmeans.py
Created January 29, 2015 13:42
K-means algorithm in Python.
import random
import numpy as np
def distance(v1, v2):
    """
    Euclidean distance between v1 and v2.
    v1 and v2 are both n-dimensional NumPy vectors.
    """
    return np.sqrt(np.sum(np.power(v1 - v2, 2)))
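Given the distance function, the rest of k-means is typically Lloyd's iteration: assign each point to its nearest centroid, move each centroid to the mean of its points, and stop when the centroids no longer move. The sketch below assumes initial centroids are drawn from the data with random.sample (consistent with the preview's `import random`, though how the gist initializes is not shown); the kmeans name and its parameters are hypothetical.

```python
import random

import numpy as np


def distance(v1, v2):
    """Euclidean distance between two n-dimensional NumPy vectors."""
    return np.sqrt(np.sum(np.power(v1 - v2, 2)))


def kmeans(data, k, max_iter=100, seed=None):
    """Lloyd's k-means; returns (centroids, labels)."""
    data = np.asarray(data, dtype=float)
    rng = random.Random(seed)
    # Initialize with k distinct data points.
    centroids = data[rng.sample(range(len(data)), k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = np.array([
            min(range(k), key=lambda j: distance(point, centroids[j]))
            for point in data
        ])
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = centroids.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return centroids, labels
```

Results depend on the initial centroids; running with several seeds and keeping the lowest within-cluster distance is a common remedy.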
yhfyhf / LRU.cpp
Created January 18, 2015 14:38
LRU Cache
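#include <iostream>
#include <ext/hash_map>  // GNU extension, deprecated; C++11 std::unordered_map replaces it
using namespace std;
using namespace __gnu_cxx;
template <class K, class T>
// Doubly linked list node holding a key and its cached data
struct Node{
    K key;
    T data;
    Node *prev, *next;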
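The Node struct with prev/next pointers points to the classic LRU design: a hash map from key to node for O(1) lookup, plus a doubly linked list ordered by recency so the least recently used entry can be evicted in O(1). A minimal sketch of that design follows, written in Python rather than the gist's C++; the class and method names (LRUCache, get, put) are assumptions, not the gist's interface.

```python
class _Node(object):
    """Doubly linked list node with the same key/data/prev/next fields."""
    __slots__ = ("key", "data", "prev", "next")

    def __init__(self, key=None, data=None):
        self.key = key
        self.data = data
        self.prev = None
        self.next = None


class LRUCache(object):
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.map = {}  # key -> _Node
        # Sentinels: head.next is most recently used, tail.prev is least.
        self.head, self.tail = _Node(), _Node()
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key, default=None):
        node = self.map.get(key)
        if node is None:
            return default
        self._unlink(node)      # a hit makes the entry most recently used
        self._push_front(node)
        return node.data

    def put(self, key, data):
        node = self.map.get(key)
        if node is not None:    # update in place and refresh recency
            node.data = data
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) >= self.capacity:
            lru = self.tail.prev  # evict the least recently used entry
            self._unlink(lru)
            del self.map[lru.key]
        node = _Node(key, data)
        self.map[key] = node
        self._push_front(node)
```

Storing the key inside each node is what lets eviction remove the matching map entry without a search.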