David Chen lucemia

  • GliaCloud
  • Taipei / Vancouver
@lucemia
lucemia / gist:6987780
Created October 15, 2013 07:21
How to read the datastore from a local machine
import dev_appserver
dev_appserver.fix_sys_path()


def get_auth():
    import getpass
    return raw_input('Username:'), getpass.getpass('Password:')


def connect(app_id):
    from google.appengine.ext.remote_api import remote_api_stub
@lucemia
lucemia / gist:7019481
Created October 17, 2013 05:19
A more effective way than regular expressions to strip HTML tags
from lxml.html import parse
from lxml import etree
import cStringIO


def remove_tags(html, strip_tags=("script",)):
    b = cStringIO.StringIO(html)
    root = parse(b).getroot()
    for tag in strip_tags:
        # Copy the matches first: drop_tree() mutates the tree being iterated.
        for element in list(root.iter(tag)):
            element.drop_tree()
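The preview above is cut off before the function returns anything. As a rough illustration of the same idea (parse the markup instead of pattern-matching it), here is a minimal Python 3 sketch that swaps lxml for the standard library's `html.parser`. The class name `_TextExtractor` and the choice to return plain text are assumptions made for this example, not the gist's behaviour.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping content of chosen tags."""

    def __init__(self, strip_tags):
        super().__init__(convert_charrefs=True)
        # Assumes strip_tags names container elements that have a closing tag.
        self._strip_tags = set(strip_tags)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._strip_tags:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._strip_tags and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only while we are outside every stripped element.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def remove_tags(html, strip_tags=("script",)):
    parser = _TextExtractor(strip_tags)
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `remove_tags("<p>Hello <b>world</b></p><script>var x = 1;</script>")` keeps the visible text and drops the script body.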
@lucemia
lucemia / gist:7026005
Created October 17, 2013 14:30
A simple way to extract Chinese and English characters
import re
re_pure_text = re.compile(ur'[\u4e00-\u9fff\w]+', re.UNICODE)
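The pattern above uses the Python 2 `ur''` prefix. A possible Python 3 usage sketch follows; the function name `extract_text` is made up for this example. In Python 3, `\w` already matches CJK characters for `str` patterns, so the explicit range is redundant there but is kept to mirror the original.

```python
import re

# Same character class as the gist, written for Python 3 (no ur'' prefix).
RE_PURE_TEXT = re.compile(r'[\u4e00-\u9fff\w]+')


def extract_text(s):
    """Return the runs of Chinese/word characters found in s."""
    return RE_PURE_TEXT.findall(s)


print(extract_text("Hello, 世界! foo_bar 123"))
```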
@lucemia
lucemia / gist:7052466
Created October 19, 2013 06:58
facebook.py
#!/usr/bin/env python
#
# Copyright 2010 Facebook
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
@lucemia
lucemia / gist:7208870
Created October 29, 2013 03:41
How to query an entity by key via GQL
SELECT * FROM ProductAd WHERE __key__ = KEY('ProductAd', 'yahoo:product:2265602')
@lucemia
lucemia / gist:7262390
Created November 1, 2013 08:24
html snippet
<iframe frameBorder="0" scrolling="no" width="300" height="250" marginwidth="0" marginheight="0" style="display: visible" src="http://ad.tagtoo.co/ad_g_300x250?pb=66&id=4#q=http%3A%2F%2Fwww.mayuki.com.tw%2F&p=%%SITE%%&cachebuster=%%CACHEBUSTER%%&click=%%CLICK_URL_ESC%%"></iframe>
@lucemia
lucemia / gist:7315857
Created November 5, 2013 08:53
vip extractor
import sys


def extract_jpg(ifilepath):
    ofile = 'test.jpg'
    with open(ifilepath, 'rb') as ifile:
        icontent = ifile.read()
    index = icontent.index("</panorama>") + len("</panorama>")
    # print index
    icontent = icontent[index:]
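The preview stops after slicing the content. The core step, keeping everything that follows a marker in a binary blob, might look like this in Python 3, where file contents read in `'rb'` mode are `bytes` and the marker must be `bytes` too. The function name `split_after_marker` is an assumption for illustration.

```python
def split_after_marker(data: bytes, marker: bytes = b"</panorama>") -> bytes:
    """Return the bytes that follow the first occurrence of marker.

    Raises ValueError if the marker is absent, like bytes.index().
    """
    index = data.index(marker) + len(marker)
    return data[index:]


payload = split_after_marker(b"<panorama>meta</panorama>\xff\xd8rest-of-jpeg")
```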
@lucemia
lucemia / gist:7366507
Last active December 27, 2015 17:59
Ordering items by weighted probability in Python
import random
x = ['x', 2]
y = ['y', 3]
z = ['z', 4]
SAMPLES = 100000
choices = [x,y,z]
total_weight = float(sum([k[1] for k in choices]))
vs = []
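The sampling loop itself is not visible in the preview. One common way to order items by weight is to draw without replacement, each time picking an item with probability proportional to its remaining weight. The sketch below assumes that approach; `weighted_order` is a hypothetical name and this may differ from what the gist does.

```python
import random


def weighted_order(choices, rng=random):
    """Order item names by repeated weighted draws without replacement.

    choices: sequence of [name, weight] pairs, as in the gist.
    """
    pool = list(choices)
    ordered = []
    while pool:
        total = float(sum(weight for _, weight in pool))
        threshold = rng.random() * total
        cumulative = 0.0
        # Walk the cumulative weights until the threshold is passed.
        for index, (name, weight) in enumerate(pool):
            cumulative += weight
            if threshold < cumulative:
                break
        ordered.append(name)
        del pool[index]
    return ordered


order = weighted_order([['x', 2], ['y', 3], ['z', 4]])
```

With weights 2, 3 and 4, `'z'` should come first in roughly 4/9 of the runs.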
@lucemia
lucemia / gist:7371993
Created November 8, 2013 14:45
A correct version of weighted ranking via sort
import random
SAMPLES = 10000
NUM = 5
options = [(k, random.randint(0, 100)) for k in range(NUM)]
choices = list(options)
total_weight = float(sum([k[1] for k in choices]))
vs = []
for i in range(SAMPLES):
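The body of the loop is cut off, so the exact sort key the gist uses is unknown. One known way to get a correctly weighted ranking from a single sort is to give each item the key `u ** (1 / weight)` with `u` uniform in [0, 1) and sort in descending order (the Efraimidis-Spirakis scheme). The sketch below assumes that technique and strictly positive weights; `weighted_rank` is a hypothetical name.

```python
import random


def weighted_rank(options, rng=random):
    """Rank items so that heavier weights tend to come first.

    options: iterable of (item, weight) pairs; every weight must be > 0.
    """
    keyed = [(rng.random() ** (1.0 / weight), item) for item, weight in options]
    # Larger keys first: an item leads with probability weight / total.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed]


ranking = weighted_rank([("x", 2), ("y", 3), ("z", 4)])
```

Note that the gist's `random.randint(0, 100)` can produce a weight of 0, which this sketch does not handle; such items would need to be filtered out or appended last.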
@lucemia
lucemia / file_split.py
Created November 10, 2013 15:08
Split a file for MapReduce
class FileSplitPipe(base_handler.PipelineBase):
    def run(self, input_path, output, shards):
        # from google.appengine.api import files
        # from cStringIO import StringIO
        import time
        import logging

        def readline(_file):
            # TODO: Need to fix it
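The pipeline body is not shown, so how the file is actually divided is unknown. As a plain-Python illustration of splitting input into a fixed number of shards, here is a round-robin sketch that works on any iterable of lines. The name `split_lines` and the round-robin strategy are assumptions, independent of the App Engine pipeline API.

```python
def split_lines(lines, shards):
    """Distribute lines over `shards` buckets in round-robin order."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    buckets = [[] for _ in range(shards)]
    for i, line in enumerate(lines):
        buckets[i % shards].append(line)
    return buckets


parts = split_lines(["a", "b", "c", "d", "e"], 2)
```

Round-robin keeps shard sizes within one line of each other, which suits mappers that process records independently.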