Marian Steinbach marians
@marians
marians / db.sql
Created February 20, 2012 11:11
Collecting tweets mentioning given keywords, storing the result to a MySQL table
CREATE TABLE `tweets` (
`id` varchar(24) NOT NULL DEFAULT '',
`created_at` datetime NOT NULL,
`user_id` bigint(20) unsigned NOT NULL,
`user_name` varchar(128) NOT NULL DEFAULT '',
`user_followers` int(11) unsigned NOT NULL,
`user_friends` int(10) unsigned DEFAULT NULL,
`user_listed` int(10) unsigned DEFAULT NULL,
`user_statuses` int(10) unsigned DEFAULT NULL,
`user_location` varchar(100) DEFAULT NULL,
Cache-Control: public, max-age=43200
Connection: keep-alive
Content-Length: 81443
Content-Type: text/css; charset=utf-8
Date: Sat, 28 Apr 2012 22:37:56 GMT
ETag: "flask-1335630896.86-81443-1985555092"
Expires: Sun, 29 Apr 2012 10:37:56 GMT
Last-Modified: Sat, 28 Apr 2012 16:34:56 GMT
Server: nginx/0.7.65
Set-Cookie: session="IZK844h3a4CHUU02CIjgyc08RHM=?lang=Vml0CnAxCi4="; Path=/; HttpOnly
@marians
marians / Mapping
Created October 17, 2012 10:26
ElasticSearch - documents with multiple geo_point properties
{
spatialtest: {
document: {
properties: {
location: {
type: "geo_point"
},
title: {
type: "string"
}
@marians
marians / webserver.py
Created October 24, 2012 09:53
A simple development webserver for the console that features throttling (if enabled by -t) and ignores query strings
# encoding: utf-8
"""
A simple development webserver for the console that features
throttling (if enabled by -t) and ignores query strings.
"""
import SimpleHTTPServer
import SocketServer
import os
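
The preview above stops after the imports, so the gist's actual handler is not visible here. As a rough illustration of the two features named in the description (throttling and ignoring query strings), the following is a minimal sketch for Python 3, whereas the gist targets Python 2's SimpleHTTPServer. The names strip_query, ThrottlingHandler, serve, THROTTLE_DELAY and CHUNK_SIZE are assumptions made for this example and are not taken from the gist.

```python
# Python 3 sketch; NOT the gist's code. All names below are hypothetical.
import time
from http.server import SimpleHTTPRequestHandler
from socketserver import TCPServer

THROTTLE_DELAY = 0.05  # assumed value: seconds to sleep after each chunk
CHUNK_SIZE = 1024      # assumed value: bytes written per chunk


def strip_query(path):
    """Return the request path without its query string or fragment."""
    return path.split("?", 1)[0].split("#", 1)[0]


class ThrottlingHandler(SimpleHTTPRequestHandler):
    throttle = False  # set to True to emulate a slow connection

    def translate_path(self, path):
        # Python 3's handler already drops the query string here;
        # the explicit call only makes that behaviour visible.
        return super().translate_path(strip_query(path))

    def copyfile(self, source, outputfile):
        if not self.throttle:
            return super().copyfile(source, outputfile)
        # Send the file in small chunks, pausing between them.
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:
                break
            outputfile.write(chunk)
            time.sleep(THROTTLE_DELAY)


def serve(port=8000, throttle=False):
    """Serve the current directory until interrupted."""
    ThrottlingHandler.throttle = throttle
    with TCPServer(("", port), ThrottlingHandler) as httpd:
        httpd.serve_forever()
```

Calling serve(8000, throttle=True) would serve the current directory slowly; how the gist wires its -t option to the handler is not shown in the preview.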
@marians
marians / save_tweets.py
Created March 12, 2013 13:20
Save tweets containing certain keywords from the Twitter Streaming API to MongoDB
import tweetstream
from pymongo import MongoClient
# look for these words:
WORDS = ['word1', 'word2']
TWITTER_USER = ""
TWITTER_PASS = ""
MONGO_DB = 'tweetstream'
@marians
marians / lang_detection_server.pl
Created March 19, 2013 12:41
This is an HTTP service for natural language guessing of input texts. Run it as "perl lang_detection_server.pl" and open a URL like http://localhost:8080/?text=This+is+just+a+test+string . TextCat source code courtesy of Gertjan van Noord.
# This is a sloppy HTTP server version of TextCat, an n-gram based natural language guesser
# written by Gertjan van Noord in 1997.
# More info: http://odur.let.rug.nl/~vannoord/TextCat/
#
# TextCat was distributed under the GNU Lesser General Public License.
#
# You need the language model files (LM folder from Gertjan's distribution) in a directory.
# Set the variable $opt_d to point to that directory.
@marians
marians / hadoop-hadoop-datanode-Marians-MBP.local.log
Last active December 17, 2015 14:09
Hadoop problem logs as of 2013-05-21
2013-05-21 21:32:20,681 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50010, dest: /127.0.0.1:50144, bytes: 1777, op: HDFS_READ, cliID: DFSClient_attempt_201305152304_0004_m_000000_1_722979034_1, offset: 0, srvID: DS-2043951618-192.168.0.102-50010-1368635777558, blockid: blk_-4766979604280382827_1040, duration: 240000
2013-05-21 21:32:20,804 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50010, dest: /127.0.0.1:50145, bytes: 7517, op: HDFS_READ, cliID: DFSClient_attempt_201305152304_0004_m_000000_1_722979034_1, offset: 0, srvID: DS-2043951618-192.168.0.102-50010-1368635777558, blockid: blk_3289312915423223722_1003, duration: 736000
2013-05-21 21:35:59,387 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-4719156887958776590_1064 src: /127.0.0.1:50215 dest: /127.0.0.1:50010
2013-05-21 21:35:59,486 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50215, dest: /127.0.0.1:50010, bytes:
@marians
marians / queue.py
Created May 24, 2013 14:33
Untested version of a job queue that relies on MongoDB
"""
Untested version of some job queue
Usage:
from pymongo import MongoClient
db = MongoClient()
queue = Queue("myqueue", db)
job = {
'key': 'foobar',
@marians
marians / bench.py
Last active March 30, 2021 14:30
Benchmarking serialization/deserialization in Python using json, pickle and cPickle
import cPickle
import pickle
import json
import random
from time import time
from hashlib import md5
test_runs = 1000
def float_list():
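
The preview is cut off at the first helper, so the gist's measurement loop is not visible. A benchmark of this kind could be sketched as follows, assuming Python 3, where cPickle no longer exists as a separate module and pickle uses its C implementation when available. The helper names make_float_list and benchmark, and the list length of 100, are assumptions for illustration only.

```python
# Python 3 sketch; NOT the gist's code. Helper names are hypothetical.
import json
import pickle
import random
from time import perf_counter

TEST_RUNS = 1000  # mirrors test_runs in the preview above


def make_float_list(n=100):
    """Build a list of n random floats to use as test data (n is assumed)."""
    return [random.random() for _ in range(n)]


def benchmark(name, dumps, loads, data, runs=TEST_RUNS):
    """Time `runs` serializations, then `runs` deserializations, of data."""
    start = perf_counter()
    for _ in range(runs):
        blob = dumps(data)
    dump_seconds = perf_counter() - start

    start = perf_counter()
    for _ in range(runs):
        restored = loads(blob)
    load_seconds = perf_counter() - start

    return {
        "name": name,
        "dump_seconds": dump_seconds,
        "load_seconds": load_seconds,
        "roundtrip_ok": restored == data,
    }


if __name__ == "__main__":
    data = make_float_list()
    for name, dumps, loads in [
        ("json", json.dumps, json.loads),
        ("pickle", pickle.dumps, pickle.loads),
    ]:
        print(benchmark(name, dumps, loads, data))
```

Timings depend on the machine and the data shape, so no expected numbers are given here.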
@marians
marians / test.py
Last active July 10, 2017 23:47
Using Tor Browser Bundle for anonymous HTTP requests in Python - supplement for http://www.sendung.de/2014-09-16/anonymous-scraping-via-python-tor/
import socket
import socks # pip install PySocks - https://github.com/Anorov/PySocks
# configure default proxy. 9150 is the Tor Browser Bundle socks proxy port
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9150)
socket.socket = socks.socksocket
import urllib
print(urllib.urlopen('http://icanhazip.com').read())