Mahesh CR maheshcr
@sanbornm
sanbornm / gist:177420
Created August 29, 2009 07:14
Simple script to check when your site changes status codes
import pickle, pprint, time, os
import httplib
import smtplib
def emailAlert(alert, subject='You have an alert'):
    fromaddr = "youremail@domain.com"
    toaddrs = "youremail@domain.com"
    # Add the From: and To: headers at the start!
;; in reply to http://www.sids.in/blog/2010/05/06/html-parsing-in-clojure-using-htmlcleaner/
(ns html-parser
  (:require [net.cgrand.enlive-html :as e]))
(defn parse-page
  "Given the HTML source of a web page, parses it and returns the :title
  and the tag-stripped :content of the page. Does not do any encoding
  detection, it is expected that this has already been done."
  [page-src]
  (-> page-src java.io.StringReader. e/html-resource
@plamere
plamere / dedup.py
Created June 25, 2011 13:05
dedup.py - uses echoprint to find duplicates in a music collection
#!/usr/bin/python
import sys
import os
import pprint
import subprocess
import pickle
import atexit
import simplejson as json
sys.path.insert(0, "../API")
@tedpennings
tedpennings / gist:1087981
Created July 17, 2011 19:44
Render Handlebars templates from a server-side resource with caching to session storage
/*
 * This decorates Handlebars.js with the ability to load
 * templates from an external source, with light caching.
 *
 * To render a template, pass a closure that will receive the
 * template as a function parameter, eg,
 *   T.render('template-name', function(t) {
 *       $('#somediv').html( t() );
 *   });
 */
@quxiaofeng
quxiaofeng / bottle_example.py
Created September 25, 2011 15:53 — forked from Arthraim/bottle_example.py
a python web framework bottle's example
#coding: utf-8
from bottle import route, error, post, get, run, static_file, abort, redirect, response, request, template
@route('/')
@route('/index.html')
def index():
    return '<a href="/hello">Go to Hello World page</a>'
@route('/hello')
def hello():
@ibdknox
ibdknox / alephNoir.clj
Created October 2, 2011 19:53
aleph and noir
(require '[noir.server :as server])
(use 'noir.core 'aleph.http 'lamina.core)
(defn async-response [response-channel request]
  (enqueue response-channel
           {:status 200
            :headers {"content-type" "text/plain"}
            :body "async response"}))
(defpage "/" [] "hey from Noir!")
@david-torres
david-torres / diffbot.py
Created November 3, 2011 18:13
Simple Python interface for Diffbot API
import requests
import json
class Diffbot(object):
    """
    A simple Python interface for the Diffbot API.
    Relies on the Requests library - python-requests.org
    Usage:
@cpatni
cpatni / app.rb
Created November 21, 2011 22:39
unique calculation using redis
require 'sinatra'
require 'redis'
require 'json'
require 'date'
class String
  def &(str)
    result = ''
    result.force_encoding("BINARY")
@dyross
dyross / futureBad.scala
Created October 2, 2012 03:05
Scaling the Klout API with Scala, Akka, and Play
def blockingAndVerbose: Profile = {
  val futureName = name()
  val futureScore = score()
  val futureFriends = Friends()
  val nameResult = Await.result(futureName, 10 seconds)
  val scoreResult = Await.result(futureScore, 10 seconds)
  val friendsResult = Await.result(futureFriends, 10 seconds)
  Profile(nameResult, scoreResult, friendsResult)
@mattb
mattb / gist:3888345
Created October 14, 2012 11:53
Some pointers for Natural Language Processing / Machine Learning

Here are the areas I've been researching, some things I've read and some open source packages...

Nearly all text processing starts by transforming text into vectors: http://en.wikipedia.org/wiki/Vector_space_model
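
As a rough illustration of that first step, here is a minimal bag-of-words sketch in plain Python. The function names (`tokenize`, `build_vocabulary`, `to_count_vector`) and the whitespace tokenizer are assumptions made for this example, not something taken from the linked article or from any particular library.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def build_vocabulary(docs):
    """Map each distinct token to a column index (sorted for determinism)."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def to_count_vector(doc, vocab):
    """Return term counts for doc, one slot per vocabulary entry."""
    counts = Counter(tokenize(doc))
    # vocab was built in sorted order, so iterating it follows the column indices.
    return [counts.get(tok, 0) for tok in vocab]


docs = ["the cat sat", "the cat saw the dog"]
vocab = build_vocabulary(docs)
vectors = [to_count_vector(doc, vocab) for doc in docs]
```

Each document becomes a point in a space with one dimension per vocabulary word, which is what lets later steps compare documents numerically.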

These vectors are often reweighted with a transform such as TF-IDF, which normalises the data and controls for outliers (words that are too frequent or too rare confuse the algorithms): http://en.wikipedia.org/wiki/Tf%E2%80%93idf
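
A small sketch of one common TF-IDF variant (relative term frequency times the log of inverse document frequency) may make the weighting concrete. The function name `tf_idf` and the choice of variant are assumptions for illustration; real packages differ in smoothing and normalisation details.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    weighted = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weighted.append({
            # tf = share of this document taken up by the term;
            # idf = log(N / df), which is 0 for terms present in every document.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


weights = tf_idf([["the", "cat"], ["the", "dog"]])
```

In this variant a word that appears in every document, such as "the" here, gets weight zero. Words that are too rare are not handled by the weighting itself; one option, assumed here rather than taken from the source, is to drop terms below a minimum document frequency before weighting.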

Collocation detection finds two or more words that occur together more often than they occur separately (e.g. "wishy-washy" in English). I use this to group words into n-gram tokens, because many NLP techniques treat each word as if it were independent of all the others in a document and ignore word order: http://matpalm.com/blog/2011/10/22/collocations_1/
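
One way to score candidate collocations is pointwise mutual information (PMI) over adjacent word pairs. The sketch below is an assumption-laden illustration: PMI is a common choice but not necessarily the measure used in the linked post, and the function name `bigram_pmi` and the `min_count` cutoff are made up for this example.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by PMI, highest first.

    PMI = log2( P(a, b) / (P(a) * P(b)) ); a positive score means the pair
    occurs together more often than independent words would.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # very rare pairs give unreliable PMI estimates
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


text = "wishy washy talk is wishy washy and the talk is the talk"
ranked = bigram_pmi(text.split())
```

Pairs that score highly could then be merged into a single token (for example `wishy_washy`) before building the document vectors, so that later steps see the phrase as one unit.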