Engineer by choice, Researcher at heart, Entrepreneur by nature.
Siddhartha has over 4 years of experience in the broad areas of Information Engineering (Retrieval, Extraction & Management), Machine Learning, Scalability and Cloud Computing, with a focus on applications to the World Wide Web.
He began his research work on Information Retrieval while pursuing a Master's degree in Information Technology at IIIT-Bangalore in 2005-06. After graduating, he joined Ziva Software, a mobile search startup, in 2006. He has since worked on various aspects of developing and running Ziva's flagship mobile search engine, Zook, as part of the core 3-member tech team.
Siddhartha's contribution at Ziva has been manifold, including, but not limited to, system architecture for the various sub-systems (crawling, extraction, processing, indexing and search), user experience, API,
Siddhartha blogs on these topics at http://grok.in/ (The Art Of Information Engineering).
require 'whatever/classifier'
classifier = Classifier.new
# training:
classifier.add_document(:text => "blah", :class => :a)
classifier.add_document(:text => "bleh", :class => :b)
# get model; training is automatically finalised
model = classifier.get_model
@sids
sids / html_parser.clj
Created May 6, 2010 05:44
HTML Parsing in Clojure using HtmlCleaner.
(ns in.grok.history.html-parser
(:require [clojure.contrib.logging :as log])
(:import [org.htmlcleaner HtmlCleaner]
[org.apache.commons.lang StringEscapeUtils]))
(defn parse-page
"Given the HTML source of a web page, parses it and returns the :title
and the tag-stripped :content of the page. Does not do any encoding
detection, it is expected that this has already been done."
[page-src]
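The preview above stops before the function body, so the HtmlCleaner-based implementation is not visible here. As a rough illustration of the contract the docstring describes (take already-decoded HTML, return a title and tag-stripped content), here is a minimal Python sketch using only the standard library. The class name `_PageParser`, the decision to skip `<script>` and `<style>` text, and the whitespace normalisation are all assumptions of this sketch, not taken from the gist.

```python
from html.parser import HTMLParser


class _PageParser(HTMLParser):
    """Collects <title> text and visible text separately (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.content_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0:
            self.content_parts.append(data)


def parse_page(page_src):
    """Return the title and tag-stripped content of an already-decoded HTML string."""
    parser = _PageParser()
    parser.feed(page_src)
    parser.close()
    # Collapse runs of whitespace so the result is a single clean line of text.
    title = " ".join("".join(parser.title_parts).split())
    content = " ".join(" ".join(parser.content_parts).split())
    return {"title": title, "content": content}
```

As in the docstring above, no encoding detection is attempted: the caller is expected to pass a decoded string.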
;;; all code in this function lifted from the clojure-mode function
;;; from clojure-mode.el
(defcustom sids/clojure-mode-font-lock-comment-sexp nil
"Set to non-nil in order to enable font-lock of (comment...)
forms. This option is experimental. Changing this will require a
restart (ie. M-x clojure-mode) of existing clojure mode buffers."
:type 'boolean)
(defconst sids/clojure-font-lock-keywords
;; First fetch CVS version of slime, git version of clojure, swank-clojure, clojure-contrib and clojure-mode
;; Create ~/bin/clojure script which starts clojure repl and adds clojure-contrib src dir and swank-clojure src dir to classpath. I used clj-env helper from clojure-contrib
(pushnew '("\\.clj$" . clojure-mode) auto-mode-alist) ; "\\." matches a literal dot; a single backslash is dropped by the string reader
(require 'clojure-mode)
;;;; Slime configuration stuff
(setf slime-lisp-implementations
'((ecl ("~/bin/ecl" "--heap-size" "1024000000") :coding-system utf-8-unix)
http://search1.flipkart.com/solr/select?spellcheck=true&q=resumen+los+anticonceptivos+explicados+a+los+jovenes&spellcheck.q=resumen+los+anticonceptivos+explicados+a+los+jovenes&spellcheck.count=6&qt=spell_not&spellcheck.collate=true&debugQuery=true&indent=on&echoParams=true
# Redis configuration file example
# Note on units: when memory size is needed, it is possible to specify
# it in the usual form of 1k 5GB 4M and so forth:
#
# 1k => 1000 bytes
# 1kb => 1024 bytes
# 1m => 1000000 bytes
# 1mb => 1024*1024 bytes
# 1g => 1000000000 bytes
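The unit table above distinguishes decimal suffixes (`k`, `m`, `g`) from binary ones (`kb`, `mb`). A small Python sketch of that conversion may make the difference concrete. The function name `parse_memory_size` is hypothetical, the `gb` entry is assumed to follow the same pattern since the excerpt cuts off before it, and suffixes are treated as case-insensitive because the example line mixes `5GB` and `4M`.

```python
import re

# Multipliers for the suffixes in the comment block above.
# "gb" is not shown in the excerpt; it is assumed to follow the same pattern.
UNIT_BYTES = {
    "": 1,
    "k": 1000,
    "kb": 1024,
    "m": 1000 ** 2,
    "mb": 1024 ** 2,
    "g": 1000 ** 3,
    "gb": 1024 ** 3,
}


def parse_memory_size(text):
    """Convert a size such as '1k' or '5GB' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([a-zA-Z]*)", text.strip())
    if match is None:
        raise ValueError(f"not a memory size: {text!r}")
    number, unit = match.groups()
    unit = unit.lower()  # assumed case-insensitive, as in the "5GB 4M" example
    if unit not in UNIT_BYTES:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(number) * UNIT_BYTES[unit]
```

For instance, `parse_memory_size("1k")` and `parse_memory_size("1kb")` differ by 24 bytes, which is exactly the distinction the comment block draws.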
sids / gist:1013710
Created June 8, 2011 03:22 — forked from gf3/gist:457702
Regenerate ctags on checkout
#!/bin/sh
# Regenerate ctags on checkout
# project/.git/hooks/post-checkout
DIR=$GIT_DIR
# $3 is the flag git passes to post-checkout: 1 for a branch checkout, 0 for a file checkout
if [ 0 -eq "$3" ]; then
# file checkout
else
# tree checkout
sids / prime_summands.clj
Created November 29, 2011 03:37
Prime Summands
(ns prime-summands.core
(:use [clojure.contrib.lazy-seqs :only [primes]]))
(defn prime-summands [n]
(let [primes (take-while #(<= % n) primes)]
(reduce +
(for [start (range (count primes))
end (range 1 (inc (count primes)))]
(let [primes (drop start (take end primes))]
(if (= n (reduce + primes))
sids / prime_summands.clj
Created November 29, 2011 03:46
Prime Summands
(ns prime-summands.core
(:use [clojure.contrib.lazy-seqs :only [primes]]))
(defn prime-summands [n]
(let [primes (take-while #(<= % n) primes)]
(reduce +
(for [start (range (count primes))]
(let [primes-sums (->> primes
(drop start)
(reductions +)
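Both previews of this gist are cut off before the part that decides what gets summed, so the exact result they compute is not visible. One plausible reading, and it is only an assumption, is that the function counts the runs of consecutive primes whose sum equals `n`, with the second revision using running sums (`reductions +`) instead of re-summing each slice. A Python sketch under that assumption, with a hypothetical sieve helper standing in for the `primes` lazy sequence:

```python
from itertools import accumulate


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (stand-in for the lazy `primes` seq)."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, n + 1, p):
                sieve[multiple] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def prime_summands(n):
    """Count runs of consecutive primes whose sum is exactly n (assumed semantics)."""
    primes = primes_up_to(n)
    count = 0
    for start in range(len(primes)):
        # accumulate yields running sums, like Clojure's (reductions + ...).
        for running_sum in accumulate(primes[start:]):
            if running_sum == n:
                count += 1
            if running_sum >= n:
                break  # sums only grow, so no later run from this start can match
    return count
```

Under this reading, 41 has three such runs: `2+3+5+7+11+13`, `11+13+17`, and `41` itself. Stopping each inner loop as soon as the running sum reaches `n` avoids the repeated slicing and summing seen in the first revision.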