Created
September 16, 2011 23:26
-
-
Save wgpshashank/1223394 to your computer and use it in GitHub Desktop.
Snippets of code I've written recently to various programming challenges
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
'''Problem - A 7 digit number consists of 7 distinct digits (from 0..9 ofcourse); The product of the first 3 digits = product of central 3 digits = product of last 3 digits; Find the middle digit. | |
''' | |
'''Used Constraint satisfaction to solve the problem - requires http://labix.org/download/python-constraint/python-constraint-1.1.tar.bz2''' | |
from constraint import * | |
def main(): | |
""" | |
My explanation here, let's first make their numbers be a through g, we have abc=cde=efg. | |
First of all, I'll rule out 0, for one thing, including zero makes anything multiplied by it 0. Moreover zero can be shared inside only two triplets. So, it's out. | |
I'm going to rule out 5 or 7 , because they are prime and no other 1-digit number contains their multiple. So, the remaining numbers are 1, 2, 3, 4, 6, 8, 9. | |
To determine the sequence lets find the prime factors of above numbers: | |
2 = 2 | |
3 = 3 | |
4 = 2 * 2 | |
6 = 2 * 3 | |
8 = 2 * 2 * 2 | |
9 = 3 * 3 | |
So we have seven 2's and four 3's. Let's go through the cases where the following could be the middle number. | |
1: If 1 is the middle digit, then we have seven 2's to be distributed among abc and efg. Because, the number of 2's is not even, we can't have 1 as middle digit. | |
3,6,9: If we have 3 or 6 in middle, we are left with three 3's in other digit which can't balance number of 3's in left and right group. | |
If we have 9 in middle then we are left with only two 3's in other digits which can only balance one group with middle group. | |
So 3, 6, and 9 are not suitable for middle digit. | |
4,8: If we have 4 in middle then we are left with five 2's in other digits which can't balance number of 2's in left and right group. | |
If we have 8 in middle then we are left with only four 2's in other digits which are not sufficient to balance the other two group with middle group. | |
So 4 and 8 are not suitable for middle digit. | |
Since abc=efg, (abcdefg)/d must be a perfect square. 1*2*3*4*6*8*9 should have seven 2's and 4 3's and d can be only 2 or 8. | |
If d=2, the numbers 1, 3, 4, 6, 8, 9 can be made into 1*8*9=3*4*6=72, and we can have c=9, e=4: 9*2*4=72. | |
If d=8, the numbers 1, 2, 3, 4, 6, 9 can be made into 1*4*9=2*3*6=36, but 36 is not divisible by 8. | |
When we have 2 as middle digit we remain with six 2's and four 3's which can be evenly distributed among left and right group and can balance the equation. | |
When the 2's and 3's are evenly distributed we have three 2's and two 3's on each group making 2 * 2 * 2 * 3 * 3 = 72 like we found out above | |
We already have 2 as middle digit so for middle group to make 71 the other two digit of middle group should multiply to make 36 and the only possible answer is 4 * 9. | |
So the other two digit of middle group are 4 and 9. | |
Lets consider 4 is shared with left group and 9 is shared with right group is the sequence for middle group is "4, 2, 9". | |
Now for the left group to make 72 the other two digit should multiply to make 18 and with the digit 1, 3, 6, and 8 left the only possible answer is 3 and 8. | |
And the digit 8 and 1 become part of right group with 9 and make 72 as the multiplication. And thus the sequence must be | |
3 6 4 2 9 8 1 | |
Ultimately, it's a matter of constraint satisfaction: i.e. a*b = d*e and c*d = f*g. I'll use a constaint SAT solver to find our solution :-) | |
""" | |
problem = Problem() | |
problem.addConstraint(AllDifferentConstraint()) | |
problem.addVariables(["a","b","c","d","e","f","g"],[1,2,3,4,5,6,7,8,9]) | |
problem.addConstraint(lambda c,d,f,g: c*d == f*g, ("c","d","f","g")) | |
problem.addConstraint(lambda a,b,d,e: a*b == d*e, ("a","b","d","e")) | |
solutions=problem.getSolutions() | |
print solutions | |
if __name__ == "__main__": | |
main() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import os,sys,random | |
'''Problem - . Given a list of N line segments on the X axis: (x1[i], x2[i]) where i = 1 to N; And then a list of M queries. Each query inputs one x-coordinate and you need to output the set of all {i} that overlaps this point. | |
example: | |
N = 2 | |
3 6 | |
1 5 | |
M = 2 | |
query: 3 | |
answer: 1,2 | |
query: 6 | |
answer: 1 | |
My Solution: | |
I used an AVL tree for cheap insertions, self-balancing and search under O(log n) | |
this AVL Tree comes with some augmentation for keeping track of maximum right range found in a node's children | |
each node contains an array of four details - [x1, x2, maximum right range found in child (max_right), parent_node] | |
each node is inserted by comparing x1. After node insertion as a leaf, node's max_right is set initially to x2 and is then propogated up to the root O(height of tree) | |
querying is pretty easy and cheap. We first check to see if q is contained in root r - if yes, we're done. | |
If not, then, we check r.left_bound >= q, if it is, we go left and recursively do the same trick until we find q. If r.left_bound <= q, we have two cases. | |
case 1: if r.left.max >= q, recursively go left. | |
case 2: if r.left.max <= q, recursively go right. | |
querying should take O(log n) as it's simply a matter of Binary search :-) | |
''' | |
found=0 | |
class Node: | |
def __init__(self, data): | |
self.data = data | |
self.parent = None | |
self.set_childs(None, None) | |
def set_childs(self, left, right): | |
self.left = left | |
if self.left: | |
self.left.parent=self | |
self.right = right | |
if self.right: | |
self.right.parent=self | |
def findMax(self,i): #if i, return max of children's nodes | |
#if self.parent: | |
#print self.data,"||",self.parent.data | |
if self.left == None and self.right == None: | |
self.data[2]=self.data[1] | |
if i: | |
return self.data[1] | |
elif self.left and self.right: | |
self.data[2] = max(self.data[2], max(self.left.data[2], self.right.data[2])) | |
if i: | |
return max(self.left.data[2], self.right.data[2]) | |
elif self.left and self.right==None: | |
self.data[2] = max(self.data[2], self.left.data[2]) | |
if i: | |
return self.left.data[2] | |
elif self.right and self.left == None: | |
self.data[2] = max(self.data[2], self.right.data[2]) | |
if i: | |
return self.right.data[2] | |
'''if self.parent!=None: | |
self.parent.findMax() | |
if self.parent==None: | |
if self.left: self.left.findMax() | |
if self.right: self.right.findMax()''' | |
def balance(self): | |
lheight = 0 | |
if self.left: | |
lheight = self.left.height() | |
rheight = 0 | |
if self.right: | |
rheight = self.right.height() | |
return lheight - rheight | |
def height(self): | |
lheight = 0 | |
if self.left: | |
lheight = self.left.height() | |
rheight = 0 | |
if self.right: | |
rheight = self.right.height() | |
return 1 + max(lheight, rheight) | |
def rotate_left(self): | |
self.data, self.right.data = self.right.data, self.data | |
old_left = self.left | |
self.set_childs(self.right, self.right.right) | |
self.left.set_childs(old_left, self.left.left) | |
self.left.data[2]=self.left.findMax(1) | |
def rotate_right(self): | |
self.data, self.left.data = self.left.data, self.data | |
old_right = self.right | |
self.set_childs(self.left.left, self.left) | |
self.right.set_childs(self.right.right, old_right) | |
if self.parent != None: | |
self.right.data[2]=self.right.findMax(1) | |
def rotate_left_right(self): | |
self.left.rotate_left() | |
self.rotate_right() | |
def rotate_right_left(self): | |
self.right.rotate_right() | |
self.rotate_left() | |
def do_balance(self): | |
#print "balancing",self.data | |
bal = self.balance() | |
if bal > 1: | |
if self.left.balance() > 0: | |
self.rotate_right() | |
else: | |
self.rotate_left_right() | |
elif bal < -1: | |
if self.right.balance() < 0: | |
self.rotate_left() | |
else: | |
self.rotate_right_left() | |
def insert(self, data): | |
if data <= self.data: | |
if not self.left: | |
self.left = Node(data) | |
self.left.parent=self | |
else: | |
self.left.insert(data) | |
else: | |
if not self.right: | |
self.right = Node(data) | |
self.right.parent=self | |
else: | |
self.right.insert(data) | |
self.do_balance() | |
self.findMax(None) | |
def print_tree(self, indent = 0): | |
print " " * indent + str(self.data) | |
#if self.parent: print self.parent | |
if self.left: | |
self.left.print_tree(indent + 2) | |
if self.right: | |
self.right.print_tree(indent + 2) | |
def query(self,q,counter): | |
global found | |
if q > self.data[2]: | |
return | |
if self.left != None: | |
self.left.query(q,counter) | |
if q>=self.data[0] and q<=self.data[1]: | |
found+=1 | |
if q <= self.data[0]: | |
return | |
if self.right != None: | |
self.right.query(q,counter) | |
if __name__ == "__main__": | |
input=os.getcwd()+'/input' | |
file=open('input') | |
n=file.readline().split()[2] #get n | |
triplets=[] | |
triplets=file.readline().split() | |
triplets.append(triplets[1]) #3rd element in triplets is right maxima range | |
tree = Node(map(lambda x: int(x), triplets)) #initialize the class | |
for line in range(1,int(n)): | |
triplets=file.readline().split() | |
triplets.append(triplets[1]) | |
tree.insert(map(lambda x: int(x), triplets)) | |
a=file.readline() #for the space | |
m=file.readline().split()[2] | |
q=[] #queries | |
for line in range(0,int(m)): | |
q.append(int(file.readline())) | |
tree.print_tree() | |
for i in q: | |
tree.query(i,0) | |
print found | |
found=0 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
'''similar.py by Samrat Jeyaprakash - better to read from __main__ first | |
Takes a list of twitter users and find similarities amongst them. Just a fun little script using a library from the book "Collective Intelligence" | |
I haven't the time to work on it - but this is a good start if you are looking for a fun way to do similarity vectors. | |
This implementation uses Tweepy, Redis & Pearson Correlation Similarity libraries from Collective Intelligence book | |
To run, run redis on default port | |
Iteration 1 - Uses two filters (pearson similarity coefficient & "following" users intersection) to find similarity between two users | |
TODO for Iteration 2 - add clustering and change feed_data to .json for Redis caching. Also implement finding screen_name and add redis.ZADD | |
Iteration 3 - find independent features a la central themes using non-negative matrix factorization and some bayesian classification | |
Iteration 4 - implement term-frequency/inverse document frequency for PageRank, and threads - actually use Twisted for asynchronous calls | |
Iteration 5 - use neural networks for better link tracking | |
''' | |
from recommendations import sim_pearson, sim_distance, topMatches | |
import feedparser, re, urllib, threading, redis,tweepy,json | |
def getUserMessages(user): | |
#get rss feeds of user to extract keywords from tweets | |
url = "http://twitter.com/statuses/user_timeline/" + user + ".rss?count=200" #change to .json in Iter#2 | |
feed_data = feedparser.parse(url) | |
#print feed_data | |
return feed_data.get("entries", []) | |
def getKeywordScores(user, messages): | |
#this method counts the words from the users' tweets and their frequency if > 1 | |
keywords = {} | |
stopwords = ["a", "an", "by", "on", "that", "the", "these", "this", "those", "to"] | |
# and many more words... for iteration 2, consider using zipf's laws' blacklisted words | |
#consider using http://www.webconfs.com/stop-words.php - also need to consider bad language | |
stopwords.append(user) | |
for message in messages: | |
tweet = message["summary"] | |
words = re.split(" ", tweet) | |
for word in words: #stopwords filtering loop | |
word = re.sub("^\W*", "", word) | |
word = re.sub("\W*$", "", word) | |
if word.startswith("http://"): | |
continue | |
word = word.lower() | |
if word in stopwords: | |
continue | |
if not word: | |
continue | |
count = keywords.get(word, 0) | |
keywords[word] = count + 1 | |
final_keywords = {} #takes only keywords > 1 appearance | |
for k in keywords: | |
if keywords[k] > 1: | |
final_keywords[k] = keywords[k] | |
return final_keywords | |
def processor(first_user,i): #thread helper method... don't use until iteration #4. threads for managing rss calls. | |
messages = getUserMessages(user = user) | |
user_keywords[user]=getKeywordScores(user,messages) | |
print sim_pearson(user_keywords, first_user, user) | |
return | |
def twitter_helper(screen_name=0): | |
auth = tweepy.BasicAuthHandler("YOUR_USERNAME", "YOUR_PASSWORD") | |
api = tweepy.API(auth) | |
print '---------------------------------------------' | |
for user in users: | |
print "Getting",user,"'s friends now" | |
friends=api.friends_ids(user) | |
if (screen_name == 1): #this is expensive to do using the current calls | |
following=[] | |
for friend in friends: | |
url = "http://twitter.com/statuses/user_timeline/" + str(friend) + ".json" | |
#print url,friend #debug helper - really need to use python debug logger | |
result=json.load(urllib.urlopen(url)) | |
#print result | |
following.append(result[0]['user']['screen_name']) | |
time.sleep(1) #did this to fool twitter, but still think there might be some sort of exponential backoff from twitter side | |
map(lambda m: r.sadd(user,m),following) # add screen_names to Redis key => user | |
else: | |
map(lambda m: r.sadd(user,m),friends) # add user ID's to Redis key => user | |
def user_intersection(users): | |
for i in range(len(users)-1): | |
print users[i],"'s common following with " | |
for j in range(i+1,len(users)): | |
print users[j], | |
print r.sinter([users[i],users[j]]),"\n" | |
print "___________________________________" | |
def pearsonScore(users): | |
#print users | |
user_keywords = {} | |
#threads=[] #maybe use it for thread tracking later... | |
s=[] | |
for user in users: | |
print "processing data for:", user | |
messages = getUserMessages(user = user) | |
'''t=threading.Thread(target=processor,args=(user[0],user)) #could use some threads for speeding up http calls tremendously | |
threads.append(t) | |
t.start() | |
print s''' | |
user_keywords[user] = getKeywordScores(user = user, messages = messages) | |
print "===========================================================================" | |
for user in users: | |
print "Top matches for ",user,":" | |
print topMatches(user_keywords, user,n=len(users),similarity = sim_pearson) | |
if __name__ == "__main__": | |
users=['aplusk','justinbieber','billgates','ladygaga','samratjp','ev'] | |
r = redis.Redis(host='localhost', port=6379, db=0) | |
pearsonScore(users) | |
'''twitter_helper() #this calls twitter to a user's "following - i.e. ppl they are following" IDs, | |
this kinda sucks, need to find screen_names, but haven't found one yet. so, expensive solution is to find twitter.com/../user_id.json then find screen_name | |
that sucks too, because twitter will blacklist you so fast! But, the intersection can be done with ids anyways. In any case, will implement user_name for Iter#2 | |
just need to find other ways. Even the RESTful methods didn't have any''' | |
twitter_helper(screen_name=0) #because of the blacklisting possibility, use 0 for now. 1 can get upto 24 results before twitter gets clever | |
user_intersection(users) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment