Emaad Ahmed Manzoor emaadmanzoor

Block or report user

Report or block emaadmanzoor

Hide content and notifications from this user.

Learn more about blocking users

Contact Support about this user’s behavior.

Learn more about reporting abuse

Report abuse
View GitHub Profile
Word embeddings via PMI-matrix factorization.ipynb
emaadmanzoor /
Last active Feb 16, 2018
95865 Model Evaluation Demo
#!/usr/bin/env python
# Copyright 2016 Emaad Ahmed Manzoor
# License: Apache License, Version 2.0
Get Spark Streaming microbatch statistics:
- Batch start time
- Scheduling delay (in seconds) for each microbatch

StreamSpot Bootstrap Clusters

Below are the bootstrap clusters used for the experiments in the StreamSpot paper for each of the following datasets:

  • all (01-C50_k10_all.txt): Chunk length of 50, 10 clusters.
  • ydc (02-C25_k5_ydc.txt): Chunk length of 25, 5 clusters.
  • gfc (03-C50_k5_gfc.txt): Chunk length of 50, 5 clusters.
emaadmanzoor /
Last active Aug 29, 2015
Quantifying Monotony Aversion

See the project website for more details.

Please report any issues to


Running this requires having the following files in the same directory as

  • all_links.p
  • all_tweets.p
emaadmanzoor /
Last active Aug 29, 2015
Attention Potential Validation Code

See the project website for more details.

Please report any issues to

Correlation Results

The attention potential (as estimated in section 4), when evaluated on this Twitter dataset:

  • Is 73.61% correlated with the retweets obtained.
  • Is significantly correlated (p < 0.05).
emaadmanzoor /
Created Sep 9, 2013
Freivalds' Algorithm
import random
import operator
t = int(raw_input())
randint = random.randint
def deterministic(a,b,c,n):
    no = 0
    for p in xrange(n):
        for q in xrange(n):
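The preview above is cut off before the randomized check itself, so the following is only a minimal sketch of Freivalds' algorithm, not the gist's own code. It is written for Python 3 (the gist targets Python 2), and the function name `freivalds` and the `rounds` and `rng` parameters are assumptions made for illustration. The idea is to test whether a * b == c by multiplying both sides with a random 0/1 vector, which costs O(n^2) per round instead of the O(n^3) of a full matrix product.

```python
import random


def freivalds(a, b, c, rounds=20, rng=random):
    """Probabilistically check whether a * b == c for n x n matrices.

    `rounds` and `rng` are illustrative parameters, not part of the gist.
    """
    n = len(a)
    for _ in range(rounds):
        # Draw a random 0/1 vector r.
        r = [rng.randint(0, 1) for _ in range(n)]
        # Compute b*r, then a*(b*r), and c*r: three O(n^2) products.
        br = [sum(b[i][j] * r[j] for j in range(n)) for i in range(n)]
        abr = [sum(a[i][j] * br[j] for j in range(n)) for i in range(n)]
        cr = [sum(c[i][j] * r[j] for j in range(n)) for i in range(n)]
        if abr != cr:
            # A single mismatch proves that a * b != c.
            return False
    # No mismatch found: a * b == c, or each round missed the
    # difference, which happens with probability at most 1/2 per round.
    return True
```

A `False` result is always correct; a `True` result is wrong with probability at most 2**-rounds, so raising `rounds` trades running time for confidence.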
emaadmanzoor /
Last active Aug 16, 2019
Expand the Edinburgh Twitter FSD corpus

The Python scripts attached here take care of the following tedious work, and should help one quickly get started with some real work on the corpus:

  • Respect the Twitter API rate limits and throttle API hits.
  • Don't hit the API for already expanded tweet IDs, so you can resume tweet expansion after stopping midway.
  • Parse the API response and dump it into the correct column in the sqlite3 database.
  • Gracefully handle exceptions while acquiring tweets from the API.
  • Wrap version 1.1 of the Twitter API.
  • Start from a specified tweet ID, assuming the input file is sorted in increasing order of tweet ID.
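The scripts themselves are not shown in this listing, so the following is only a rough sketch of how the throttling, resuming, and error handling described above could fit together. Everything named here is an assumption for illustration: the `expand_tweets` function, the `tweets` table with `id` and `text` columns, and the caller-supplied `fetch_tweet` callable that stands in for the real Twitter API wrapper.

```python
import sqlite3
import time


def expand_tweets(conn, tweet_ids, fetch_tweet, start_id=0,
                  min_interval=0.0, sleep=time.sleep):
    """Expand tweet IDs into a sqlite3 table, skipping IDs already stored.

    fetch_tweet is a hypothetical callable (tweet_id -> text) standing in
    for the API wrapper; the table layout is likewise an assumption.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT)"
    )
    expanded = 0
    for tweet_id in tweet_ids:
        # Honor the requested starting point (input sorted by tweet ID).
        if tweet_id < start_id:
            continue
        # Resume support: never hit the API for an ID we already have.
        row = conn.execute(
            "SELECT 1 FROM tweets WHERE id = ?", (tweet_id,)
        ).fetchone()
        if row:
            continue
        try:
            text = fetch_tweet(tweet_id)
        except Exception as exc:
            # Deleted or protected tweets should not abort the whole run.
            print("skipping %d: %s" % (tweet_id, exc))
            sleep(min_interval)
            continue
        conn.execute(
            "INSERT INTO tweets (id, text) VALUES (?, ?)", (tweet_id, text)
        )
        conn.commit()
        expanded += 1
        # Throttle so successive calls stay under the rate limit.
        sleep(min_interval)
    return expanded
```

Committing after every tweet keeps the database consistent if the run is interrupted, at the cost of some write throughput; `min_interval` would be chosen from whatever rate limit applies to the endpoint in use.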