
Marco Bonzanini (bonzanini)

# Print most common words in a corpus collected from Twitter
#
# Full description:
# http://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/
# http://marcobonzanini.com/2015/03/09/mining-twitter-data-with-python-part-2/
# http://marcobonzanini.com/2015/03/17/mining-twitter-data-with-python-part-3-term-frequencies/
#
# Run:
# python twitter_most_common_words.py <filename.jsonl>
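A minimal sketch of the counting step, assuming each line of the `.jsonl` file is one tweet stored as a JSON object with a `text` field. The tokenizer regex, the short stop-word list and the `most_common_words` name are illustrative assumptions, not the ones used in the linked posts.

```python
import json
import re
from collections import Counter

# Hypothetical minimal stop-word list (assumption, not the posts' list)
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "rt", "via"}
# Crude tokenizer (assumption): optional #/@ prefix followed by word characters
TOKEN_RE = re.compile(r"[#@]?\w+")


def most_common_words(lines, n=10):
    """Return the n most common terms from an iterable of JSON lines."""
    counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between tweets
        tweet = json.loads(line)
        tokens = (t.lower() for t in TOKEN_RE.findall(tweet.get("text", "")))
        counter.update(t for t in tokens if t not in STOP_WORDS)
    return counter.most_common(n)
```

To run it on a file, open it and pass the file object, e.g. `with open(path) as f: print(most_common_words(f, 5))`; iterating a text file yields one line at a time, so the whole corpus never needs to fit in memory.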
bonzanini / create_recent_articles.sh
Created May 4, 2015 15:16
Create sample DB on Elasticsearch to showcase decay function over publication date
curl -XDELETE http://localhost:9200/blog
curl -XPUT http://localhost:9200/blog -d '{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
}
},
"mappings": {
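For intuition about what a decay function does to the score, here is a sketch of the Gaussian decay formula documented for Elasticsearch's `function_score` query, applied to an article's age in days. The function and parameter names are hypothetical; in Elasticsearch itself you would set `origin`, `scale`, `offset` and `decay` on a `gauss` function over the publication date field.

```python
import math


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay: 1.0 near origin, exactly `decay` at distance offset + scale."""
    # sigma^2 is chosen so that the score equals `decay` at distance offset + scale
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    # Distances within `offset` of the origin are not penalised
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-distance ** 2 / (2 * sigma_sq))


def recency_score(published, today, scale_days=30, offset_days=0, decay=0.5):
    """Score a datetime.date by its age in days, with origin = today."""
    age_days = (today - published).days
    return gauss_decay(age_days, 0, scale_days, offset_days, decay)
```

With the defaults, an article published today scores 1.0 and one published 30 days ago scores about 0.5, so newer posts are boosted smoothly rather than by a hard cutoff.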
bonzanini / stem_lemma_pos_nltk_example.py
Created January 24, 2015 12:52
Example of stemming, lemmatisation and POS-tagging in NLTK
from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
stemmer = PorterStemmer()
lemmatiser = WordNetLemmatizer()
print("Stem %s: %s" % ("going", stemmer.stem("going")))
print("Stem %s: %s" % ("gone", stemmer.stem("gone")))
print("Stem %s: %s" % ("goes", stemmer.stem("goes")))
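The lemmatiser works best when it knows the part of speech: `WordNetLemmatizer.lemmatize(word, pos)` expects WordNet codes (`n`, `v`, `a`, `r`), while `pos_tag` returns Penn Treebank tags such as `VBG` or `NNS`. A small mapping bridges the two; `penn_to_wordnet` and `lemmatise_tagged` are hypothetical helpers, and unknown tags fall back to noun, which is also the lemmatiser's default.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"  # noun, also the fallback for any other tag


def lemmatise_tagged(tagged, lemmatise):
    """Lemmatise (word, tag) pairs using any lemmatise(word, pos) callable."""
    return [lemmatise(word, penn_to_wordnet(tag)) for word, tag in tagged]
```

With NLTK and its tokenizer, tagger and WordNet data installed, this would be called as `lemmatise_tagged(pos_tag(word_tokenize(text)), lemmatiser.lemmatize)`. Passing the lemmatiser in as a callable keeps the mapping testable without NLTK.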
bonzanini / sentiment_classification.py
Last active October 30, 2020 23:58
Sentiment analysis with scikit-learn
# You need to install scikit-learn:
# pip install scikit-learn
#
# Dataset: Polarity dataset v2.0
# http://www.cs.cornell.edu/people/pabo/movie-review-data/
#
# Full discussion:
# https://marcobonzanini.wordpress.com/2015/01/19/sentiment-analysis-with-python-and-scikit-learn
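The polarity v2.0 archive unpacks into a `txt_sentoken` directory with `pos` and `neg` subfolders of plain-text reviews; that layout is assumed here. A minimal standard-library loader (the `load_polarity` name is hypothetical) could collect the texts and their labels like this:

```python
from pathlib import Path


def load_polarity(root):
    """Return (texts, labels) from a directory with pos/ and neg/ subfolders."""
    texts, labels = [], []
    for label in ("pos", "neg"):
        # Sort the paths so the order is reproducible across platforms
        for path in sorted(Path(root, label).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8", errors="replace"))
            labels.append(label)
    return texts, labels
```

The resulting lists can then be vectorised, for example with scikit-learn's `TfidfVectorizer`, and fed to a classifier such as `LinearSVC`.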
bonzanini / run_luigi.py
Created October 24, 2015 14:56
Example of Luigi task pipeline
# run with a custom --n
# python run_luigi.py SquaredNumbers --local-scheduler --n 20
import luigi
class PrintNumbers(luigi.Task):
    n = luigi.IntParameter(default=10)

    def requires(self):
        return []
bonzanini / create_proximity.sh
Created February 8, 2015 17:57
Elasticsearch Proximity/Phrase Search
curl -XDELETE http://localhost:9200/test/articles
curl -XPUT http://localhost:9200/test/_mapping/articles -d '{
"properties": {
"content": {
"type": "string",
"position_offset_gap": 100
}
}
}'
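The point of `position_offset_gap` is that when a field holds several values (an array), the tokens of each new value are pushed `gap` positions further along, so a phrase query with a small slop cannot match across two values. Below is a simplified pure-Python model of that effect. It handles in-order two-word phrases only; Lucene's real slop also allows reordered terms at extra cost. The function names are hypothetical.

```python
def token_positions(values, gap=100):
    """Map each lowercased token of a multi-valued field to its positions."""
    positions = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # skip `gap` positions between consecutive values
        for token in value.lower().split():
            positions.setdefault(token, []).append(pos)
            pos += 1
    return positions


def phrase_match(positions, first, second, slop=0):
    """True if `second` follows `first` with at most `slop` positions between them."""
    for p1 in positions.get(first, []):
        for p2 in positions.get(second, []):
            if 0 <= p2 - p1 - 1 <= slop:
                return True
    return False
```

For `["John Abraham", "Lincoln Smith"]`, "abraham lincoln" only matches once the slop reaches the gap. In Elasticsearch 2.0 and later this mapping setting is named `position_increment_gap`, and since 5.0 the `string` type has been replaced by `text` and `keyword`.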
bonzanini / create_index.sh
Last active May 25, 2023 23:35
Elasticsearch/Python test
curl -XPOST http://localhost:9200/test/articles/1 -d '{
"content": "The quick brown fox"
}'
curl -XPOST http://localhost:9200/test/articles/2 -d '{
"content": "What does the fox say?"
}'
curl -XPOST http://localhost:9200/test/articles/3 -d '{
"content": "The quick brown fox jumped over the lazy dog"
}'
curl -XPOST http://localhost:9200/test/articles/4 -d '{
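Indexing documents with one `curl` call each is fine for a small test; for more documents, the `_bulk` endpoint accepts newline-delimited JSON made of an action line followed by the document source. A sketch of building that body in Python (the `bulk_payload` name is hypothetical):

```python
import json


def bulk_payload(index, docs):
    """Build an NDJSON body for the _bulk API from a {doc_id: source} mapping."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: index this document under the given id
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"
```

The body would be POSTed to `http://localhost:9200/_bulk` with `Content-Type: application/x-ndjson`. On Elasticsearch versions contemporary with this gist, each action line would also need `"_type": "articles"`.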
bonzanini / search_biopython.py
Last active February 9, 2024 21:44
Searching PubMed with Biopython
# This code uses Biopython to retrieve lists of articles from PubMed.
# You need to install Biopython first.
# If you use Anaconda:
# conda install biopython
# If you use pip/venv:
# pip install biopython
# Full discussion:
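Biopython's `Entrez.esearch` (with `Entrez.email` set, as NCBI asks) wraps NCBI's ESearch E-utility. To show which parameters such a search sends, here is a sketch that builds the ESearch URL directly with the standard library. The `esearch_url` helper and its defaults are assumptions for illustration, not Biopython's API.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=20, sort="relevance"):
    """Build an ESearch URL that returns up to `retmax` PubMed IDs for `term`."""
    params = {
        "db": "pubmed",      # database to search
        "term": term,        # the query string, URL-encoded by urlencode
        "retmax": retmax,    # maximum number of IDs to return
        "sort": sort,        # result ordering
        "retmode": "xml",    # response format
    }
    return ESEARCH_URL + "?" + urlencode(params)
```

The IDs returned by the search are then typically passed to EFetch (`Entrez.efetch` in Biopython) to download the article records.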
bonzanini / config.py
Last active April 18, 2024 11:57
Twitter Stream Downloader
consumer_key = 'your-consumer-key'
consumer_secret = 'your-consumer-secret'
access_token = 'your-access-token'
access_secret = 'your-access-secret'
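Hard-coding secrets in `config.py` makes them easy to commit by accident. One alternative, sketched here with hypothetical environment variable names, is to read the same four values from the environment and fail early if any is missing:

```python
import os

# Hypothetical environment variable names; adjust to your own setup
ENV_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise KeyError if any is unset."""
    missing = [name for name in ENV_VARS.values() if not environ.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {key: environ[name] for key, name in ENV_VARS.items()}
```

Accepting `environ` as a parameter keeps the function easy to test with a plain dict.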