@tgalery
tgalery / twiterwars2.py
Created August 12, 2012 17:09
Python spell-checker for a Twitter stream

This is a simple Python program that streams tweets from two locations (London and Exeter, in our example) and compares which one has the greater number of spelling mistakes.

1 – Set-up used:

*Ubuntu 11.04 Natty AMD64
*Python 2.7.3
*Python re library
*Python NLTK 2.0 library and the required NumPy and PyYAML libraries (for NLP tasks)
*Python tweetstream 1.1.1 library (for Twitter manipulation)
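The comparison described above can be sketched as follows. This is a minimal illustration, not the code of twiterwars2.py: it assumes the tweets for each location have already been collected as plain strings (no streaming), and it swaps NLTK's word list for a small hypothetical vocabulary so that it runs without downloads. It is written for Python 3, whereas the original set-up used Python 2.7.

```python
import re

# Hypothetical vocabulary standing in for a real dictionary such as an
# NLTK word list (an assumption made so the sketch runs offline).
VOCABULARY = {
    "the", "weather", "is", "great", "today", "in", "london", "exeter",
    "i", "love", "this", "city", "rain", "again",
}

# Handles, hashtags and URLs are not words, so strip them before tokenising.
NOISE_RE = re.compile(r"https?://\S+|[@#]\w+")
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(tweet):
    """Lower-case a tweet, drop handles/hashtags/URLs, return word tokens."""
    return TOKEN_RE.findall(NOISE_RE.sub(" ", tweet.lower()))


def count_misspellings(tweets, vocabulary=VOCABULARY):
    """Return (misspelled_tokens, total_tokens) over a list of tweets."""
    misspelled = total = 0
    for tweet in tweets:
        for token in tokenize(tweet):
            total += 1
            if token not in vocabulary:
                misspelled += 1
    return misspelled, total


def compare_locations(tweets_by_location, vocabulary=VOCABULARY):
    """Return (location with the highest misspelling rate, rate per location)."""
    rates = {}
    for location, tweets in tweets_by_location.items():
        misspelled, total = count_misspellings(tweets, vocabulary)
        rates[location] = misspelled / total if total else 0.0
    return max(rates, key=rates.get), rates


worst, rates = compare_locations({
    "London": ["The weather is great today in London", "rain again"],
    "Exeter": ["I luv this city", "wether is grate #exeter"],
})
print(worst, rates)  # worst == "Exeter" for this toy data
```

Normalising by token count is a design choice: comparing raw counts, as the description literally puts it, would favour whichever location happened to produce more tweets.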

@tgalery
tgalery / nnmf_no_datatreatment.py
Created August 15, 2012 02:04
Non-Negative Matrix Factorisation solutions to topic extraction in Python

These are two solutions for a topic extraction task. The sample data is loaded into a variable by the script. I’ve included running times for both solutions, so that we have precise information about the cost of each one, in addition to its results. According to Pazienza et al. (2005), two trends in the analysis of textual information can be identified: one based on linguistic and syntactic information, and another based on statistical analysis of frequency patterns (which usually treats text as a bag of words). Whilst the first approach is a purely syntactic one, the second aims to incorporate information about syntactic categories into the analysis (hence a hybrid approach).
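The bag-of-words view mentioned above treats each document as an unordered multiset of its tokens, and stacking those counts gives a non-negative terms-by-documents matrix. A minimal sketch in plain Python; the simple regular-expression tokeniser is an assumption, and the original scripts may tokenise differently (e.g. with NLTK).

```python
import re
from collections import Counter


def bag_of_words(text):
    """Map a document to its token counts; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def term_document_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in doc j."""
    bags = [bag_of_words(doc) for doc in docs]
    vocabulary = sorted(set().union(*bags))
    # A Counter returns 0 for missing terms, so absent words become zeros.
    matrix = [[bag[term] for bag in bags] for term in vocabulary]
    return vocabulary, matrix


vocabulary, matrix = term_document_matrix(["the cat sat", "the dog sat on the cat"])
for term, row in zip(vocabulary, matrix):
    print(term, row)
# cat [1, 1]
# dog [0, 1]
# on [0, 1]
# sat [1, 1]
# the [1, 2]
```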

After presenting the solutions and briefly mentioning an alternative to them, I’ll move on to a short theoretical discussion.

1 – Set-up used:

*Ubuntu 11.04 Natty AMD64
*Python 2.7.3
*Python re library
*Python NLTK 2.0 library and the required NumPy and PyYAML libraries (for NLP tasks)
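The factorisation at the heart of an NMF topic extractor can be sketched with the multiplicative update rules of Lee and Seung for the Frobenius-norm objective. This is a generic NumPy sketch, not the code from nnmf_no_datatreatment.py: the toy matrix, its vocabulary, the number of topics `k`, the iteration count and the `eps` guard are all illustrative assumptions. It factorises a non-negative terms-by-documents matrix V into W (terms by topics) and H (topics by documents), and times the run in the spirit of the running-time comparison mentioned above.

```python
import time

import numpy as np


def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Factorise V into W @ H with W, H >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, k))
    H = rng.random((k, n_docs))
    for _ in range(n_iter):
        # Each update keeps entries non-negative; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def top_terms(W, vocabulary, n=3):
    """For each topic (column of W), return its n highest-weighted terms."""
    return [[vocabulary[i] for i in np.argsort(W[:, t])[::-1][:n]]
            for t in range(W.shape[1])]


# Hypothetical toy data: rows are terms, columns are documents.
vocabulary = ["goal", "match", "team", "vote", "party", "election"]
V = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [4, 1, 0, 1],
              [0, 0, 3, 2],
              [0, 1, 2, 4],
              [0, 0, 4, 3]], dtype=float)

start = time.perf_counter()
W, H = nmf(V, k=2)
elapsed = time.perf_counter() - start
print(top_terms(W, vocabulary), f"{elapsed:.4f}s")
```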

@tgalery
tgalery / ontology_cleanup.md
Last active December 24, 2015 11:49
Operation Ontology Clean-Up

This doc keeps track of some of the changes made to the ontology. Transformations mapped so far:

Relationships removed:

  • "expressed_by" (between books/movies/films and languages)
  • "location" (between timezones and places)
  • "practicioneer" (between religion/languages and people)
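A minimal sketch of how removals like these could be applied. It assumes, hypothetically, that the ontology is held as a list of typed edges; the storage format and the type labels (`book`, `timezone`, `person`, ...) are not given above. Because each bullet scopes a removal to particular node types, each rule keys on the relationship name plus the types at both ends, and since the bullets do not state a direction, both orientations are matched.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    source_type: str
    relationship: str
    target: str
    target_type: str


# Hypothetical type labels mirroring the bullet list; each rule is
# (relationship, types at one end, types at the other end).
REMOVAL_RULES = [
    ("expressed_by", {"book", "movie", "film"}, {"language"}),
    ("location", {"timezone"}, {"place"}),
    ("practicioneer", {"religion", "language"}, {"person"}),
]


def is_removed(edge, rules=REMOVAL_RULES):
    """True if the edge matches a removal rule in either direction."""
    for relationship, side_a, side_b in rules:
        if edge.relationship != relationship:
            continue
        forward = edge.source_type in side_a and edge.target_type in side_b
        backward = edge.source_type in side_b and edge.target_type in side_a
        if forward or backward:
            return True
    return False


def clean_up(edges, rules=REMOVAL_RULES):
    """Return only the edges that survive the removal rules."""
    return [edge for edge in edges if not is_removed(edge, rules)]


edges = [
    Edge("UTC", "timezone", "location", "London", "place"),
    Edge("Alice", "person", "location", "London", "place"),
    Edge("Buddhism", "religion", "practicioneer", "Alice", "person"),
]
print(clean_up(edges))  # only the person -> place "location" edge survives
```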

Relationships mapped:

@tgalery
tgalery / sample_frequency_table.txt
Created November 5, 2013 15:59
Result of building a frequency table for a node.
Printing Frequency Table for sponge bob => /m/07vqnc
{ u'adaptation': 1,
u'appears_in': 15,
u'certification': 3,
u'contributor': 16,
u'created': 1,
u'genre': 18,
u'notable': 1,
u'part': 20,
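A table like the one above can be produced with `collections.Counter`. This sketch assumes the node's relationships are already available as a list of relationship-type names; how they were fetched is not shown in the original, and the list used here is hypothetical.

```python
from collections import Counter
from pprint import pprint


def frequency_table(relationship_types):
    """Count how often each relationship type occurs on a node."""
    return dict(Counter(relationship_types))


def print_frequency_table(name, node_id, relationship_types):
    """Print a header line, then the counts (pprint sorts dict keys by default)."""
    print(f"Printing Frequency Table for {name} => {node_id}")
    pprint(frequency_table(relationship_types), indent=2)


# Hypothetical relationship list for one node.
types = ["genre", "genre", "appears_in", "contributor", "genre"]
print_frequency_table("sponge bob", "/m/07vqnc", types)
```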
@tgalery
tgalery / sample_frequency_table_incoming_outgoing.txt
Created November 5, 2013 17:15
Sample frequency table split by relationship direction. It also detects whether any relationship types appear in both directions.
Printing Frequency Table for sponge bob => /m/07vqnc
{ 'incoming': { u'appears_in': 15,
u'created': 1,
u'subject': 2,
u'type_rel': 5},
'outgoing': { u'adaptation': 1,
u'certification': 3,
u'contributor': 16,
u'genre': 18,
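Splitting by direction and detecting overlap can be sketched as below. It assumes, hypothetically, that edges come as `(source, relationship, target)` tuples; only the node id `/m/07vqnc` is taken from the sample, and the other ids are made up.

```python
from collections import Counter


def directional_frequency_table(node_id, edges):
    """Count relationship types on node_id, split by edge direction.

    edges: iterable of (source, relationship, target) tuples (assumed format).
    Returns (table, both): table holds 'incoming' and 'outgoing' counts, and
    both lists the relationship types that occur in both directions.
    """
    incoming, outgoing = Counter(), Counter()
    for source, relationship, target in edges:
        if target == node_id:
            incoming[relationship] += 1
        if source == node_id:  # a self-loop is counted in both directions
            outgoing[relationship] += 1
    table = {"incoming": dict(incoming), "outgoing": dict(outgoing)}
    both = sorted(set(incoming) & set(outgoing))
    return table, both


node = "/m/07vqnc"
edges = [  # hypothetical edges
    ("/m/a", "appears_in", node),
    ("/m/b", "appears_in", node),
    (node, "genre", "/m/c"),
    (node, "appears_in", "/m/d"),
    ("/m/e", "genre", "/m/f"),  # does not touch the node, so it is ignored
]
table, both = directional_frequency_table(node, edges)
print(table, both)
```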
http://econsultancy.com/blog/9583-how-video-marketing-powers-seo
http://dbpedia.org/resource/Vayu -- 0.1
http://dbpedia.org/resource/Algorithm -- 0.35
http://dbpedia.org/resource/ComScore -- 0.67
http://dbpedia.org/resource/Vimeo -- 0.53
http://dbpedia.org/resource/Web_search_engine -- 0.89
http://dbpedia.org/resource/Facebook -- 0.1
http://dbpedia.org/resource/YouTube -- 0.79
http://dbpedia.org/resource/LinkedIn -- 0.1
http://dbpedia.org/resource/Twitter -- 0.1
url: http://www.fastcoexist.com/3020930/yahoo-says-that-killing-working-from-home-is-turning-out-perfectly
text: When Yahoo CEO Marissa Mayer banned her 12,000 employees from working from home in February, her all-hands-on-deck ultimatum ignited a national debate on the merits of cloudworking that still rages. Silicon Valley’s fair-haired wunderkind was alternately mocked and condemned by the likes of Maureen Dowd and Richard Branson, while pundits declared she’d made “a terrible mistake.” Some even wondered whether Mayer was trying to make them quit. Mayer was finally hounded into addressing the issue in April, acknowledging her critics' contention that “people are more productive when they're alone,” and then stressing “but they're more collaborative and innovative when they're together.” Eight months later, Yahoo insists Mayer was right. (And earlier this month, HP’s Meg Whitman followed suit.)The workplace has become a catalyst for energy and buzz.Despite predictions of “epic policy failure,” in the words
url: http://www.fastcoexist.com/3020930/yahoo-says-that-killing-working-from-home-is-turning-out-perfectly
text: When Yahoo CEO Marissa Mayer banned her 12,000 employees from working from home in February, her all-hands-on-deck ultimatum ignited a national debate on the merits of cloudworking that still rages. Silicon Valley’s fair-haired wunderkind was alternately mocked and condemned by the likes of Maureen Dowd and Richard Branson, while pundits declared she’d made “a terrible mistake.” Some even wondered whether Mayer was trying to make them quit. Mayer was finally hounded into addressing the issue in April, acknowledging her critics' contention that “people are more productive when they're alone,” and then stressing “but they're more collaborative and innovative when they're together.” Eight months later, Yahoo insists Mayer was right. (And earlier this month, HP’s Meg Whitman followed suit.) The workplace has become a catalyst for energy and buzz.Despite predictions of “epic policy failure,” in the word
@tgalery
tgalery / NMV.md
Last active August 29, 2015 14:01
Notes on extracting NVM topics

Preliminary notes on NVM transcript data

Intro:

In the data from New Virgin Media, many conversations lack appropriate topics. This is due to a number of reasons, such as:

1. Some calls are not answered, so little can be extracted from them.

Looking at the distribution of the sample handed in:

@tgalery
tgalery / NVM_extra_topics.txt
Last active August 29, 2015 14:02
NVM extra Topics
Extracting topics for /Users/Thiago/datasets/client_dumps/annotated_transcripts/UKArchive/conversation_feedback_533b97b949ba5_-1589507063_014491c1-53d0-136f-f689-836c922ac0f5.wav.txt
Initial text is
>> Jack Ruin
New topic extracted Contact_centre_(business)
New topic extracted Address_book
New topic extracted Dean_(religion)
New topic extracted Manager_(baseball)