@tgalery
tgalery / wcpo_urls.json
Created August 7, 2014 11:27
Wcpo urls
[
"http://www.wcpo.com/news/news-photo-gallery/neil-armstrong-memorial-service-in-dc",
"http://www.wcpo.com/entertainment/local-a-e/video-top-collegiate-art-students-win-showcase-at-cincinnati-gallery",
"http://www.wcpo.com/news/news-photo-gallery/cincinnati-entertainment-awards-17th-annual-cea-awards",
"http://www.wcpo.com/news/news-photo-gallery/agony-and-fun-of-a-winter-storm",
"http://www.wcpo.com/news/news-photo-gallery/throwbackthursday-our-top-9-photos-of-the-week",
"http://www.wcpo.com/news/news-photo-gallery/cincy-dances-in-washington-park",
"http://www.wcpo.com/web/wcpo/news/news-photo-gallery/man-jumps-from-deck-to-escape-cold-spring-fire",
"http://www.wcpo.com/weather/weather-photo-gallery/march-roars-with-ice-and-snow",
"http://www.wcpo.com/weather/weather-photo-gallery/heavy-rains-hit-tri-state",
tgalery / report_stem_based_topic_extraction.md
Last active August 29, 2015 14:03
This doc describes some of the issues we found when putting a stem-based extractor on Spotlight development.

Issues with stem-based Spotlight

  1. Bare stemming might be a bit too coarse-grained.
    1. Possessive pronouns and contractions reduce to the same form. For example, "[Shake its] head" and "Shake it" (the song) both reduce to "shak it", so topics from one form interfere with the other.
    2. Latin plurals (borrowings) reduce like native plurals, e.g. "Illuminati" reduces to "illumit" and we get the topic Illuminated.
    3. Acronyms might reduce to blacklisted words, e.g. "IT" => "it".
    4. Verbal endings, e.g. "whisking up" => "whisk up", which we then disambiguate as whiskey.
    5. "According" => "accord" which is then disambiguated as Honda Accord.
  2. "The timing" => Time
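
The collisions listed above can be reproduced with a toy example. The sketch below is not the stemmer used in Spotlight; `toy_stem`, `stem_phrase`, `collisions` and the suffix list are hypothetical names chosen for illustration, and a real stemmer (which, for instance, reduces "shake" to "shak") will produce different stems. It only shows how lowercasing plus suffix stripping makes distinct surface forms share one lookup key.

```python
from collections import defaultdict

# Toy suffix list (assumption): NOT the rules of the stemmer used in Spotlight.
SUFFIXES = ("ing", "'s", "s")


def toy_stem(token: str) -> str:
    """Lowercase the token and strip the first matching suffix."""
    word = token.lower()
    for suffix in SUFFIXES:
        # Keep at least two characters so short words are not emptied.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def stem_phrase(phrase: str) -> str:
    """Stem every whitespace-separated token of a phrase."""
    return " ".join(toy_stem(token) for token in phrase.split())


def collisions(phrases):
    """Group surface forms by stemmed key and keep keys shared by several forms."""
    groups = defaultdict(set)
    for phrase in phrases:
        groups[stem_phrase(phrase)].add(phrase)
    return {stem: forms for stem, forms in groups.items() if len(forms) > 1}


if __name__ == "__main__":
    sample = ["Shake its", "Shake it", "According", "accord", "IT", "it"]
    for stem, forms in collisions(sample).items():
        print(stem, "<=", sorted(forms))
```

Running a check like this over the surface forms in the candidate index is one way to see which stems merge unrelated topics before deciding where bare stemming needs exceptions.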
tgalery / NVM_extra_topics.txt
Last active August 29, 2015 14:02
NVM extra Topics
Extracting topics for /Users/Thiago/datasets/client_dumps/annotated_transcripts/UKArchive/conversation_feedback_533b97b949ba5_-1589507063_014491c1-53d0-136f-f689-836c922ac0f5.wav.txt
Initial text is
>> Jack Ruin
New topic extracted Contact_centre_(business)
New topic extracted Address_book
New topic extracted Dean_(religion)
New topic extracted Manager_(baseball)
tgalery / NMV.md
Last active August 29, 2015 14:01
Notes on extracting NVM topics

Preliminary notes on NVM transcript data

Intro:

Looking at the data from New Virgin Media, many conversations lack appropriate topics. This is due to a number of reasons, such as:

1. Calls are not answered, so we can't extract much.

Looking at the distribution of the sample handed in:

url: http://www.fastcoexist.com/3020930/yahoo-says-that-killing-working-from-home-is-turning-out-perfectly
text: When Yahoo CEO Marissa Mayer banned her 12,000 employees from working from home in February, her all-hands-on-deck ultimatum ignited a national debate on the merits of cloudworking that still rages. Silicon Valley’s fair-haired wunderkind was alternately mocked and condemned by the likes of Maureen Dowd and Richard Branson, while pundits declared she’d made “a terrible mistake.” Some even wondered whether Mayer was trying to make them quit. Mayer was finally hounded into addressing the issue in April, acknowledging her critics' contention that “people are more productive when they're alone,” and then stressing “but they're more collaborative and innovative when they're together.” Eight months later, Yahoo insists Mayer was right. (And earlier this month, HP’s Meg Whitman followed suit.) The workplace has become a catalyst for energy and buzz.Despite predictions of “epic policy failure,” in the word
url: http://www.fastcoexist.com/3020930/yahoo-says-that-killing-working-from-home-is-turning-out-perfectly
text: When Yahoo CEO Marissa Mayer banned her 12,000 employees from working from home in February, her all-hands-on-deck ultimatum ignited a national debate on the merits of cloudworking that still rages. Silicon Valley’s fair-haired wunderkind was alternately mocked and condemned by the likes of Maureen Dowd and Richard Branson, while pundits declared she’d made “a terrible mistake.” Some even wondered whether Mayer was trying to make them quit. Mayer was finally hounded into addressing the issue in April, acknowledging her critics' contention that “people are more productive when they're alone,” and then stressing “but they're more collaborative and innovative when they're together.” Eight months later, Yahoo insists Mayer was right. (And earlier this month, HP’s Meg Whitman followed suit.)The workplace has become a catalyst for energy and buzz.Despite predictions of “epic policy failure,” in the words
http://econsultancy.com/blog/9583-how-video-marketing-powers-seo
http://dbpedia.org/resource/Vayu -- 0.1
http://dbpedia.org/resource/Algorithm -- 0.35
http://dbpedia.org/resource/ComScore -- 0.67
http://dbpedia.org/resource/Vimeo -- 0.53
http://dbpedia.org/resource/Web_search_engine -- 0.89
http://dbpedia.org/resource/Facebook -- 0.1
http://dbpedia.org/resource/YouTube -- 0.79
http://dbpedia.org/resource/LinkedIn -- 0.1
http://dbpedia.org/resource/Twitter -- 0.1
tgalery / sample_frequency_table_incoming_outgoing.txt
Created November 5, 2013 17:15
Sample Frequency table split by relationship direction. It also detects whether there are relationship types that appear in both directions.
Printing Frequency Table for sponge bob => /m/07vqnc
{ 'incoming': { u'appears_in': 15,
u'created': 1,
u'subject': 2,
u'type_rel': 5},
'outgoing': { u'adaptation': 1,
u'certification': 3,
u'contributor': 16,
u'genre': 18,
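
A table of this shape could be built roughly as follows. This is a minimal sketch, not the original script: the function name `direction_table`, the `(source, relationship, target)` triple layout, and the `both_directions` key are assumptions made for illustration.

```python
from collections import Counter


def direction_table(node, edges):
    """Count relationship types per direction for one node.

    edges: iterable of (source, relationship, target) triples (assumed layout).
    """
    edges = list(edges)  # allow generators, since we iterate twice
    incoming = Counter(rel for src, rel, dst in edges if dst == node)
    outgoing = Counter(rel for src, rel, dst in edges if src == node)
    # Relationship types seen both pointing at the node and leaving it.
    both = sorted(set(incoming) & set(outgoing))
    return {
        "incoming": dict(incoming),
        "outgoing": dict(outgoing),
        "both_directions": both,
    }


if __name__ == "__main__":
    # Made-up edges around the node id shown above; not real graph data.
    sample_edges = [
        ("/m/a", "appears_in", "/m/07vqnc"),
        ("/m/07vqnc", "genre", "/m/b"),
        ("/m/07vqnc", "genre", "/m/c"),
        ("/m/07vqnc", "appears_in", "/m/d"),
    ]
    print(direction_table("/m/07vqnc", sample_edges))
```

A non-empty `both_directions` list flags relationship types that are used in both directions, which is the check the description mentions.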
tgalery / sample_frequency_table.txt
Created November 5, 2013 15:59
Result of building a frequency table for a node.
Printing Frequency Table for sponge bob => /m/07vqnc
{ u'adaptation': 1,
u'appears_in': 15,
u'certification': 3,
u'contributor': 16,
u'created': 1,
u'genre': 18,
u'notable': 1,
u'part': 20,
tgalery / ontology_cleanup.md
Last active December 24, 2015 11:49
Operation Ontology Clean-Up

This doc keeps track of some of the changes made to the ontology. Transformations mapped so far:

Relationships removed

  • "expressed_by" (between books/movies/films and languages)
  • "location" (between timezones and places)
  • "practicioneer" (between religion/languages and people)
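
The removals above could be applied to an edge list along these lines. This is a sketch under stated assumptions: the type labels (`book`, `timezone`, `person`, and so on), the `REMOVED` table, and the functions `is_removed` and `clean` are hypothetical, and the rules are matched in either direction because the notes do not say which endpoint is the source.

```python
# Removal rules: relationship -> (endpoint types on one side, types on the other).
# The type labels are hypothetical; the notes name the endpoints only informally.
REMOVED = {
    "expressed_by": ({"book", "movie", "film"}, {"language"}),
    "location": ({"timezone"}, {"place"}),
    "practicioneer": ({"religion", "language"}, {"person"}),
}


def is_removed(rel, type_a, type_b):
    """True if rel links the two endpoint types of a removal rule, in either order."""
    rule = REMOVED.get(rel)
    if rule is None:
        return False
    left, right = rule
    return (type_a in left and type_b in right) or (type_b in left and type_a in right)


def clean(edges, node_types):
    """Drop edges matching a removal rule.

    edges: (source, relationship, target) triples; node_types: node -> type label.
    """
    return [
        (src, rel, dst)
        for src, rel, dst in edges
        if not is_removed(rel, node_types.get(src), node_types.get(dst))
    ]
```

Keying each rule on the endpoint types, not just the relationship name, keeps a relationship such as "location" intact between other kinds of nodes.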

Relationships mapped: