@spara
Last active December 30, 2015 19:06
Characterizing Conferences Based on Past Presentations

Process: Characterize conferences based on past presentations. Most conferences no longer require a paper and leave relatively few artifacts other than slides and videos. However, many conference sites keep schedules from previous years that contain the session titles. Many sites provide these schedules in machine-readable formats such as RSS, iCal, and (to a lesser degree) HTML. In addition, Lanyrd may provide the schedule in iCal format if it is not available on the conference site.

Parsing presentation titles

Parsing titles from iCal files

Example:

    import sys
    
    from icalendar import Calendar
    
    # Usage: <script> schedule.ics titles.txt  (Python 3)
    with open(sys.argv[1], 'rb') as g:
        gcal = Calendar.from_ical(g.read())
    
    with open(sys.argv[2], 'w', encoding='utf8') as o:
        for component in gcal.walk():
            # Each VEVENT is one session; its SUMMARY holds the title.
            if component.name == "VEVENT":
                summary = component.get('summary')
                if summary is not None:
                    o.write(str(summary) + "\n")

Parsing titles from RSS

Example:

    import sys
    
    import feedparser
    
    # Usage: <script> <feed URL>  (Python 3)
    url = sys.argv[1]
    f = feedparser.parse(url)
    # Each feed entry is one session; print its title.
    for i in f.entries:
        print(i.title)
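The process description also lists HTML as a schedule format, but no example is given for it. The following is a minimal sketch using only the standard library. It assumes, hypothetically, that each session title sits inside an element carrying a CSS class such as session-title; real conference pages differ, so the class name (and the TitleExtractor and extract_titles names) are illustrative only.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; ignored when tracking nesting.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TitleExtractor(HTMLParser):
    """Collect the text of elements whose class attribute includes css_class."""

    def __init__(self, css_class="session-title"):  # hypothetical class name
        super().__init__()
        self.css_class = css_class
        self.titles = []
        self._depth = 0     # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse whitespace left over from nested markup and line breaks.
            title = " ".join("".join(self._buffer).split())
            if title:
                self.titles.append(title)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_titles(html, css_class="session-title"):
    parser = TitleExtractor(css_class)
    parser.feed(html)
    return parser.titles
```

The page source would be fetched separately (for example with urllib.request) and passed to extract_titles as a string. The sketch assumes reasonably well-formed markup inside each title element.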

The collected data was stored in a tab-separated values (TSV) file, with each line holding a conference name and its presentation titles:

    conference name \t presentation titles
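As a rough sketch of how such a file could be assembled from the parsed titles: the helper names tsv_row and write_tsv are hypothetical, and joining the titles with a period and a space is an assumption, since the source does not say how the titles were combined on one line.

```python
def tsv_row(conference, titles, sep=". "):
    """Build one line: conference name, a tab, then the joined titles."""
    cleaned = []
    for title in titles:
        # Tabs and newlines inside a title would break the one-line layout.
        title = " ".join(title.split())
        if title:
            cleaned.append(title)
    return conference.strip() + "\t" + sep.join(cleaned) + "\n"


def write_tsv(path, titles_by_conference):
    """Write one row per conference from a {name: [titles]} mapping."""
    with open(path, "w", encoding="utf8") as out:
        for conference, titles in titles_by_conference.items():
            out.write(tsv_row(conference, titles))
```

For example, write_tsv("conferences.tsv", {"OSCON": list_of_titles}) would produce a file that the keyword-extraction step can read line by line.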

Finding keywords to characterize conferences

Use RAKE (Rapid Automatic Keyword Extraction) to find the most common terms in presentations for a conference.

RAKE tutorial

RAKE implementation on GitHub: https://github.com/aneesha/RAKE

Example:

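    import sys
    
    import rake
    
    # Arguments: stoplist path, minimum characters per word,
    # maximum words per phrase, minimum keyword frequency.
    rake_object = rake.Rake("SmartStoplist.txt", 3, 2, 2)
    with open(sys.argv[1], 'r') as f:
        for line in f:
            # Each line is: conference name \t presentation titles
            data = line.rstrip('\n').split('\t')
            conf = data[0]
            text = data[1]
            # Replace quotes and commas with spaces before extraction.
            text = text.replace('"', ' ').replace(',', ' ')
            keywords = rake_object.run(text)
            print("%s\t%s" % (conf, keywords))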

The arguments 3, 2, 2 passed to rake.Rake create a RAKE object that extracts keywords where:

    Each word has at least 3 characters
    Each phrase has at most 2 words
    Each keyword appears in the text at least 2 times
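To make the effect of these three settings and the resulting scores more concrete, here is a simplified, self-contained sketch of RAKE-style scoring. It is not the code from the repository above: the tiny STOPWORDS set, the function names, and the exact way the filters are applied are assumptions that follow the description in this section. It uses the usual RAKE idea that a word's score is its degree (how many words it co-occurs with in candidate phrases, itself included) divided by its frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import Counter

# Tiny illustrative stoplist; a real run would load SmartStoplist.txt.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "the", "to", "with"}


def candidate_phrases(text, stopwords, min_chars, max_words):
    """Split text at punctuation and stopwords into candidate phrases."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\"\t\n]", text.lower()):
        current = []
        # The trailing "" is a sentinel that flushes the last run of words.
        for word in fragment.split() + [""]:
            if word and word not in stopwords:
                current.append(word)
                continue
            if (current and len(current) <= max_words
                    and all(len(w) >= min_chars for w in current)):
                phrases.append(" ".join(current))
            current = []
    return phrases


def score_keywords(text, stopwords=STOPWORDS, min_chars=3, max_words=2, min_freq=2):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text, stopwords, min_chars, max_words)
    freq, degree = Counter(), Counter()
    for phrase in phrases:
        words = phrase.split()
        for word in words:
            freq[word] += 1
            degree[word] += len(words)  # co-occurring words, itself included
    counts = Counter(phrases)
    scores = {
        phrase: sum(degree[w] / freq[w] for w in phrase.split())
        for phrase, n in counts.items()
        if n >= min_freq  # keep phrases seen at least min_freq times
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Under this scheme a two-word phrase whose words only ever appear together scores 4.0 (each word contributes 2 / 1 per occurrence), and a word that only appears alone scores 1.0, which is consistent with the range of values in the output shown in this gist.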

Output:

    OSCON   [('rebasing workflow', 4.0), ('akka  java', 4.0), ('bootcamp training', 4.0), ('open source', 3.8), ('reactive programming', 3.7142857142857144), ('scala   introduction', 3.125), ('programming', 1.7142857142857142), ('microservices', 1.5), ('community', 1.5), ('microservices \xe2\x80\x93', 1.5), ('data', 1.5), ('introduction', 1.375), ('openstack', 1.3333333333333333), ('make', 1.3333333333333333), ('fast', 1.3333333333333333), ('started', 1.3333333333333333), ('change', 1.25), ('future', 1.25), ('scale', 1.2), ('docker', 1.2), ('code', 1.0), ('back', 1.0), ('continued', 1.0), ('culture', 1.0), ('anti', 1.0), ('cassandra', 1.0), ('lessons', 1.0), ('story', 1.0), ('reality', 1.0), ('production', 1.0), ('swift', 1.0), ('internet', 1.0), ('run', 1.0), ('reilly', 1.0), ('boss', 1.0), ('sponsored', 1.0), ('hands', 1.0), ('github', 1.0), ('success', 1.0), ('presented', 1.0), ('flow', 1.0), ('leading', 1.0), ('relic', 1.0), ('making', 1.0)]
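The raw list mixes strong phrases with many single words at score 1.0, plus some whitespace and punctuation noise (for instance 'akka  java' and the dash bytes after 'microservices'). One possible post-processing step, sketched here with a hypothetical top_keywords helper and an arbitrary min_score threshold, keeps only the highest-scoring phrases and tidies them up:

```python
import re


def top_keywords(keywords, n=5, min_score=1.5):
    """Return up to n cleaned phrases scoring at least min_score, best first."""
    ranked = sorted(keywords, key=lambda kv: kv[1], reverse=True)
    result = []
    for phrase, score in ranked:
        if len(result) >= n or score < min_score:
            break
        # Replace stray punctuation (such as dashes) and collapse whitespace.
        cleaned = " ".join(re.sub(r"[^\w\s]", " ", phrase).split())
        if cleaned and cleaned not in result:
            result.append(cleaned)
    return result
```

The threshold and the number of phrases to keep are judgment calls; they would need tuning per conference rather than being fixed values.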