spara/conference_keywords.md

## conference_keywords.md

      
Characterizing Conferences Based on Past Presentations

Process: Characterize conferences based on past presentations. Most conferences no longer require a paper and leave few artifacts other than slides and videos. However, many conference sites keep previous conference schedules that contain the session titles. Many sites provide these schedules in machine-readable formats such as RSS, iCal, and (to a lesser degree) HTML. In addition, Lanyrd may provide the schedule in iCal format if it is not available on the conference site.
Parsing presentation titles

Parsing titles from iCal files
Example:
    import sys
    
    from icalendar import Calendar
    
    # argv[1]: input iCal file, argv[2]: output text file (one title per line)
    with open(sys.argv[1], 'rb') as g, open(sys.argv[2], 'w', encoding='utf8') as o:
        gcal = Calendar.from_ical(g.read())
        for component in gcal.walk():
            # Each VEVENT is one session; its SUMMARY field holds the title.
            if component.name == "VEVENT":
                o.write(str(component.get('summary')) + "\n")

Parsing titles from RSS
Example:
    import sys
    
    import feedparser
    
    # argv[1]: URL of the RSS feed that lists the sessions
    url = sys.argv[1]
    f = feedparser.parse(url)
    for entry in f.entries:
        # Each feed entry is one session; print its title.
        print(entry.title)
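
Parsing titles from HTML
HTML schedules are mentioned above but have no example. The following is only a minimal sketch using the standard library's html.parser. It assumes, hypothetically, that each session title sits in an `<h3 class="session-title">` element; real conference sites use their own markup, so the tag and class names (and the helper names `TitleParser` and `extract_titles`) are placeholders to adapt.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of every element with a given tag and class."""

    def __init__(self, tag="h3", css_class="session-title"):
        super().__init__()
        self.tag = tag              # hypothetical tag holding a title
        self.css_class = css_class  # hypothetical class marking a title
        self.titles = []
        self._buffer = None         # text chunks while inside a matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buffer is not None:
            # Collapse runs of whitespace so each title is a single clean line.
            self.titles.append(" ".join("".join(self._buffer).split()))
            self._buffer = None


def extract_titles(html, tag="h3", css_class="session-title"):
    parser = TitleParser(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.titles
```

Calling `extract_titles(page_source)` on a downloaded schedule page returns a list of title strings that can be written out one per line, like the iCal and RSS examples.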

Data was collected and stored in a tab-separated values (TSV) file, one conference per line:
    conference name \t presentation titles
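
As a rough sketch of how the per-conference title files could be folded into that layout, the hypothetical helper below joins all titles for one conference into a single TSV line. The ". " separator between titles is an assumption, not something the original data is known to use; it is chosen so that a punctuation-aware extractor still sees title boundaries.

```python
def to_tsv_row(conference, titles):
    """Join all titles for one conference into a single TSV line."""
    # Tabs or newlines inside a title would break the two-column layout,
    # so collapse all whitespace runs and drop empty titles.
    cleaned = [" ".join(t.split()) for t in titles if t.strip()]
    # Assumption: titles are separated by ". " within the second column.
    return conference + "\t" + ". ".join(cleaned)
```

For example, the titles written by the iCal script could be read with `open(path, encoding='utf8').read().splitlines()` and passed in together with the conference name.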

Finding keywords to characterize conferences

Use RAKE (Rapid Automatic Keyword Extraction) to find the most common terms in presentations for a conference.
RAKE tutorial
RAKE implementation on GitHub: https://github.com/aneesha/RAKE
Example:
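    import sys
    
    import rake
    
    # Arguments: stoplist file, min characters per word, max words per phrase,
    # min number of times a keyword must appear in the text
    rake_object = rake.Rake("SmartStoplist.txt", 3, 2, 2)
    # argv[1]: TSV file with one "conference name \t presentation titles" line each
    with open(sys.argv[1], 'r', encoding='utf8') as f:
        for line in f:
            data = line.rstrip('\n').split('\t')
            conf = data[0]
            text = data[1]
            # Replace quotes and commas with spaces before extraction.
            text = text.replace('"', ' ').replace(',', ' ')
            keywords = rake_object.run(text)
            print(conf, "\t", keywords)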

The three numeric arguments (3, 2, 2) configure the RAKE object to extract keywords where:
    Each word has at least 3 characters
    Each phrase has at most 2 words
    Each keyword appears in the text at least 2 times
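
To make the three settings concrete, here is a simplified, standard-library sketch of RAKE-style scoring. It is not the code from the RAKE repository: the tiny stopword set stands in for SmartStoplist.txt, the function names are hypothetical, and the filters follow the description above. Text is split into candidate phrases at punctuation and stopwords, each word is scored as degree divided by frequency, and a phrase scores the sum of its word scores.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword set (assumption; stands in for SmartStoplist.txt).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def candidate_phrases(text, min_chars=3, max_words=2):
    """Split text at punctuation and stopwords, then apply the length filters."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return [p for p in phrases
            if len(p) <= max_words and all(len(w) >= min_chars for w in p)]


def rake_keywords(text, min_chars=3, max_words=2, min_freq=2):
    phrases = candidate_phrases(text, min_chars, max_words)
    freq = Counter()
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts the word itself plus its co-occurring words.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    counts = Counter(phrases)
    # Keep only phrases that occur at least min_freq times.
    scored = {" ".join(p): sum(word_score[w] for w in p)
              for p, n in counts.items() if n >= min_freq}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

Under this scoring, a two-word phrase whose words never appear in any other phrase scores 2 + 2 = 4.0, and a word that only ever appears alone scores 1.0.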

Output:
    OSCON   [('rebasing workflow', 4.0), ('akka  java', 4.0), ('bootcamp training', 4.0), ('open source', 3.8), ('reactive programming', 3.7142857142857144), ('scala   introduction', 3.125), ('programming', 1.7142857142857142), ('microservices', 1.5), ('community', 1.5), ('microservices \xe2\x80\x93', 1.5), ('data', 1.5), ('introduction', 1.375), ('openstack', 1.3333333333333333), ('make', 1.3333333333333333), ('fast', 1.3333333333333333), ('started', 1.3333333333333333), ('change', 1.25), ('future', 1.25), ('scale', 1.2), ('docker', 1.2), ('code', 1.0), ('back', 1.0), ('continued', 1.0), ('culture', 1.0), ('anti', 1.0), ('cassandra', 1.0), ('lessons', 1.0), ('story', 1.0), ('reality', 1.0), ('production', 1.0), ('swift', 1.0), ('internet', 1.0), ('run', 1.0), ('reilly', 1.0), ('boss', 1.0), ('sponsored', 1.0), ('hands', 1.0), ('github', 1.0), ('success', 1.0), ('presented', 1.0), ('flow', 1.0), ('leading', 1.0), ('relic', 1.0), ('making', 1.0)]