Process: Characterize conferences based on past presentations. Most conferences no longer require a paper, so they leave relatively few artifacts other than slides and videos. However, many conference sites keep the schedules of previous conferences, which contain the session titles. Many sites provide these schedules in machine-readable formats such as RSS, iCal, and (to a lesser degree) HTML. In addition, Lanyrd may provide the schedule in iCal format if it is not available on the conference site.
Parsing titles from iCal files
Example:
    # Python 3. Arguments: input .ics file, output text file.
    import sys

    from icalendar import Calendar

    # Read and parse the iCal file.
    with open(sys.argv[1], 'rb') as g:
        gcal = Calendar.from_ical(g.read())

    # Write the summary (session title) of every event, one per line.
    with open(sys.argv[2], 'w', encoding='utf8') as o:
        for component in gcal.walk():
            if component.name == "VEVENT":
                summary = component.get('summary')
                if summary is not None:
                    o.write(str(summary) + "\n")
Parsing titles from RSS
Example:
    # Python 3. Argument: URL (or local path) of the RSS feed.
    import sys

    import feedparser

    url = sys.argv[1]
    f = feedparser.parse(url)
    # Print the title of every feed entry, one per line.
    for entry in f.entries:
        print(entry.title)
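Where a site publishes its schedule only as HTML, the titles can be scraped instead. The following is a minimal sketch using only the Python 3 standard library. It assumes that each session title sits in an element carrying a CSS class; the class name session-title and the helper names are hypothetical and have to be adapted to the markup of each conference site.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TitleParser(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class  # hypothetical, site-specific class name
        self.depth = 0              # > 0 while inside a matching element
        self.parts = []
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.css_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Join the collected text and normalize whitespace.
            title = " ".join("".join(self.parts).split())
            if title:
                self.titles.append(title)
            self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_titles(html, css_class="session-title"):
    parser = TitleParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.titles
```

Calling extract_titles(page_source) returns the titles as a list of strings, which can then be written out one per line like the iCal and RSS results.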
The collected data was stored in a tab-separated values (TSV) file with two columns:
conference name \t presentation titles
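As an illustration of this format, the sketch below builds one TSV line from a conference name and its list of titles. The helper name tsv_row is hypothetical, and joining the titles with ". " is an assumption made here so that a phrase cannot run from one title into the next; the source does not say how the titles were joined.

```python
def tsv_row(conference, titles, separator=". "):
    """Return one 'conference<TAB>titles' line (without trailing newline)."""

    def clean(text):
        # Collapse tabs, newlines and repeated spaces so the row stays
        # on a single line with exactly one tab between the two columns.
        return " ".join(text.split())

    return "%s\t%s" % (clean(conference),
                       separator.join(clean(t) for t in titles))
```

Each returned line can be appended to the TSV file followed by a newline character.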
Use RAKE (Rapid Automatic Keyword Extraction) to find the most common terms in presentations for a conference.
RAKE tutorial
RAKE implementation on GitHub: https://github.com/aneesha/RAKE
Example:
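    # Python 3. Argument: TSV file (conference name \t presentation titles).
    import sys

    import rake

    rake_object = rake.Rake("SmartStoplist.txt", 3, 2, 2)

    with open(sys.argv[1], 'r', encoding='utf8') as f:
        for line in f:
            data = line.rstrip('\n').split('\t')
            if len(data) < 2:
                continue  # skip blank or malformed lines
            conf, text = data[0], data[1]
            # Replace quotes and commas with spaces before extraction.
            text = text.replace('"', ' ').replace(',', ' ')
            keywords = rake_object.run(text)
            print("%s\t%s" % (conf, keywords))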
Here rake.Rake("SmartStoplist.txt", 3, 2, 2) creates a RAKE object that extracts keywords where:
Each word has at least 3 characters
Each phrase has at most 2 words
Each keyword appears in the text at least 2 times
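The three constraints can be pictured as a filter over candidate phrases. The sketch below only illustrates the rules as they are stated in the list above; it is not the library's internal code, and the function and parameter names are hypothetical.

```python
from collections import Counter


def filter_candidates(phrases, min_chars=3, max_words=2, min_freq=2):
    """Keep the candidate phrases that satisfy all three constraints."""
    # Count how often each phrase occurs, ignoring case.
    counts = Counter(p.lower() for p in phrases)
    kept = []
    for phrase, count in counts.items():
        words = phrase.split()
        if (words
                and all(len(w) >= min_chars for w in words)  # word length
                and len(words) <= max_words                  # phrase length
                and count >= min_freq):                      # frequency
            kept.append(phrase)
    return sorted(kept)
```

With the defaults shown, a phrase such as "open source" that occurs twice is kept, while a two-letter word, a three-word phrase, or a phrase seen only once is dropped.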
Output:
OSCON [('rebasing workflow', 4.0), ('akka java', 4.0), ('bootcamp training', 4.0), ('open source', 3.8), ('reactive programming', 3.7142857142857144), ('scala introduction', 3.125), ('programming', 1.7142857142857142), ('microservices', 1.5), ('community', 1.5), ('microservices \xe2\x80\x93', 1.5), ('data', 1.5), ('introduction', 1.375), ('openstack', 1.3333333333333333), ('make', 1.3333333333333333), ('fast', 1.3333333333333333), ('started', 1.3333333333333333), ('change', 1.25), ('future', 1.25), ('scale', 1.2), ('docker', 1.2), ('code', 1.0), ('back', 1.0), ('continued', 1.0), ('culture', 1.0), ('anti', 1.0), ('cassandra', 1.0), ('lessons', 1.0), ('story', 1.0), ('reality', 1.0), ('production', 1.0), ('swift', 1.0), ('internet', 1.0), ('run', 1.0), ('reilly', 1.0), ('boss', 1.0), ('sponsored', 1.0), ('hands', 1.0), ('github', 1.0), ('success', 1.0), ('presented', 1.0), ('flow', 1.0), ('leading', 1.0), ('relic', 1.0), ('making', 1.0)]
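The scores in this output can be read with RAKE's scoring rule in mind: each word is scored as deg(w) / freq(w), where freq(w) counts the word's occurrences and deg(w) adds up the lengths of the candidate phrases it occurs in, and a phrase's score is the sum of its word scores. The sketch below is a simplified illustration of that rule under the assumption that the candidate phrases have already been extracted; the function name is hypothetical and this is not the library's code.

```python
from collections import defaultdict


def score_phrases(phrases):
    """Score candidate phrases using deg(w) / freq(w) word scores."""
    freq = defaultdict(int)    # occurrences of each word
    degree = defaultdict(int)  # co-occurrence degree, counting the word itself
    for phrase in phrases:
        words = phrase.split()
        for word in words:
            freq[word] += 1
            degree[word] += len(words)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # A phrase's score is the sum of the scores of its words.
    return {p: sum(word_score[w] for w in p.split()) for p in set(phrases)}
```

Under this rule, a two-word phrase whose words occur nowhere else scores 2.0 + 2.0 = 4.0, and a word that only ever occurs alone scores 1.0, which is consistent with the highest and lowest values in the output above.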