@iwiwi
Created April 8, 2013 12:37
DBLP dump parser for generating multiple networks
#!/usr/bin/env ruby
#
# dblp.rb --- Parse dblp.xml to generate graphs
#
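# Usage (reads dblp.xml on stdin; prints one tab-separated edge per line: date, left, right):
#   dblp.rb author     < dblp.xml   --> co-author network
#   dblp.rb key cite   < dblp.xml   --> citation network
#   dblp.rb key author < dblp.xml   --> paper-author network
#   dblp.rb key venue  < dblp.xml   --> paper-venue network
#
# Note: parsing is line-based, so each element must sit on its own line.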
#
$type = nil
$date = nil
$key = nil
$venues = []
$authors = []
$cites = []
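# Return the values of the current record for the given field name.
def getit(s)
  case s
  when "key"    then [$key]
  when "venue"  then $venues
  when "author" then $authors
  when "cite"   then $cites
  else
    # `raise` (not `throw`) is the way to signal an error in Ruby.
    raise ArgumentError, "Unknown field #{s.inspect} (expected key, venue, author, or cite)"
  end
end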
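# Print the edges of the current record.
def output
  if ARGV.length == 1
    # One field: pair every value with each value listed before it,
    # so each unordered pair is printed exactly once.
    as = getit(ARGV[0])
    as.each_with_index do |a, i|
      as[0, i].each do |b|
        puts "#{$date}\t#{b}\t#{a}"
      end
    end
  else
    # Two fields: print every (left, right) combination.
    ls = getit(ARGV[0])
    rs = getit(ARGV[1])
    ls.each do |l|
      rs.each do |r|
        puts "#{$date}\t#{l}\t#{r}"
      end
    end
  end
end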
if $0 == __FILE__
  while line = $stdin.gets
    if /^<(\S*) mdate=\"([\d\-]*)\" key=\"([^\"]*)\"/ =~ line # Beginning of a different item
      output # Flush the previous item (prints nothing before the first one)
      $type = $1
      $date = $2
      $key = $3
      $venues = []
      $authors = []
      $cites = []
    elsif /^<journal>(.*)<\/journal>$/ =~ line || /^<booktitle>(.*)<\/booktitle>$/ =~ line
      $venues.push($1)
    elsif /^<author>(.*)<\/author>$/ =~ line
      # $authors.push($1.gsub(/\s/, "_"))
      $authors.push($1)
    elsif /^<cite>(.*)<\/cite>$/ =~ line && $1 != "..." # Skip placeholder citations
      $cites.push($1)
    end
  end
  output # For the last item
end
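
To make the one-argument mode concrete, here is a minimal, self-contained sketch of how a single record becomes co-author edges. The sample record and the helper names `sample_record`, `parse_record`, and `coauthor_edges` are hypothetical and not part of dblp.rb. The sketch assumes the same one-element-per-line layout that the script's regexes expect, and it uses `Array#combination` in place of the script's nested loops, which should yield the same pairs in the same order.

```ruby
# Hypothetical single-record sample, one element per line.
def sample_record
  <<~XML
    <article mdate="2013-01-02" key="journals/x/A1">
    <author>Alice</author>
    <author>Bob</author>
    <author>Carol</author>
    <journal>J. Examples</journal>
    </article>
  XML
end

# Extract the modification date and the author list from one record.
def parse_record(xml)
  date = xml[/mdate="([\d\-]*)"/, 1]
  authors = xml.scan(/^<author>(.*)<\/author>$/).flatten
  [date, authors]
end

# One tab-separated line per unordered author pair, earlier-listed author first.
def coauthor_edges(date, authors)
  authors.combination(2).map { |a, b| "#{date}\t#{a}\t#{b}" }
end

date, authors = parse_record(sample_record)
puts coauthor_edges(date, authors)
```

For three authors this prints three edges (Alice-Bob, Alice-Carol, Bob-Carol), each prefixed with the record's mdate.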