@joshuap
Created February 21, 2012 19:08
Systematically retrieve audio files from popular mp3 blogs
#!/usr/bin/env ruby
require 'nokogiri'
require 'open-uri'
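# Parses an RSS feed and extracts the URLs of its MP3 enclosures.
class Mp3Crawler
  def initialize(url)
    # Kernel#open no longer fetches URLs as of Ruby 3.0; URI.open (Ruby 2.5+) does.
    @data = Nokogiri::XML(URI.open(url))
  end

  # Returns the URL of every <enclosure> whose type is audio/mpeg.
  # Enclosures missing a type or url attribute are skipped instead of raising.
  def links
    @data.xpath('//enclosure')
         .select { |e| e['type'].to_s.strip == 'audio/mpeg' && e['url'] }
         .map { |e| e['url'].strip }
  end
end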
feeds = DATA.read.split("\n").reject(&:empty?)
links = feeds.flat_map do |feed|
  puts "Crawling #{feed}"
  Mp3Crawler.new(feed).links
rescue StandardError => e
  # A dead or malformed feed should not abort the whole run.
  warn "Skipping #{feed}: #{e.message}"
  []
end
puts "Captured #{links.size} files. Starting download..."
soundcloud = ->(l) { l =~ /soundcloud\.com/ }
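links.each do |link|
  # SoundCloud download URLs end in /download, so name the file after the track path instead.
  source = soundcloud.call(link) ? (link[/\/(.*)\/download/n, 1] || link) : link
  # URI.decode was removed in Ruby 3.0; this replacement also turns '+' into a space.
  filename = URI.decode_www_form_component(File.basename(source))
  filename.concat('.mp3') if File.extname(filename).empty?
  path = File.join(ENV['HOME'], 'Music', 'Blogs', filename)
  puts "Downloading #{link}"
  # Pass arguments as a list so no shell interprets quotes or metacharacters in the URL or path.
  system('wget', '-cO', path, link)
end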
__END__
http://feeds2.feedburner.com/aurgasm
http://feeds.feedburner.com/tinyways
http://boomboomchik.com/feed
http://www.audiodrums.com/feed/
http://www.qcmixtapes.com/?feed=rss2
http://bambooorchestra.blogspot.com/feeds/posts/default?alt=rss
http://www.undomondo.com/rss
http://www.mixtaperiot.com/feed/
http://www.blogotheque.net/feed/
http://feeds2.feedburner.com/bigstereo
http://palmsout.net/feed/
http://www.scissorkick.com/feed/
http://music.for-robots.com/?feed=rss2
http://soul-sides.com/feed/
http://bennloxo.com/feed/
http://3hive.com/feed/
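The filename logic in the download loop can be tried offline, without fetching any feed. The following is a minimal sketch, not part of the gist: `filename_for` is a hypothetical helper name, the example URLs are made up, and `URI.decode_www_form_component` is assumed as a stand-in for `URI.decode` (it also turns `+` into a space).

```ruby
require 'uri'

# Hypothetical helper mirroring the script's filename logic; not part of the original gist.
def filename_for(link)
  # For SoundCloud links, use the path segment before the trailing /download.
  # Fall back to the full link if the URL has no /download suffix.
  source = if link.match?(/soundcloud\.com/)
             link[/\/(.*)\/download/, 1] || link
           else
             link
           end
  name = URI.decode_www_form_component(File.basename(source))
  # Default to .mp3 when the URL carries no file extension.
  File.extname(name).empty? ? "#{name}.mp3" : name
end

puts filename_for('http://example.com/audio/My%20Song.mp3')
puts filename_for('http://soundcloud.com/artist/track/download')
```

Keeping this step in a pure function makes it easy to check edge cases (percent-encoded names, missing extensions) before any download is started.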