@valo
Created February 16, 2012 15:22
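require 'open-uri'
require 'thread'
require 'fileutils'

# Daily EDGAR master index that lists the filings to download.
index_url = "http://www.sec.gov/Archives/edgar/daily-index/master.20110701.idx"
WORKER_COUNT = 5
queue = Queue.new

FileUtils.mkdir_p "20110701"
Dir.chdir "20110701"

# Captures five pipe-separated fields: cik, company name, form type, date filed, file name.
LINE_REGEX = /(\d+)\|([^\|]+)\|([^\|]+)\|(\d+)\|([^\|]+)/

# Worker threads pull [cik, file_name] pairs off the queue and fetch each filing with wget.
threads = WORKER_COUNT.times.map do
  Thread.new do
    # A nil entry is the stop signal pushed once the whole index has been read.
    while (job = queue.pop)
      cik, file_name = job
      FileUtils.mkdir_p cik
      # Passing arguments separately keeps index text away from the shell.
      system "wget", "-cq", "http://www.sec.gov/Archives/#{file_name}", chdir: cik
    end
  end
end

# URI.open is the open-uri entry point; Kernel#open no longer accepts URLs on Ruby 3.0+.
URI.open(index_url) do |index|
  index.each_line do |line|
    # chomp first, otherwise the last capture would carry the trailing newline.
    m = LINE_REGEX.match(line.chomp)
    next unless m
    cik, _company_name, _form_type, _date_filed, file_name = m.captures
    queue.push([cik, file_name])
  end
end

# One stop signal per worker, so every join below returns instead of blocking forever.
WORKER_COUNT.times { queue.push(nil) }
threads.each(&:join)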
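
To check the line pattern without touching the network, the parsing step can be tried on its own. This is only a minimal sketch: `parse_index_line` and `INDEX_LINE` are hypothetical names that do not appear in the script, and the sample row is made up to fit the five pipe-separated fields the regex expects.

```ruby
# Same pattern as LINE_REGEX in the script, renamed so this sketch stands alone.
INDEX_LINE = /(\d+)\|([^\|]+)\|([^\|]+)\|(\d+)\|([^\|]+)/

# Hypothetical helper: returns a hash of the five fields, or nil for
# lines that do not match (headers, separators, blank lines).
def parse_index_line(line)
  m = INDEX_LINE.match(line.chomp)
  return nil unless m
  cik, company_name, form_type, date_filed, file_name = m.captures
  { cik: cik, company_name: company_name, form_type: form_type,
    date_filed: date_filed, file_name: file_name }
end

# Made-up sample row, not real filing data.
sample = "1000045|EXAMPLE CORP|8-K|20110701|edgar/data/1000045/0000000000-11-000001.txt\n"
record = parse_index_line(sample)
puts record[:cik]        # prints 1000045
puts record[:file_name]  # prints edgar/data/1000045/0000000000-11-000001.txt

# A row without the two numeric fields is skipped.
puts parse_index_line("CIK|Company Name|Form Type|Date Filed|File Name").inspect  # prints nil
```

Returning `nil` for non-matching lines mirrors the `next unless m` guard in the download loop, so header and separator rows never reach the queue.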