@cpetersen
Created February 16, 2016 18:56
Download all of PubChem
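require 'net/ftp'

# Connect anonymously to the NCBI FTP server and download every
# gzipped SDF file in the PubChem Compound "CURRENT-Full" directory.
Net::FTP.open('ftp.ncbi.nlm.nih.gov') do |ftp|
  ftp.passive = true
  ftp.login
  ftp.chdir('/pubchem/Compound/CURRENT-Full/SDF')
  # Each entry is one line of LIST output (assumed to be Unix "ls -l" style).
  files = ftp.list('*')
  total = 0
  sdf_files = files.select { |f| f.match(/\.sdf\.gz$/) }
  sdf_files.each_with_index do |file, index|
    tokens = file.split(/\s+/)
    size = tokens[4].to_i # fifth column of the listing is the size in bytes
    total += size
    filename = tokens.last
    puts "Getting [#{filename}] [#{size}] [#{index + 1} of #{sdf_files.count}]"
    # Save under the same name in the current directory, reading 1024-byte blocks.
    ftp.getbinaryfile(filename, filename, 1024)
  end
  # Total size in bytes, followed by the same figure in GiB.
  puts "#{total} :: #{total.to_f / (1024 * 1024 * 1024)}"
end; nil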
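
The script takes the file size and name from fixed columns of the FTP listing, which only works if the server returns Unix "ls -l" style lines. The sketch below pulls that parsing into a helper and adds a check that could be used to skip files already downloaded in full. Both `parse_listing_line` and `already_downloaded?` are hypothetical names that are not part of the original script, and the sample listing line is made up for illustration.

```ruby
# Hypothetical helpers, not part of the original script.

# Parse one "ls -l" style listing line into [filename, size_in_bytes].
# Returns nil if the line has fewer than the nine expected columns.
def parse_listing_line(line)
  tokens = line.split(/\s+/)
  return nil if tokens.length < 9
  [tokens.last, tokens[4].to_i]
end

# True when a local file with the same name already has the expected size.
def already_downloaded?(filename, size)
  File.exist?(filename) && File.size(filename) == size
end

# Made-up sample line in the assumed listing format.
line = '-r--r--r--   1 ftp      anonymous  1234567 Feb 01  2016 Compound_000000001_000025000.sdf.gz'
filename, size = parse_listing_line(line)
puts "#{filename} #{size} #{already_downloaded?(filename, size)}"
```

Inside the download loop, a line such as `next if already_downloaded?(filename, size)` placed before `ftp.getbinaryfile` would let an interrupted run be restarted without fetching completed files again. A size match does not prove the contents are intact, so this is a convenience rather than a verification step.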