@G-Square
Forked from sandys/table_to_csv.rb
Last active October 15, 2015 01:30
Convert an HTML table to CSV using Ruby
# Thanks to Sandeep Srinivasa
# Enhanced for MySQL and phpMyAdmin CSV imports
require 'nokogiri'
require 'csv'

# Check for the input file
input = ARGV[0]
unless input && File.file?(input)
  puts 'Need input file (html) as first argument; all other arguments are treated as column names'
  puts 'usage: ruby table_to_csv.rb fileinput.html NAME PHONE EMAIL DEPARTMENT'
  puts 'output: fileinput.csv'
  exit(1)
end
puts input

# Parse the HTML; the block form closes the input file when parsing is done
doc = File.open(input) { |file| Nokogiri::HTML(file) }
# The output is written to the current directory as <input basename>.csv
output = File.basename(input, File.extname(input)) + '.csv'

# Options are passed as keywords: Ruby 3 no longer converts a trailing
# hash into keyword arguments. force_quotes wraps every field in quotes.
CSV.open(output, 'w', col_sep: ',', quote_char: '"', force_quotes: true) do |csv|
  # Add column names as the first line
  csv << ARGV[1..-1]
  # Note: this XPath only matches rows inside an explicit <tbody>
  doc.xpath('//table/tbody/tr').each do |row|
    # Optional filter: only keep rows that contain an @
    # next unless row.text.match(/@/)
    # Write one CSV line per table row, after all of its cells are collected
    csv << row.xpath('td').map(&:text)
  end
end
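
The quoting behaviour that matters for the MySQL and phpMyAdmin imports mentioned above can be checked without any HTML input. The following is a minimal sketch using only the standard `csv` library and the same `col_sep`, `quote_char` and `force_quotes` options the script sets; `rows_to_csv` and the sample names are hypothetical and not part of the gist.

```ruby
require 'csv'

# Hypothetical helper (not part of the gist): serialize a header line and
# data rows to a CSV string with the options the script uses.
def rows_to_csv(header, rows)
  CSV.generate(col_sep: ',', quote_char: '"', force_quotes: true) do |csv|
    csv << header
    rows.each { |row| csv << row }
  end
end

puts rows_to_csv(%w[NAME EMAIL],
                 [['Ada', 'ada@example.com'],
                  ['Bob "B" Lee', 'bob@example.com']])
# prints:
# "NAME","EMAIL"
# "Ada","ada@example.com"
# "Bob ""B"" Lee","bob@example.com"
```

Every field is quoted, and a quote character inside a field is escaped by doubling it, so the importer has to be configured for the same enclosure and escape characters.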