@ershad
Last active August 29, 2015 13:57
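# encoding: utf-8
require 'xmlsimple'
require 'csv'
require 'date' # DateTime and Date are used below

# Parse the XML export into nested hashes and arrays.
hash = XmlSimple.xml_in('m.xml')

# Only pages whose first revision falls inside this range are exported.
# Date.today compares as midnight, so revisions timestamped later today
# fall outside the range.
start_date = DateTime.parse('2013-12-28')
end_date = Date.today

CSV.open('out.csv', 'w') do |csv|
  csv << ['revision_id', 'page_title', 'username', 'timestamp', 'bytes', 'last_revision_id']

  hash['page'].each do |page|
    page_title = page['title'].first

    # Skip pages that carry no revisions.
    revisions = Array(page['revision'])
    first_revision = revisions.first
    next unless first_revision

    revision_id = first_revision['id'].first
    # The first value under <contributor> is the username (or the IP address).
    username = first_revision['contributor'].first.values.first.first
    timestamp = first_revision['timestamp'].first

    # Size and ID come from the last revision listed for the page.
    last_revision = revisions.last
    bytes = last_revision['text'].first['bytes']
    last_revision_id = last_revision['id'].first

    date = DateTime.parse(timestamp)
    next unless date >= start_date && date <= end_date
    # Keep only pages whose title contains 'താൾ:'.
    next unless page_title.include?('താൾ:')

    csv << [revision_id, page_title, username, timestamp, bytes, last_revision_id]
  end
end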
# Cross-checked the revision IDs in out.csv against a filtered file with this
# shell one-liner; rows whose ID is found there are appended to data.csv.
# cat out.csv| while read line; do id=`echo $line | cut -f1 -d','`; (cat ~/Downloads/out\ 5200.csv| cut -f1 -d',' | grep $id > /dev/null) && (echo $line >> data.csv);done
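The same cross-check can also be sketched in Ruby. This is only an illustration, not part of the original gist: the helper names `rows_with_known_ids` and `filter_csv` are hypothetical, and the file paths passed to `filter_csv` are placeholders. Unlike the `grep $id` in the one-liner, which also matches IDs that merely contain the search string, this version compares IDs exactly.

```ruby
require 'csv'
require 'set'

# Hypothetical helper: keep the rows whose first column (the revision ID)
# also appears in the first column of reference_rows. Exact match only.
def rows_with_known_ids(rows, reference_rows)
  known_ids = reference_rows.reject(&:empty?)
                            .map { |row| row.first.to_s.strip }
                            .to_set
  rows.select do |row|
    !row.empty? && known_ids.include?(row.first.to_s.strip)
  end
end

# Hypothetical file-based wrapper; all three paths are placeholders.
# Returns the number of rows written.
def filter_csv(input_path, reference_path, output_path)
  matched = rows_with_known_ids(CSV.read(input_path), CSV.read(reference_path))
  CSV.open(output_path, 'w') do |csv|
    matched.each { |row| csv << row }
  end
  matched.size
end
```

Calling `filter_csv('out.csv', 'reference.csv', 'data.csv')` would mirror the shell loop, with `reference.csv` standing in for the filtered file. Loading the reference IDs into a `Set` once avoids rereading that file for every input line.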