@Kappie
Created June 15, 2015 11:32
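require "fileutils"
require "nokogiri"

# Split each blog XML file into one plain-text file per <post> element.
BASE_DIR = "../support_vector_machine/data/female_blogs"
TARGET_DIR = "../support_vector_machine/data/female_posts"

# File.write does not create missing directories, so make sure the target exists.
FileUtils.mkdir_p(TARGET_DIR)

Dir["#{BASE_DIR}/*"].each do |path|
  # The block form of File.open closes the file handle once parsing is done.
  blog = File.open(path) { |file| Nokogiri::XML(file) }
  posts = blog.xpath("//post")
  extension = File.extname(path)
  basename = File.basename(path, extension)

  posts.each_with_index do |post, index|
    # Output name is <basename>_<index><extension>, with index starting at 0.
    new_path = File.join(TARGET_DIR, "#{basename}_#{index}#{extension}")
    File.write(new_path, post.text)
  end
end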