@kelciour
Created August 27, 2017 12:11
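# encoding: UTF-8
# Split every .txt file in the current directory into sentences (one per line)
# and write the result to output/<same file name>.
require 'pragmatic_segmenter'

directory_name = "output"
# File.exists? was removed in Ruby 3.2; Dir.exist? works on old and new versions.
Dir.mkdir(directory_name) unless Dir.exist?(directory_name)

Dir.glob('*.txt') do |txt_file|
  puts txt_file
  text = File.open(txt_file, "r:UTF-8", &:read)
  # clean: false turns off the gem's optional text cleaning, so the input is segmented as-is.
  ps = PragmaticSegmenter::Segmenter.new(text: text, language: 'en', clean: false)
  # ps.segment returns an array of sentences; puts writes each one on its own line.
  File.open(File.join(directory_name, txt_file), 'w') { |f| f.puts(ps.segment) }
end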