@necojackarc
Created November 1, 2015 14:04
You can get batches of lines from text files.
class TextFilePager
  DEFAULT_BATCH_SIZE = 1000

  def initialize(file_path, skip_header: false, delete_line_break: false)
    @file_path = file_path
    @skip_header = skip_header
    @delete_line_break = delete_line_break
  end

  # Yields arrays of up to batch_size lines until the file is exhausted.
  def batch_line(batch_size: DEFAULT_BATCH_SIZE)
    File.open(@file_path) do |file|
      file.gets if skip_header? # discard the first line
      loop do
        line, lines = "", []
        batch_size.times do
          break if (line = file.gets).nil?
          lines << (delete_line_break? ? line.chomp : line)
        end
        # Skip the empty batch produced when EOF falls exactly on a batch boundary.
        yield lines unless lines.empty?
        break if line.nil?
      end
    end
  end

  def skip_header?
    @skip_header
  end

  def delete_line_break?
    @delete_line_break
  end
end
@necojackarc

Usage

You can use this class as follows:

sample_text = TextFilePager.new("sample.txt", delete_line_break: true)
sample_text.batch_line(batch_size: n) do |lines|
  # your code
end
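
For comparison, a similar batching loop can be sketched with Ruby's built-in `Enumerable#each_slice`. This is an alternative, not the gist's implementation: the method name `each_line_batch` and the temporary file contents are assumptions made for illustration, and this sketch always strips line breaks.

```ruby
require "tempfile"

# Hypothetical helper (not part of the gist): yields arrays of up to
# batch_size chomped lines, reading the file line by line.
def each_line_batch(file_path, batch_size: 1000, skip_header: false)
  File.open(file_path) do |file|
    file.gets if skip_header # discard the first line
    # each_slice never yields an empty batch, even when the line count
    # is an exact multiple of batch_size.
    file.each_line.each_slice(batch_size) do |lines|
      yield lines.map(&:chomp)
    end
  end
end

batches = []
Tempfile.create("sample") do |f|
  f.write("header\na\nb\nc\n")
  f.flush
  each_line_batch(f.path, batch_size: 2, skip_header: true) do |lines|
    batches << lines
  end
end

p batches # => [["a", "b"], ["c"]]
```

The last batch is shorter than `batch_size` when the lines run out.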

Sample

Input

The input file is a lightly processed "Lorem ipsum" text.
A line break is inserted at the end of each sentence.
In addition, a line number is prepended to each line.

01 Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
02 Aenean commodo ligula eget dolor.
03 Aenean massa.
04 Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
05 Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem.
06 Nulla consequat massa quis enim.
07 Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu.
08 In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo.
09 Nullam dictum felis eu pede mollis pretium.
10 Integer tincidunt.
11 Cras dapibus.
12 Vivamus elementum semper nisi.
13 Aenean vulputate eleifend tellus.
14 Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim.
15 Aliquam lorem ante, dapibus in, viverra quis, feugiat a, tellus.

Processing detail

  1. Remove line numbers
  2. Combine three sentences into one line.

That's it.

Code

def process(lines)
  lines.map do |line|
    _, *content = line.split
    content.join("\s")
  end.join("\s")
end

# Main
sample_text = TextFilePager.new("sample.txt", delete_line_break: true)
sample_text.batch_line(batch_size: 3) do |lines|
  puts process(lines)
end

Output

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa.
Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim.
Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu. In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium.
Integer tincidunt. Cras dapibus. Vivamus elementum semper nisi.
Aenean vulputate eleifend tellus. Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim. Aliquam lorem ante, dapibus in, viverra quis, feugiat a, tellus.
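
The sample above has 15 lines, so every batch is full. When the line count is not a multiple of the batch size, the last batch is simply shorter. Here is a minimal in-memory sketch of that case. It assumes `Array#each_slice` as a stand-in for `TextFilePager`, uses a shortened four-line sample made up for illustration, and restates `process` in compact form so the block runs on its own.

```ruby
# Strips the leading line number from each line and joins the batch
# into a single line (same behaviour as process above).
def process(lines)
  lines.map { |line| line.split.drop(1).join(" ") }.join(" ")
end

# Shortened, illustrative sample: 4 lines, batch size 3.
sample = [
  "01 Lorem ipsum.",
  "02 Aenean commodo.",
  "03 Aenean massa.",
  "04 Cum sociis.",
]

results = sample.each_slice(3).map { |lines| process(lines) }
puts results
# Lorem ipsum. Aenean commodo. Aenean massa.
# Cum sociis.
```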

Cool.

@necojackarc

I've written a post related to this gist.
