@NiranjanSarade
Created June 21, 2014 15:30
Read large CSV files (several GB) in fixed-size chunks, replace old text with new text, and write the result to a new file.
require 'rubygems'
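class File
  # Opens file_path and yields its contents to the block in pieces of at most
  # chunk_size bytes. The default is a number rather than nil because
  # read(nil) would load the whole file into memory at once.
  def self.sequential_read(file_path, chunk_size = 1024)
    open(file_path) do |f|
      f.each_chunk(chunk_size) do |chunk|
        yield chunk
      end
    end
  end

  # Yields successive strings of at most chunk_size bytes until end of file.
  def each_chunk(chunk_size = 1024)
    yield read(chunk_size) until eof?
  end
end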
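# Arguments: directory containing the CSV files, text to find, replacement text.
abort "Usage: ruby #{$0} <directory> <old_text> <new_text>" unless ARGV.size == 3
oldText = ARGV[1]
newText = ARGV[2]
CHUNK_SIZE = 10_240_000 # bytes, roughly 10 MB
puts "ARGV[1] - #{oldText}"
puts "ARGV[2] - #{newText}"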
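Dir["#{ARGV[0]}/*.csv"].each do |filename|
  puts "Reading file - #{filename}"
  # Escape the dot and anchor at the end so only the ".csv" extension changes.
  replaced_file = filename.sub(/\.csv\z/, '_replaced.csv')
  # The block form closes the output file even if an error is raised.
  File.open(replaced_file, 'w') do |f|
    File.sequential_read(filename, CHUNK_SIZE) do |chunk|
      # Note: an occurrence of oldText split across two chunks is not replaced.
      chunk.each_line do |line|
        f.write line.gsub(oldText, newText)
      end
    end
  end
  puts "Replaced file = #{replaced_file}"
end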
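
One limitation of reading in fixed-size chunks: chunk boundaries rarely fall on line ends, so an occurrence of the old text that is split across two chunks is never seen whole and stays unreplaced. Below is a minimal sketch of one possible workaround, not part of the original gist. It holds back the unfinished last line of each chunk and prepends it to the next chunk. The helper name `replace_in_file` and its parameters are hypothetical, and the sketch assumes the search text never contains a newline and that the file can be treated as raw bytes.

```ruby
# Hypothetical helper (not in the original gist): copies src_path to dst_path,
# replacing old_text with new_text, while reading at most chunk_size bytes at a time.
# Assumption: old_text contains no newline, so every match lies within one line.
def replace_in_file(src_path, dst_path, old_text, new_text, chunk_size = 10_240_000)
  # Work on raw bytes so binary chunks and the search strings stay compatible.
  old_bin = old_text.b
  new_bin = new_text.b
  carry = ''.b # unfinished line left over from the previous chunk

  File.open(dst_path, 'wb') do |out|
    File.open(src_path, 'rb') do |src|
      # read(n) returns nil at end of file, which ends the loop.
      while (chunk = src.read(chunk_size))
        buffer = carry + chunk
        last_newline = buffer.rindex("\n")
        if last_newline.nil?
          carry = buffer # no complete line yet, keep accumulating
          next
        end
        # Everything up to and including the last newline is complete lines.
        complete = buffer[0..last_newline]
        carry = buffer[(last_newline + 1)..-1]
        out.write(complete.gsub(old_bin, new_bin))
      end
    end
    # Flush the final line, which may have no trailing newline.
    out.write(carry.gsub(old_bin, new_bin)) unless carry.empty?
  end
end
```

With this approach memory use is bounded by the chunk size plus the length of the longest line, so a file with no newlines at all would still be buffered in full.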