Created June 21, 2014 15:30
Read large CSV files (several GB each) in fixed-size chunks, replace old text with new text, and write the result to a new file.
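class File
  # Opens file_path in binary mode and yields it in chunks of chunk_size bytes.
  def self.sequential_read(file_path, chunk_size = 1024)
    open(file_path, 'rb') do |f|
      f.each_chunk(chunk_size) { |chunk| yield chunk }
    end
  end

  # Yields successive chunks of up to chunk_size bytes until end of file.
  def each_chunk(chunk_size = 1024)
    yield read(chunk_size) until eof?
  end
end

abort 'Usage: ruby SCRIPT DIRECTORY OLD_TEXT NEW_TEXT' unless ARGV.size == 3

directory = ARGV[0]
# read(length) returns binary strings, so compare and replace as raw bytes.
old_text = ARGV[1].b
new_text = ARGV[2].b
CHUNK_SIZE = 10_240_000 # about 10 MB

puts "Old text: #{ARGV[1]}"
puts "New text: #{ARGV[2]}"

Dir["#{directory}/*.csv"].each do |filename|
  next if filename.end_with?('_replaced.csv') # skip output of earlier runs

  puts "Reading file: #{filename}"
  replaced_file = filename.sub(/\.csv\z/, '_replaced.csv')
  File.open(replaced_file, 'wb') do |out|
    leftover = ''.b # partial last line, carried over to the next chunk
    File.sequential_read(filename, CHUNK_SIZE) do |chunk|
      data = leftover + chunk
      cut = (data.rindex("\n") || -1) + 1 # end of the last complete line
      out.write(data[0...cut].gsub(old_text, new_text))
      leftover = data[cut..-1]
    end
    out.write(leftover.gsub(old_text, new_text))
  end
  puts "Replaced file: #{replaced_file}"
end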
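
A fixed-size chunk can end in the middle of a line, so text that straddles two chunks is missed if each chunk is searched on its own. The sketch below shows one way to handle this, assuming the text to replace never contains a newline: hold back the trailing partial line and prepend it to the next chunk. The helper `replace_in_chunks` and the sample data are hypothetical, and `StringIO` stands in for real files so the example runs as is.

```ruby
require 'stringio'

# Hypothetical helper (not part of the script above): copies io to out in
# chunks of chunk_size bytes, replacing old_text with new_text.
# Assumes old_text does not contain a newline.
def replace_in_chunks(io, out, old_text, new_text, chunk_size)
  leftover = ''
  until io.eof?
    data = leftover + io.read(chunk_size)
    # Index just past the last newline; 0 if the data has no newline yet.
    cut = (data.rindex("\n") || -1) + 1
    out << data[0...cut].gsub(old_text, new_text)
    leftover = data[cut..-1] # incomplete line, completed by the next chunk
  end
  out << leftover.gsub(old_text, new_text) # last line without a newline
  out
end

# With 4-byte chunks, the first "foo" is split as "1,fo" / "o\n2,".
input = StringIO.new("id,name\n1,foo\n2,foo\n")
output = replace_in_chunks(input, StringIO.new, 'foo', 'bar', 4)
puts output.string
```

Both occurrences of `foo` are replaced even though the first one is split across two reads. The cost is that a whole line must fit in memory, which is normally the case for CSV data.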