import os

base_path = "/Users/xxx/Data/"
del_keystring = "DT=(SINGLE SINGLE SINGLE)"

for filename in os.listdir(base_path):
    full_path = os.path.join(base_path, filename)
    print(full_path)
    # Read the whole file into memory ...
    with open(full_path, 'r') as f_read:  # [2]
        loaded_txt = f_read.readlines()
    # ... then rewrite it in place, skipping lines that contain the keyword
    with open(full_path, 'w') as f_write:  # [2]
        for line in loaded_txt:
            if del_keystring not in line:
                f_write.write(line)  # [4]
To avoid using both resources, one could use the file pointer to replace and "move up" lines in place: open the file in r+ (w+ would truncate it) and jump around with f.seek() and f.tell().
Incomplete answer below because I'm way too tired, and it gets complicated really fast:
import os

base_path = "/Users/xxx/Data/"
del_keystring = "DT=(SINGLE SINGLE SINGLE)"

for filename in os.listdir(base_path):
    full_path = os.path.join(base_path, filename)
    print(full_path)
    # r+ opens for reading and writing without truncating (w+ would empty the file)
    with open(full_path, 'r+') as fh:
        previous = (fh.tell(), -1)
        offset = 0
        for line in fh:
            previous = (previous[0], len(line))
            if del_keystring in line:
                offset += len(line)
            ...
My brain hurts, might finish this one tomorrow because I like this kind of stuff.
The idea: figure out the offset, account for it, and write "the next line" in place of the previous line that is to be discarded, and so on.
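For what it's worth, here is a working sketch of that in-place idea (the file name and contents are made up for the demo): track separate read and write offsets through one handle, copy kept lines backwards over discarded ones, and truncate the stale tail at the end. Binary mode ('rb+') is used because seeking by byte offset in text mode is unreliable.

```python
import os

# Demo file and keyword are assumptions for illustration, not from the gist
path = "demo_inplace.txt"
del_keystring = b"DT=(SINGLE SINGLE SINGLE)"

with open(path, "wb") as fh:
    fh.write(b"keep one\nDT=(SINGLE SINGLE SINGLE)\nkeep two\n")

# In-place compaction: one handle, two offsets. read_pos walks forward over
# every line; write_pos only advances past lines we keep, so kept lines get
# copied backwards over discarded ones. write_pos never overtakes read_pos.
with open(path, "rb+") as fh:
    read_pos = 0
    write_pos = 0
    while True:
        fh.seek(read_pos)
        line = fh.readline()
        if not line:  # end of file
            break
        read_pos = fh.tell()
        if del_keystring not in line:
            fh.seek(write_pos)
            fh.write(line)
            write_pos = fh.tell()
    fh.truncate(write_pos)  # drop the leftover bytes at the end

with open(path, "rb") as fh:
    result = fh.read()
os.remove(path)
print(result)  # b'keep one\nkeep two\n'
```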
You appear to have gone all out on optimizing this - I'm seeing some new commands here. Better to post it on SO as an alternative answer for visibility. -- Bookmarked - thanks for the effort!
To conserve memory, one could do:

import os

base_path = "/Users/xxx/Data/"
del_keystring = "DT=(SINGLE SINGLE SINGLE)"

for filename in os.listdir(base_path):
    full_path = os.path.join(base_path, filename)
    print(full_path)
    with open(full_path, 'r') as f_read:  # [2]
        with open(full_path + '.new', 'w') as f_write:
            for line in f_read:
                if del_keystring not in line:
                    f_write.write(line)

This will instead use disk space rather than RAM.
This code is more readable, and useful because I can keep the original .dat files.
However, is it possible to keep the same file extension? e.g. 1147_new.dat (not 1147.dat.new)?
You can. But because of how file I/O is buffered, you can't safely overwrite a file while reading from it, so you'll need to open two separate files during the program's execution if you're using that code. Later on you can do:
import shutil
shutil.move(full_path+'.new', full_path)
to overwrite the old file. It's actually a pretty good approach, since you can error-check before moving/replacing the file.
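A sketch combining both suggestions (the file name, its contents, and the size check are assumptions for the demo): os.path.splitext keeps the original extension, so the temporary file becomes 1147_new.dat, and shutil.move replaces the original only after a sanity check.

```python
import os
import shutil

del_keystring = "DT=(SINGLE SINGLE SINGLE)"

# Demo input file (name and contents made up for illustration)
full_path = "1147.dat"
with open(full_path, "w") as fh:
    fh.write("keep\nDT=(SINGLE SINGLE SINGLE)\nalso keep\n")

# splitext keeps the extension: "1147.dat" -> ("1147", ".dat")
root, ext = os.path.splitext(full_path)
new_path = root + "_new" + ext  # "1147_new.dat"

# Filter into the new file, never touching the original while reading it
with open(full_path) as f_read, open(new_path, "w") as f_write:
    for line in f_read:
        if del_keystring not in line:
            f_write.write(line)

# Error-check before moving/replacing the original, as suggested above
if os.path.getsize(new_path) > 0:
    shutil.move(new_path, full_path)

with open(full_path) as fh:
    result = fh.read()
os.remove(full_path)
print(result)  # 'keep\nalso keep\n'
```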