@Torxed
Last active September 26, 2019 06:55
import os
base_path = "/Users/xxx/Data/"
del_keystring = "DT=(SINGLE SINGLE SINGLE)"
for filename in os.listdir(base_path):
	full_path = os.path.join(base_path, filename)
	print(full_path)
	with open(full_path,'r') as f_read: # [2]
		loaded_txt = f_read.readlines()
	with open(full_path,'w') as f_write: # [2]
		for line in loaded_txt:
			if del_keystring not in line:
				f_write.write(line) # [4]
@Torxed
Author

Torxed commented Sep 24, 2019

To conserve memory, one could do:

import os
base_path = "/Users/xxx/Data/"
del_keystring = "DT=(SINGLE SINGLE SINGLE)"

for filename in os.listdir(base_path):
	full_path = os.path.join(base_path, filename)
	print(full_path)

	with open(full_path,'r') as f_read: # [2]
		with open(full_path+'.new', 'w') as f_write:
			for line in f_read:
				if not del_keystring in line:
					f_write.write(line)

This uses disk space rather than RAM: each line is streamed straight into a second file instead of the whole file being held in memory.

@Torxed
Author

Torxed commented Sep 24, 2019

To avoid using both resources, one could use the file pointer to overwrite and "move up" lines in place, by opening the file in r+ (not w+, which would truncate the file as soon as it is opened) and jumping around with f.seek() and f.tell().

Incomplete answer below because I'm way too tired, and it gets complicated really fast:

import os
base_path = "/Users/xxx/Data/"
del_keystring = "DT=(SINGLE SINGLE SINGLE)"

for filename in os.listdir(base_path):
	full_path = os.path.join(base_path, filename)
	print(full_path)

	with open(full_path,'r+') as fh: # r+ keeps the contents; w+ would empty the file
		previous = [fh.tell(), -1]
		offset = 0
		for line in fh:
			previous = previous[0], len(line)
			if del_keystring in line:
				offset += len(line)
				...

My brain hurts, might finish this one tomorrow because I like this kind of stuff.
The idea: keep track of the offset left behind by discarded lines, then write each following line back into the spot where the discarded line used to be, and so on.
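
For reference, here is a minimal sketch of how the seek/tell idea above could be finished. It is one possible completion, not the author's final version. The function name remove_lines_in_place and its parameters are hypothetical. It assumes the file can be opened in binary mode ('r+b'), so that tell() and seek() are plain byte offsets, and that the key string encodes cleanly to bytes. It reads with readline() rather than "for line in fh", because in text mode tell() raises an error while the file is being iterated.

```python
def remove_lines_in_place(path, needle):
    """Hypothetical helper: drop every line containing `needle`, rewriting the file in place."""
    needle_bytes = needle.encode()  # assumption: the key string is plain ASCII/UTF-8
    # 'r+b' opens for reading and writing without truncating the file.
    with open(path, "r+b") as fh:
        write_pos = 0  # byte offset where the next kept line should land
        while True:
            line = fh.readline()
            if not line:
                break  # end of file
            read_pos = fh.tell()  # byte offset where the next unread line starts
            if needle_bytes in line:
                continue  # discarded: the write position simply falls behind
            if write_pos != read_pos - len(line):
                # Something was dropped earlier, so move this line up.
                fh.seek(write_pos)
                fh.write(line)
                fh.seek(read_pos)
            write_pos += len(line)
        fh.truncate(write_pos)  # cut off the leftover tail
```

Inside the loop from the comment above, this would be called as remove_lines_in_place(full_path, del_keystring). The trade-off is that nothing is backed up: if the program is interrupted halfway, the file is left partly rewritten.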

@OverLordGoldDragon

You appear to have gone all out on optimizing this - am seeing some new commands here. Better post on SO as an alternative answer for visibility. -- Bookmarked - thanks for the effort


ghost commented Sep 25, 2019

This code is more readable, and useful because I can keep the original .dat files.
However, is it possible to keep the same file extension, e.g. 1147_new.dat (rather than 1147.dat.new)?

@Torxed
Author

Torxed commented Sep 26, 2019

This code is more readable, and useful because I can keep the original .dat files.
However, is it possible to keep the same file extension, e.g. 1147_new.dat (rather than 1147.dat.new)?

You can. But you can't overwrite a file while you're still reading it line by line (opening the same path with 'w' empties it before anything has been read), so you'll need two separate files open while the program runs if you're using that code. Afterwards you can do:

import shutil
shutil.move(full_path+'.new', full_path)

That overwrites the old file. It's actually a pretty good approach, since you can check for errors before moving/replacing the file.
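
As for the 1147_new.dat naming, a rough sketch of one way to do it is to split the extension off with os.path.splitext and put the suffix in before it. The helper name filter_file and the suffix and replace_original parameters are made up for illustration; the filtering itself is the same two-file approach as above.

```python
import os
import shutil

def filter_file(src_path, needle, suffix="_new", replace_original=False):
    """Hypothetical helper: copy src_path without the lines containing `needle`."""
    root, ext = os.path.splitext(src_path)  # "1147.dat" -> ("1147", ".dat")
    dst_path = root + suffix + ext          # -> "1147_new.dat"
    with open(src_path, "r") as f_read, open(dst_path, "w") as f_write:
        for line in f_read:
            if needle not in line:
                f_write.write(line)
    if replace_original:
        # Only reached if the copy above finished without raising an error.
        shutil.move(dst_path, src_path)
        return src_path
    return dst_path
```

With replace_original left at False the original .dat file is untouched. If this is run in a loop over os.listdir(), the loop probably wants to skip files whose names already end in the suffix, otherwise a second run would also process the 1147_new.dat files.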
