@howardjones
Last active February 12, 2021 08:45
Python Directory iterator

DirectoryIterator

simple iterator for walking directory trees, optionally returning only matching files

e.g.

import json

def is_json(filename):
    return filename.endswith(".json")

starts = ['X:\\configs', 'X:\\todo']
d = DirectoryIterator(starts, is_json)
for jfile in d:
    with open(jfile) as f:
        data = json.load(f)
        # [do something useful here]

I found myself doing this often enough to want to hide the detail, and also to avoid building big lists in memory.

The first argument is either a list of top-level directories or a single string. If one of the entries is a file rather than a directory, it is yielded as-is, without the test being applied. Otherwise, each iteration yields the next matching file in the tree as a full path. If you don't supply a second argument, you get all files. The second argument is a callable that returns True or False; as the code is written, it is passed the bare file name rather than the full pathname.

import os


class DirectoryIterator:
    def __init__(self, roots, test=None):
        # Accept a single path string as well as a list of paths.
        if isinstance(roots, str):
            self.roots = [roots]
        else:
            self.roots = roots
        self.test = test

    def __iter__(self):
        for start_path in self.roots:
            if os.path.isfile(start_path):
                # A file given as a root is yielded directly, untested.
                yield start_path
            else:
                for dirpath, dirnames, filenames in os.walk(start_path):
                    for filename in filenames:
                        # The test receives the bare file name, not the full path.
                        if self.test is None or self.test(filename):
                            yield os.path.join(dirpath, filename)
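
The class above calls the test with the bare file name only. If a test needs to look at the directory as well, one possible variation is a plain generator function that hands the test the full path instead. This is only a sketch and not part of the gist: the names `iter_files` and `is_current_json` are made up for illustration, and the demo builds a throwaway tree with `tempfile` so it can run anywhere.

```python
import os
import tempfile


def iter_files(roots, test=None):
    """Lazily yield file paths under roots; test (if given) sees the full path."""
    if isinstance(roots, str):
        roots = [roots]
    for start_path in roots:
        if os.path.isfile(start_path):
            # As in the class above, a file given as a root is yielded untested.
            yield start_path
            continue
        for dirpath, _dirnames, filenames in os.walk(start_path):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                # Difference from the class: the test gets the joined path.
                if test is None or test(full_path):
                    yield full_path


# Demo on a temporary tree (hypothetical layout, created just for this sketch).
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "configs", "old"))
    for rel in ("configs/a.json", "configs/old/b.json", "configs/notes.txt"):
        with open(os.path.join(top, *rel.split("/")), "w") as f:
            f.write("{}")

    def is_current_json(path):
        # Full-path test: JSON files that are not inside an "old" directory.
        parts = os.path.relpath(path, top).split(os.sep)
        return path.endswith(".json") and "old" not in parts

    # No test: every file in the tree is returned.
    all_files = sorted(os.path.basename(p) for p in iter_files(top))
    # With the full-path test: only configs/a.json qualifies.
    current = [os.path.basename(p) for p in iter_files(top, is_current_json)]
```

Because this is a generator, nothing is walked until the caller starts iterating, so the "no big lists in memory" property is kept; the `sorted(...)` call in the demo is only there to make the result order predictable.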