@tarxvf
Last active June 9, 2016 15:03
Takes a directory of files (in this case PDF targets) and creates a distributed list of N processing files, to push out to a poor man's cluster using something like https://gist.github.com/tarxvf/ba49560220e69a90d1df379a3f309150
import os
import re

rootdir = '/tmp/'
count = 0
# Walk rootdir recursively and pick files whose names end in digits + ".pdf".
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        m = re.search(r"[0-9]+\.pdf$", file)
        if m:
            count = count + 1
            # count is incremented before grouping, so group 0 gets 500 names
            # and every later group gets up to 501.
            file_name = "/tmp/process_group_{}.txt".format(count // 501)
            print("filename={0} count={1}".format(file_name, count))
            # Append the bare file name (not the full path) to its group list.
            with open(file_name, "a") as text_file:
                text_file.write("{0}\n".format(file))
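
A possible variation on the script above, as a rough sketch rather than the original method: it splits the work into two functions so the group size and output directory can be changed. The names `find_targets`, `write_groups`, `outdir`, and `group_size` are hypothetical and not part of the original gist. Unlike the original, this version assumes you want full paths in the list files, evenly sized groups, and list files that are overwritten instead of appended to on each run.

```python
import os
import re

# Same pattern as the original script: names ending in digits + ".pdf".
PDF_PATTERN = re.compile(r"[0-9]+\.pdf$")


def find_targets(rootdir):
    """Return the sorted full paths of matching files under rootdir (hypothetical helper)."""
    targets = []
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            if PDF_PATTERN.search(name):
                targets.append(os.path.join(subdir, name))
    return sorted(targets)


def write_groups(targets, outdir, group_size=500):
    """Write targets into process_group_<n>.txt files of at most group_size lines each."""
    written = []
    for start in range(0, len(targets), group_size):
        file_name = os.path.join(
            outdir, "process_group_{}.txt".format(start // group_size))
        # "w" (an assumption) so that re-running does not duplicate entries.
        with open(file_name, "w") as text_file:
            for path in targets[start:start + group_size]:
                text_file.write("{0}\n".format(path))
        written.append(file_name)
    return written
```

Something like `write_groups(find_targets('/tmp/'), '/tmp/')` would then produce one list file per node to hand out.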