@mikewcasale
Created March 26, 2020 05:32
glob dir
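import glob

import numpy as np

# Collect every entry exactly one subdirectory below ./codedata
# (e.g. ./codedata/<subdir>/<file>). glob returns results in arbitrary
# filesystem order, so sort them to keep the seeded shuffle reproducible
# across machines. The pattern contains no '**', so recursive=True would
# have no effect and is omitted.
files = sorted(glob.glob('./codedata/*/*'))
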
# Split files into training/validation sets
np.random.seed(1000)  # For reproducibility
np.random.shuffle(files)  # Shuffles the list in place
N = int(len(files) * 0.8)  # Do an 80-20 split for training/validation
data = dict(
    train=files[:N],
    valid=files[N:],
)
num_nq_examples = dict(train=N, valid=len(files)-N)
print(num_nq_examples)
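The same shuffle-and-slice split can be wrapped in a reusable function. This is a minimal sketch under a few assumptions: `split_files` is a hypothetical name, the glob results are sorted so the split does not depend on filesystem ordering, and it uses an isolated `random.Random(seed)` from the standard library instead of NumPy's global seed. Because of that last choice, the exact files landing in each set will differ from the script's even with seed 1000; only the 80-20 proportions match.

```python
import glob
import random


def split_files(pattern, train_fraction=0.8, seed=1000):
    """Return (train, valid) lists of paths matching a glob pattern.

    Hypothetical helper: sorts the matches so the shuffle does not depend
    on filesystem order, shuffles them with a seeded private RNG, and puts
    the first train_fraction of the shuffled list into the training set.
    """
    files = sorted(glob.glob(pattern))
    rng = random.Random(seed)  # private RNG; leaves global random state alone
    rng.shuffle(files)
    n_train = int(len(files) * train_fraction)
    return files[:n_train], files[n_train:]
```

For example, `train, valid = split_files('./codedata/*/*')` gives `len(train) == int(0.8 * total)` and puts the remainder in `valid`; calling it again with the same seed and the same files returns the same split.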
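In `glob.glob`, the `recursive=True` flag only changes matching when the pattern contains `**`; a pattern like `'./codedata/*/*'` matches entries exactly two path components below the root whether or not the flag is set. The sketch below checks this on a throwaway directory tree. The tree layout (`a/x.txt`, `a/deep/y.txt`) and the helper `compare_patterns` are made up purely for illustration.

```python
import glob
import os
import tempfile


def compare_patterns():
    """Show which paths '*/*' and '**/*.txt' match in a small temp tree."""
    with tempfile.TemporaryDirectory() as root:
        # Tree: root/a/x.txt and root/a/deep/y.txt
        os.makedirs(os.path.join(root, "a", "deep"))
        for parts in (("a", "x.txt"), ("a", "deep", "y.txt")):
            open(os.path.join(root, *parts), "w").close()

        def rel(pattern, recursive):
            # Match under root, then report paths relative to root, sorted
            matches = glob.glob(os.path.join(root, pattern), recursive=recursive)
            return sorted(os.path.relpath(p, root) for p in matches)

        return {
            # '*/*' matches entries exactly two levels below root,
            # directories included; recursive=True changes nothing here.
            "*/* plain": rel(os.path.join("*", "*"), recursive=False),
            "*/* recursive": rel(os.path.join("*", "*"), recursive=True),
            # '**' descends into every subdirectory only when recursive=True.
            "**/*.txt recursive": rel(os.path.join("**", "*.txt"), recursive=True),
        }


for name, paths in compare_patterns().items():
    print(name, paths)
```

Note that `*/*` also returns the subdirectory `a/deep` itself, so if `./codedata` contains nested folders they will end up in the file list; filtering with `os.path.isfile` is one way to drop them.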