@udaylunawat
Created July 15, 2022 16:02
Split folders of files (e.g. images) into train, validation, and test dataset folders, then load them as TensorFlow Datasets.
# https://stackoverflow.com/a/64006242/9292995
# https://github.com/jfilter/split-folders
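import splitfolders
import tensorflow_datasets as tfds

# If your dataset is balanced (each class has the same number of samples), use ratio();
# if it is imbalanced, use fixed() instead.
# Note: oversample is an option of splitfolders.fixed(), not ratio(), so it is not passed here.
splitfolders.ratio('input_dir', output='output_dir', ratio=(0.8, 0.1, 0.1), seed=1337)

# https://www.tensorflow.org/datasets/api_docs/python/tfds/folder_dataset/ImageFolder
# ImageFolder expects root_dir/<split>/<label>/<image>, which is the layout splitfolders
# writes (train/, val/, test/), so point it at the folder that holds the split output.
builder = tfds.ImageFolder('data/4_tfds_dataset')
print(builder.info)  # number of examples, labels, etc. are computed automatically
# split=None returns a dict with every split; as_supervised=True yields (image, label) pairs
data = builder.as_dataset(split=None, as_supervised=True)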
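
To make the ratio split less of a black box, here is a minimal standard-library sketch of the idea: shuffle each class folder with a fixed seed, then copy the files into train/val/test subfolders. This is not the split-folders implementation. The names `split_counts` and `ratio_split` are hypothetical, and the rounding rule (truncate, with the remainder going to test) is an assumption that may differ from what the library does.

```python
import random
import shutil
from pathlib import Path


def split_counts(n, ratio=(0.8, 0.1, 0.1)):
    """Return (train, val, test) sizes for n files.

    Assumption: train and val are truncated; test takes the remainder.
    """
    n_train = int(n * ratio[0])
    n_val = int(n * ratio[1])
    return n_train, n_val, n - n_train - n_val


def ratio_split(input_dir, output_dir, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Copy input_dir/<class>/<file> into output_dir/<split>/<class>/<file>."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    class_dirs = sorted(p for p in Path(input_dir).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_train, n_val, _ = split_counts(len(files), ratio)
        parts = {
            "train": files[:n_train],
            "val": files[n_train:n_train + n_val],
            "test": files[n_train + n_val:],
        }
        for split, members in parts.items():
            dest = Path(output_dir) / split / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in members:
                shutil.copy2(f, dest / f.name)
```

Calling `ratio_split('input_dir', 'output_dir')` produces the `<split>/<label>/<image>` layout that `tfds.ImageFolder` reads. Because the split is done per class, each class keeps roughly the same proportions in every split.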