@benaisc
Last active October 29, 2019 18:15
create_DAGAN_database
#!/usr/bin/env python3
# coding: utf8
"""
Assumes a file tree such as:
imgs/
|class1/
|    images of type class1.png
|class2/
|    ...
|...
The images are normalized to [0, 1] and stacked into a numpy array of shape
(n_classes, n_samples, h, w, c).
"""
import sys
from pathlib import Path

import matplotlib.image as mpimg
import numpy as np


def create_db(images_dir_path):
    if not Path(images_dir_path).is_dir():
        print('Error: not a directory:', images_dir_path)
        sys.exit(1)
    dataset = []
    for d in Path(images_dir_path).glob('*'):
        # skip plain files (readme, licences, ...)
        if not d.is_dir():
            continue
        classData = []
        for f in d.glob('*.png'):
            img = mpimg.imread(str(f))
            # matplotlib returns PNG data as floats already in [0, 1];
            # only integer data (other formats) needs rescaling
            if np.issubdtype(img.dtype, np.integer):
                img = img / 255.0
            # np.float was removed in NumPy 1.24; use an explicit dtype
            img = img.astype(np.float64)
            img = np.reshape(img, newshape=(img.shape[0], img.shape[1], 3))
            classData.append(img)
        dataset.append(classData)
    # order the classes by number of samples
    dataset.sort(key=len)
    return np.array(dataset)


train_dir = '/path/to/imgs'
data = create_db(train_dir)
print("dataset shape:", data.shape)
np.save('my_database.npy', data)
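As a quick sanity check, the saved file can be loaded back and its five axes unpacked. This is only a sketch: it uses a random stand-in array with made-up sizes (2 classes, 4 samples, 28x28 RGB) instead of real images, and writes to a temporary directory rather than the working directory.

```python
import tempfile
from pathlib import Path

import numpy as np

# Stand-in for the array returned by create_db (hypothetical sizes).
data = np.random.rand(2, 4, 28, 28, 3)

with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / 'my_database.npy'
    np.save(db_path, data)      # same call as in the script
    loaded = np.load(db_path)   # read the whole array back into memory

# The five axes are (n_classes, n_samples, h, w, c).
n_classes, n_samples, h, w, c = loaded.shape
first_image = loaded[0, 0]      # first sample of the first class: (h, w, c)
print("loaded shape:", loaded.shape)
print("one image:", first_image.shape)
```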
@myagmur01

Hi gurujam,
Thanks for the script.
I am also preparing my dataset as a 5-dim array. Please correct me if I am missing something: you are basically appending each image, which is 3-dim (h, w, c), into 'classDataSet'. Then you append that into 'dataset', which becomes 4-dim (num_example, h, w, c). Isn't it supposed to be 5-dim?

@benaisc
Author

benaisc commented Aug 3, 2018

Hi,
Talking about shapes implies working with arrays, not lists.
The shape of an array is the number of its elements followed by the dimensions of each element (see the doc).
So np.array(classDataSet).shape would give us a 4-dim shape (num_samples, h, w, c).
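A minimal sketch of that logic, using dummy all-zero images in place of real files. The sizes (28x28 RGB, 4 samples, 2 classes) and the variable names are made up for illustration only.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
h, w, c = 28, 28, 3
num_samples, num_classes = 4, 2

img = np.zeros((h, w, c))                            # one image: 3-dim
class_data = [img for _ in range(num_samples)]       # list of images of one class
dataset = [class_data for _ in range(num_classes)]   # list of classes

# Each level of list nesting adds one leading axis once converted to an array.
shape_one_class = np.array(class_data).shape
shape_all = np.array(dataset).shape
print(shape_one_class)
print(shape_all)
```

The extra leading axis only appears if every class holds the same number of images; otherwise the nested lists are ragged and cannot form a regular 5-dim array.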

@benaisc
Author

benaisc commented Oct 29, 2019

Hi,
Reading this a year later (and seeing that people still scratch their heads over DAGAN experiments), I feel like continuing my previous comment:
...
Following the same logic, data.shape will give you (num_classes, num_samples, h, w, c).
[I also updated the script to make the things easier for you :)]
This supposes that all your images are PNGs of the same shape.
Use np.reshape(img, newshape=(img.shape[0], img.shape[1], 1)) if your images are all black and white.
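A small sketch of the same idea that handles both cases, assuming a grayscale image arrives as a 2-D (h, w) array, which is how matplotlib returns single-channel files. The helper name add_channel_axis is hypothetical and not part of the script above.

```python
import numpy as np


def add_channel_axis(img):
    """Return img as (h, w, c); a 2-D grayscale array becomes (h, w, 1)."""
    if img.ndim == 2:
        # append a trailing channel axis of length 1
        return img[:, :, np.newaxis]
    return img


# Dummy stand-ins for images read from disk (hypothetical 28x28 size).
gray = np.zeros((28, 28))
rgb = np.zeros((28, 28, 3))

gray_shape = add_channel_axis(gray).shape
rgb_shape = add_channel_axis(rgb).shape
print(gray_shape, rgb_shape)
```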
