|from keras.models import Sequential|
|from keras.layers import Dense|
|from keras.utils.io_utils import HDF5Matrix|
|import h5py|
|import numpy as np|
|X = np.random.randn(200,10).astype('float32')|
|y = np.random.randint(0, 2, size=(200,1))|
|f = h5py.File('test.h5', 'w')|
|# Creating dataset to store features|
|X_dset = f.create_dataset('my_data', (200,10), dtype='f')|
|X_dset[:] = X|
|# Creating dataset to store labels|
|y_dset = f.create_dataset('my_labels', (200,1), dtype='i')|
|y_dset[:] = y|
|# Close the file so the data is flushed to disk before HDF5Matrix opens it|
|f.close()|
|# Instantiating HDF5Matrix for the training set, which is a slice of the first 150 elements|
|X_train = HDF5Matrix('test.h5', 'my_data', start=0, end=150)|
|y_train = HDF5Matrix('test.h5', 'my_labels', start=0, end=150)|
|# Likewise for the test set|
|X_test = HDF5Matrix('test.h5', 'my_data', start=150, end=200)|
|y_test = HDF5Matrix('test.h5', 'my_labels', start=150, end=200)|
|# HDF5Matrix behaves more or less like a NumPy array with regard to indexing|
|# But it does not support negative indices, so don't try print(X_train[-1])|
|model = Sequential()|
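|model.add(Dense(64, input_shape=(10,), activation='relu'))|
|# Single sigmoid output for the binary labels; the model must be compiled before fit()|
|model.add(Dense(1, activation='sigmoid'))|
|model.compile(loss='binary_crossentropy', optimizer='sgd')|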
|# Note: you have to use shuffle='batch' or False with HDF5Matrix|
|model.fit(X_train, y_train, batch_size=32, shuffle='batch')|
|model.evaluate(X_test, y_test, batch_size=32)|
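A minimal NumPy sketch of what batch-level shuffling means, for readers wondering about the `shuffle='batch'` note: the order of the batches is randomized, but the indices inside each batch stay contiguous, so every read from the HDF5 file is still one contiguous slice. The function name `batch_shuffle` and its arguments are illustrative, and this is a sketch of the idea rather than Keras's actual implementation.

```python
import numpy as np

def batch_shuffle(n_samples, batch_size, rng):
    """Return sample indices with the batch order shuffled and each batch contiguous."""
    index_array = np.arange(n_samples)
    n_full = n_samples // batch_size
    # Samples that do not fill a whole batch stay at the end, unshuffled.
    tail = index_array[n_full * batch_size:]
    full = index_array[:n_full * batch_size].reshape(n_full, batch_size)
    # Shuffling a 2-D array permutes its rows, i.e. whole batches.
    rng.shuffle(full)
    return np.concatenate([full.ravel(), tail])

rng = np.random.default_rng(0)
order = batch_shuffle(n_samples=150, batch_size=32, rng=rng)
first_batch = order[:32]
```

With 150 training samples and a batch size of 32, this gives four shuffled full batches followed by the 22 leftover samples.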
I'm not really understanding how to do this with images, especially because HDF5Matrix seems to work only on matrices, i.e. 2-dimensional data. For instance, I have an HDF5 file with two datasets, X and y. X has shape (92072960, 1) and y has shape (92072960, 112). X has been flattened into one long column of pixel values so that it can be stored as a matrix. So to feed the images into the CNN, I need to unflatten it.
Since each 224 * 224 image has 50176 pixels, I could do something like:...
Do you see what I'm asking: where should I reshape the arrays loaded from the HDF5Matrix so that they can be fed into a CNN? And when should I call model.fit?
I am using Keras with the Theano backend.
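One possible approach, as a hedged sketch: slice out whole images' worth of pixels, then reshape the slice just before handing it to the model. With the numbers above, 92072960 / 50176 = 1835 images. The helper name `load_images` and the `channels_first` flag are made up for illustration, and the NumPy array below stands in for the on-disk dataset; it assumes the flattened dataset supports ordinary slice indexing and that the images are single-channel. Which axis holds the channel depends on your Keras image data format setting.

```python
import numpy as np

IMG_H, IMG_W = 224, 224
PIXELS_PER_IMAGE = IMG_H * IMG_W  # 50176

def load_images(flat_pixels, first_image, last_image, channels_first=True):
    """Slice whole images out of a flattened (n_pixels, 1) dataset and restore their shape."""
    start = first_image * PIXELS_PER_IMAGE
    stop = last_image * PIXELS_PER_IMAGE
    # Only this slice is pulled into memory, not the full dataset.
    chunk = np.asarray(flat_pixels[start:stop])
    images = chunk.reshape(-1, IMG_H, IMG_W)
    if channels_first:
        return images[:, np.newaxis, :, :]  # (n, 1, 224, 224)
    return images[..., np.newaxis]          # (n, 224, 224, 1)

# Stand-in for the flattened dataset: 3 images' worth of pixels.
flat = np.arange(3 * PIXELS_PER_IMAGE, dtype='float32').reshape(-1, 1)
batch = load_images(flat, first_image=0, last_image=2)
```

The reshaped chunk is what you would pass to the model, for example one chunk at a time via `model.train_on_batch`, with the labels sliced per image to match.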
Traceback (most recent call last):
Hi, I just ran your code (example_hdf5matrix.py) and it does not work.
I get the following error trace:
Hi, I am using HDF5Matrix to load a dataset and train my model with it. Compared with a NumPy array with the same contents, training a Keras model with the HDF5Matrix results in much slower learning: in the first epoch I get 10% accuracy with the HDF5Matrix but 40% accuracy with the NumPy array. I have also posted in the Keras forum for help; see the post for more details. Thank you.
HDF5Matrix is much slower when I read the data batch by batch or in a for loop. Here is a quick modification:
It uses a generator: the large dataset, which can't fit into memory as a whole, is split into 100 segments, and batches are generated from one segment at a time.
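For readers who want to try the idea, here is a minimal sketch of such a segment-based generator. It is not the code from the comment above; the name `segment_batches` and its parameters are illustrative, and it assumes `X` and `y` support `len()` and slice indexing (as NumPy arrays and h5py datasets do). NumPy arrays stand in for the on-disk data in the demo.

```python
from itertools import islice

import numpy as np

def segment_batches(X, y, batch_size, n_segments=100):
    """Yield (X_batch, y_batch) forever, loading one segment into memory at a time."""
    n = len(X)
    seg_len = -(-n // n_segments)  # ceiling division
    while True:  # Keras-style generators are expected to loop indefinitely
        for seg_start in range(0, n, seg_len):
            seg_stop = min(seg_start + seg_len, n)
            # One contiguous read per segment instead of one read per batch.
            X_seg = np.asarray(X[seg_start:seg_stop])
            y_seg = np.asarray(y[seg_start:seg_stop])
            for i in range(0, len(X_seg), batch_size):
                yield X_seg[i:i + batch_size], y_seg[i:i + batch_size]

# Demo: 200 samples in 4 segments of 50, so each segment gives batches of 32 and 18.
X_demo = np.arange(200 * 10, dtype='float32').reshape(200, 10)
y_demo = np.arange(200).reshape(200, 1)
gen = segment_batches(X_demo, y_demo, batch_size=32, n_segments=4)
epoch = list(islice(gen, 8))  # 8 batches cover the data exactly once
```

Because batches never cross a segment boundary, the number of steps per epoch is the sum of the per-segment batch counts (8 in the demo), which is the value a generator-based training call would need.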
I think it should not be used with multiple workers.
And for my use case (I use the Sequence interface), I need to set shuffle=False explicitly.
Thanks for the generator tip @Shawn-Shan. That meant I could actually fit my 200 GB data!
Note that I had to change