@danlester
Created June 15, 2018 10:13
TensorFlow Datasets API for processing a CSV string column into an int tensor
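import tensorflow as tf  # written against the TensorFlow 1.x API

# 30 numeric columns (p0..p29) followed by one string column of space-separated pixel values.
CSV_COLUMNS = ['p{}'.format(i) for i in range(30)] + ['Image']
CSV_COLUMN_DEFAULTS = [[0.0]] * 30 + [['']]

def parse_csv(rows_string_tensor):
    """Parse one CSV line into a dict mapping column name to tensor."""
    columns = tf.decode_csv(rows_string_tensor, record_defaults=CSV_COLUMN_DEFAULTS)
    raw_features = dict(zip(CSV_COLUMNS, columns))
    # Split the 'Image' string on spaces; tf.string_split returns a SparseTensor.
    image_str_array_sparse = tf.string_split([raw_features['Image']])
    image_str_array = tf.sparse_to_dense(image_str_array_sparse.indices, image_str_array_sparse.dense_shape, image_str_array_sparse.values, '')
    image_str_array = image_str_array[0]  # drop the leading dimension of size 1
    # tf.string_to_number returns float32 by default; pass out_type=tf.int32 for an int tensor.
    raw_features['Image'] = tf.string_to_number(image_str_array)
    return raw_features

def datasetload(csvfile, test=False):
    """Build a dataset of parsed rows, skipping the header line. `test` is currently unused."""
    return tf.data.TextLineDataset([csvfile]).skip(1).map(parse_csv)

features_and_labels = datasetload('training.csv')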
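
To check what the parsing step should produce for a given row without building a TensorFlow graph, the same transformation can be sketched in plain Python. This is only an illustration, not part of the gist: the helper name `parse_csv_row_py` and the sample row are made up, and `str.split()` splits on any whitespace, whereas `tf.string_split` splits on spaces by default.

```python
import csv
import io

CSV_COLUMNS = ['p{}'.format(i) for i in range(30)] + ['Image']


def parse_csv_row_py(line):
    """Plain-Python mirror of parse_csv for one CSV line (hypothetical helper)."""
    # csv.reader splits the line into fields and handles quoted values.
    fields = next(csv.reader(io.StringIO(line)))
    if len(fields) != len(CSV_COLUMNS):
        raise ValueError('expected {} fields, got {}'.format(len(CSV_COLUMNS), len(fields)))
    features = {}
    for name, value in zip(CSV_COLUMNS[:-1], fields[:-1]):
        # Empty numeric fields fall back to 0.0, matching CSV_COLUMN_DEFAULTS.
        features[name] = float(value) if value else 0.0
    # Split the pixel string into tokens and convert each token to a number.
    features['Image'] = [float(tok) for tok in fields[-1].split()]
    return features


# Made-up sample row: 30 numeric values followed by a 4-pixel "image".
sample_line = ','.join(str(float(i)) for i in range(30)) + ',238 236 237 0'
parsed = parse_csv_row_py(sample_line)
print(parsed['p0'], parsed['p29'], parsed['Image'])
```

Comparing this output against a few rows pulled from the dataset is a quick way to confirm that the 'Image' column ends up as a flat vector of numbers rather than a single string.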