Rakshith Vasudev rakshithvasudev

  • Dell Technologies
  • Austin, TX
"""
This snippet demonstrates an optimized tf.data pipeline.
It is not the full code, only an excerpt.
1. Get the absolute list of filenames.
2. Build a dataset from the list of filenames using from_tensor_slices().
3. Shard the dataset ahead of time.
4. Shuffle the dataset during training.
5. Interleave the dataset in parallel: read and process several files at once (the number is set by cycle_length), turning each filename into a TFRecord dataset.
6. Prefetch the dataset. buffer_size sets how many records are prefetched, usually the job's mini-batch size.
"""
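Step 5 is the least obvious part of the list above, so here is a minimal pure-Python sketch of what interleaving with a cycle_length does to element order. It is a simplified model and not tf.data itself: the function name `interleave` and the toy "files" are assumptions made for illustration, and the real operator also reads the open files in parallel, which this sketch does not attempt.

```python
from itertools import islice


def interleave(sources, cycle_length):
    """Yield one element at a time from up to `cycle_length` open sources.

    Simplified model of round-robin interleaving (not tf.data itself).
    """
    pending = iter(sources)
    # Open the first `cycle_length` sources.
    slots = [iter(s) for s in islice(pending, cycle_length)]
    while slots:
        i = 0
        while i < len(slots):
            try:
                yield next(slots[i])
                i += 1
            except StopIteration:
                # This source is exhausted: open the next pending one in
                # the same slot, or shrink the cycle if none are left.
                replacement = next(pending, None)
                if replacement is None:
                    del slots[i]
                else:
                    slots[i] = iter(replacement)


# Three toy "files" with different numbers of records.
files = [["a0", "a1", "a2"], ["b0"], ["c0", "c1"]]
order = list(interleave(files, cycle_length=2))
```

With cycle_length=2 only two files are open at any moment; the third is opened as soon as the short one runs out.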
This snippet demonstrates a non-optimized tf.data pipeline.
It is not the full code, only an excerpt.
1. Get the absolute list of filenames.
2. Build a dataset from the list of filenames using TFRecordDataset().
3. Create a new dataset that loads the images and preprocesses them.
4. Shard the dataset. 
5. Shuffle the dataset when training. 
6. Repeat the dataset. 
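The main cost of this ordering is that sharding happens after preprocessing, so every worker preprocesses every record and then throws most of them away. The sketch below counts preprocessing calls for one worker under both orderings. It is a pure-Python model under stated assumptions, not tf.data: `count_preprocess_calls` is a hypothetical helper, and the modulo rule stands in for how a shard step assigns records to workers.

```python
def count_preprocess_calls(num_records, num_workers, worker_index, shard_first):
    """Return (preprocess_calls, records_kept) for one worker.

    Pure-Python model of shard-then-map versus map-then-shard.
    """
    calls = 0

    def preprocess(record):
        nonlocal calls
        calls += 1
        return record * 2  # stand-in for decoding and resizing an image

    def belongs_to_worker(position):
        return position % num_workers == worker_index

    if shard_first:
        # Shard early: only this worker's records are ever preprocessed.
        kept = [preprocess(r) for r in range(num_records) if belongs_to_worker(r)]
    else:
        # Shard late: preprocess everything, then keep one record in num_workers.
        processed = (preprocess(r) for r in range(num_records))
        kept = [p for pos, p in enumerate(processed) if belongs_to_worker(pos)]
    return calls, len(kept)


early = count_preprocess_calls(100, 4, 1, shard_first=True)
late = count_preprocess_calls(100, 4, 1, shard_first=False)
```

Both orderings keep the same 25 records, but the late-sharding worker does four times the preprocessing work to get them.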
Hadoop Commands
# test code
cat testfile | ./mapper.py | sort | ./reducer.py
# run a job
hs mapper.py reducer.py input_folder output_folder
# view the results
hadoop fs -cat output_folder/part-00000 | less
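The gist does not show mapper.py or reducer.py, so the following is only a sketch of what such a pair could look like, assuming a word-count job. The function names and the sample input are hypothetical. `hs` is not a standard Hadoop command; it is presumably a local alias or wrapper for a Hadoop Streaming invocation. The last lines mimic the local test pipeline (`mapper | sort | reducer`) in a single process.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" pair per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share the same key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# In-process stand-in for: cat testfile | ./mapper.py | sort | ./reducer.py
sample = ["b a", "a c a"]
result = list(reducer(sorted(mapper(sample))))
```

The reducer relies on its input being sorted by key, which is why the local test pipes through `sort` before `./reducer.py`.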
rakshithvasudev / onehot-dataset
Created August 3, 2017 02:21
onehot-dataset.txt
╔═════════════╦═══════════════════╦═══════╗
║ CompanyName ║ Categorical value ║ Price ║
╠═════════════╬═══════════════════╬═══════╣
║ VW          ║ 1                 ║ 20000 ║
║ Acura       ║ 2                 ║ 10011 ║
║ Honda       ║ 3                 ║ 50000 ║
║ Honda       ║ 3                 ║ 10000 ║
╚═════════════╩═══════════════════╩═══════╝
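The table encodes each company as an integer label. Given the gist's name, a natural next step is one-hot encoding, where each company gets its own 0/1 column so that no ordering between the labels is implied. The sketch below is an assumption about that intent, not code from the gist; the `one_hot` helper is hypothetical and uses only the rows shown above.

```python
rows = [("VW", 20000), ("Acura", 10011), ("Honda", 50000), ("Honda", 10000)]


def one_hot(rows):
    """Return (categories, encoded_rows) with one 0/1 column per company."""
    # dict.fromkeys keeps first-seen order and drops duplicates.
    categories = list(dict.fromkeys(name for name, _ in rows))
    encoded = [
        [1 if name == category else 0 for category in categories] + [price]
        for name, price in rows
    ]
    return categories, encoded


categories, encoded = one_hot(rows)
```

Each encoded row holds the indicator columns in `categories` order, followed by the price.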
rakshithvasudev / RecursiveBinarySearch.py
Created June 16, 2017 00:12
Recursive Binary Search - Python
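def search(numbers, target, first, last):
    """Return the index of target in sorted numbers[first..last], or -1."""
    if first > last:
        return -1  # empty range: target is not present
    mid = (first + last) // 2
    if target == numbers[mid]:
        return mid
    elif target < numbers[mid]:
        return search(numbers, target, first, mid - 1)  # search the left half
    else:
        return search(numbers, target, mid + 1, last)  # search the right half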
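The function above is called with the full index range, for example `search(values, 7, 0, len(values) - 1)`. For comparison, here is a non-recursive equivalent built on the standard library's `bisect` module. This is a swapped-in technique, not the gist's method, and `index_of` is a hypothetical helper name.

```python
from bisect import bisect_left


def index_of(numbers, target):
    """Return an index of target in sorted numbers, or -1 if absent."""
    i = bisect_left(numbers, target)  # leftmost position where target fits
    if i < len(numbers) and numbers[i] == target:
        return i
    return -1


values = [1, 3, 5, 7, 9]
found = index_of(values, 7)
missing = index_of(values, 4)
```

When the list contains duplicates, `bisect_left` always lands on the leftmost match, whereas the recursive version returns whichever match it reaches first.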