""" I was writing a dataloader from a video stream. I ran some numbers. | |
# in a nutshell. | |
-> np.transpose() or torch.permute() is faster as uint8, no difference between torch and numpy | |
-> np.uint8/number results in np.float64, never do it, if anything cast as np.float32 | |
-> convert to pytorch before converting uint8 to float32 | |
-> contiguous() is is faster in torch than numpy | |
-> contiguous() is faster for torch.float32 than for torch.uint8 | |
-> convert to CUDA in the numpy to pytorch conversion, if you can. | |
-> in CPU tensor/my_float is > 130% more costly than tensor.div_(myfloat), however tensor.div_() | |
does not keep track of gradients, so be careful using it. |
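# A minimal sketch of the pipeline these notes suggest, assuming an HxWxC
# uint8 RGB frame from the video stream; the function name, the `device`
# argument and the 255.0 normalizer are illustrative assumptions, not part
# of the original measurements.
import numpy as np
import torch

def frame_to_tensor(frame: np.ndarray, device: str = "cuda") -> torch.Tensor:
    # Convert to a torch tensor right away and move to the device as part of
    # the conversion step.
    t = torch.from_numpy(frame).to(device)

    # Permute while the tensor is still uint8: permuting is cheaper on uint8.
    t = t.permute(2, 0, 1)  # HWC -> CHW

    # Cast to float32 before calling contiguous(): contiguous() is cheaper in
    # torch than in numpy, and cheaper on float32 than on uint8.
    t = t.float().contiguous()

    # In-place division is cheaper than `t / 255.0`; fine here because raw
    # input frames carry no gradients.
    t.div_(255.0)
    return t

# Usage with a hypothetical 720p RGB frame:
# frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
# tensor = frame_to_tensor(frame, device="cuda" if torch.cuda.is_available() else "cpu")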
###
# Cheap tuning and hardening of kernel params for Proxmox or other servers.
# Try it if you have heavy load on the server: network, memory, or disk.
# No harm expected, but keep your eyes open.
#
# @updated: 2020-02-06 - more params used, adjusted some param values, more comments on params
#
### NETWORK ###