""" I was writing a dataloader from a video stream. I ran some numbers. | |
# in a nutshell. | |
-> np.transpose() or torch.permute() is faster as uint8, no difference between torch and numpy | |
-> np.uint8/number results in np.float64, never do it, if anything cast as np.float32 | |
-> convert to pytorch before converting uint8 to float32 | |
-> contiguous() is is faster in torch than numpy | |
-> contiguous() is faster for torch.float32 than for torch.uint8 | |
-> convert to CUDA in the numpy to pytorch conversion, if you can. | |
-> in CPU tensor/my_float is > 130% more costly than tensor.div_(myfloat), however tensor.div_() | |
does not keep track of gradients, so be careful using it. |
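A minimal sketch of the pipeline these notes suggest, assuming an HWC uint8 frame as input; the frame shape, variable names, and device handling are illustrative, not from the original benchmarks:

import numpy as np
import torch

# hypothetical frame: one 1080p RGB video frame in HWC uint8 layout
frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)

# the pitfall: dividing uint8 by a Python float silently promotes to float64
bad = frame / 255.0                      # dtype float64, twice the memory traffic
ok = frame.astype(np.float32) / 255.0    # if dividing in numpy, at least stay in float32

# the order the notes recommend: stay uint8 as long as possible,
# convert to torch (and CUDA) first, then go float32 and normalize in place
t = torch.from_numpy(frame)              # zero-copy: shares memory with `frame`
if torch.cuda.is_available():
    t = t.to("cuda")                     # transfer while still uint8 (4x fewer bytes than float32)
t = t.permute(2, 0, 1)                   # HWC -> CHW; a cheap view, still uint8
t = t.float()                            # uint8 -> float32, keeps the permuted strides
t = t.contiguous()                       # materialize the layout, done as float32
t.div_(255.0)                            # in-place divide; does not track gradients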
"""testing vram in pytorch cuda | |
every time a variable is put inside a container in python, to remove it completely | |
one needs to delete variable and container, | |
this can be problematic when using pytorch cuda if one doesnt clear all containers | |
Three tests: | |
>>> python memory_tests list | |
# creates 2 tensors puts them in a list, modifies them in place, deletes them | |
# in place mod changes original tensors | |
# list and both tensors need to be deleted |
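A minimal sketch of the `list` test described above, assuming a CUDA device is available; the tensor sizes and names are illustrative:

import torch

def allocated_mb():
    # bytes currently held by live tensors on the default CUDA device, in MB
    return torch.cuda.memory_allocated() / 1024 ** 2

a = torch.ones(1024, 1024, device="cuda")
b = torch.ones(1024, 1024, device="cuda")
tensors = [a, b]

tensors[0].add_(1.0)            # in-place modification through the container
print(a[0, 0].item())           # 2.0: the original tensor was changed too

del a, b
print(allocated_mb())           # unchanged: the list still references both tensors

del tensors
print(allocated_mb())           # ~0: with the last references gone, the memory is freed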