@tysam-code
tysam-code / hlb-cifar10-ternary-train-initial-working-prototype.py
Last active December 29, 2023 07:47
Trains a network to ~>91.5% on CIFAR10 in less than 10 seconds on an A100 with ternary weights, should fit uncompressed w/ correct storage dtypes in just over half of a floppy drive. <3 :'))))
# Note: The one change we need to make if we're in Colab is to uncomment this below block.
# If we are in an ipython session or a notebook, clear the state to avoid bugs
"""
try:
_ = get_ipython().__class__.__name__
## we set -f below to avoid prompting the user before clearing the notebook state
%reset -f
except NameError:
pass ## we're still good
"""
@tysam-code
tysam-code / discrete-action-backprop-bypass-hlb-cifar10-demo.py
Last active March 14, 2024 09:24
This is a network sketch. Network sketches aren't complete, released versions, but rather a (hopefully convincing) proof-of-concept of a particular idea. The intent of this network sketch is to show that traditional backprop is not required in order to learn and coordinate a network's dataflow structure over a simulated discrete, non-backproppab…
# Sketch-specific note: a roughly ~25-run battery for this code estimated roughly ~93.11% accuracy in the same number of steps as the baseline network, with ~1.7x runtime overhead (much of which goes to the torch.randn allocations and extra layer calculations).
# Note: The one change we need to make if we're in Colab is to uncomment this below block.
# If we are in an ipython session or a notebook, clear the state to avoid bugs
#"""
try:
_ = get_ipython().__class__.__name__
## we set -f below to avoid prompting the user before clearing the notebook state
%reset -f
except NameError:
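
As a rough illustration of the idea sketched in the description above (learning a discrete, non-backproppable choice without routing traditional backprop through it), here is a hypothetical straight-through-style bypass over a branch selection; the class, names, and structure are assumptions for illustration, not the gist's actual mechanism.

import torch
import torch.nn.functional as F

# hypothetical sketch: make a hard (discrete) choice between branches in the forward pass,
# but let gradients reach the selection logits via a straight-through bypass
class DiscreteBranchChoice(torch.nn.Module):
    def __init__(self, dim: int, num_branches: int = 2):
        super().__init__()
        self.branches = torch.nn.ModuleList([torch.nn.Linear(dim, dim) for _ in range(num_branches)])
        self.selector = torch.nn.Linear(dim, num_branches)

    def forward(self, x):
        logits = self.selector(x.mean(dim=0, keepdim=True))             # (1, num_branches)
        probs = F.softmax(logits, dim=-1)
        hard = F.one_hot(probs.argmax(dim=-1), probs.shape[-1]).float() # discrete, non-backproppable
        gate = hard + (probs - probs.detach())                          # straight-through bypass
        outs = torch.stack([branch(x) for branch in self.branches], dim=-1)
        return (outs * gate.squeeze(0)).sum(dim=-1)

x = torch.randn(8, 32)
layer = DiscreteBranchChoice(32)
layer(x).sum().backward()  # gradients reach both the branches and the selector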
@tysam-code
tysam-code / in_dev_slicer_prefetcher.py
Last active December 15, 2023 06:44
in dev, TODO fill out description later.
import os
import time
import queue
import threading
import random
import torch
import torchvision
from torchvision.transforms import v2
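
Given the imports above, here is a minimal, hypothetical sketch of what a queue-plus-thread prefetcher along these lines can look like; since the gist is still in dev, every name and design choice below is an assumption.

import queue
import threading
import torch

# hypothetical sketch: a background thread keeps a small bounded queue of prepared batches
# so that the training loop rarely waits on data preparation
class ThreadedPrefetcher:
    def __init__(self, make_batch, max_prefetch: int = 4):
        self.make_batch = make_batch                       # callable returning one batch
        self.batches = queue.Queue(maxsize=max_prefetch)
        threading.Thread(target=self._fill, daemon=True).start()

    def _fill(self):
        while True:
            self.batches.put(self.make_batch())            # blocks while the queue is full

    def next(self):
        return self.batches.get()                          # blocks only if the worker falls behind

prefetcher = ThreadedPrefetcher(lambda: torch.randn(512, 3, 32, 32))
batch = prefetcher.next()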
@tysam-code
tysam-code / condensed-ml-tidbits.txt
Last active December 11, 2023 04:22
TODOTODOTODOTODO # workinprogress <3 :'))))
# [IN-DEV currently]
# Maintained/Initially created by Fern. Say hi to me and feel free to ask any questions as needed! <3 :'))))
# If anything here is self-cited/has no citation, that means it's a conclusion I arrived at over time or by
# deriving something from the basics; however, there may be work elaborating it in further detail (feel free to comment if there's an especially relevant link).
# Misc
- LayerNorm/RMSNorm might be acting as lateral inhibition, a paradigm explored in many ML papers from the 2000s and surrounding years (Fern, {relevant sources needed})
- 'Soft' (pre-determined or pre-compiled) architectures in the weights of your network can greatly improve convergence speed and/or generalization.
- Downcasting dtypes to a lower bit depth in your dot products can be a 'free' efficiency improvement in some circumstances.
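
As a small illustration of the last point, one way the downcast can look in PyTorch is below; bfloat16 is an assumed choice here, and whether the speedup is actually 'free' depends on the hardware and the accuracy tolerance.

import torch

a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)

ref = a @ b                                   # full-precision reference
fast = (a.bfloat16() @ b.bfloat16()).float()  # downcast the operands, cast the result back up

print((ref - fast).abs().max())               # small error; on many accelerators the bf16 matmul is much faster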