
Trenton Bricken (TrentBrick)

TrentBrick / potts_closed_form.py
Created June 30, 2022 17:23
Potts Model Closed Form Expectation
# For use with a batch of sequences.
def energy_torch(self, inp):
    """
    Computes the Hamiltonian energy in PyTorch.

    Takes the softmax over the sequences generated by the neural network,
    then computes the expected energy over this softmax in a vectorized way.

    Parameters
    ----------
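The snippet above is truncated before the implementation, so here is a minimal standalone sketch of what a vectorized expected Potts energy can look like. All names (`expected_potts_energy`, the coupling tensor `J`, the fields `h`) are my own illustrative assumptions, not the gist's actual code; it assumes independent per-position softmax distributions, a zero-diagonal coupling tensor, and the usual energy H = -sum_i h_i(s_i) - (1/2) sum_{i != j} J_ij(s_i, s_j).

```python
import torch

def expected_potts_energy(probs, J, h):
    """Expected Potts energy under independent per-position softmaxes.

    probs : (B, L, q) softmax over q states at each of L positions
    J     : (L, L, q, q) coupling tensor, assumed zero on the i == j diagonal
    h     : (L, q) per-position fields
    Returns a (B,) tensor of expected energies.
    """
    # Field term: E[ h_i(s_i) ] = sum_a p_i(a) * h_i(a), summed over positions
    field = torch.einsum('xia,ia->x', probs, h)
    # Pairwise term: E[ J_ij(s_i, s_j) ] = p_i^T J_ij p_j, since positions
    # are independent; the zero diagonal removes the i == j contribution
    pair = torch.einsum('xia,ijab,xjb->x', probs, J, probs)
    # Each unordered pair is counted twice in the full (i, j) sum; halve it
    return -(field + 0.5 * pair)
```

Because the per-position distributions factorize, the expectation of each pairwise term reduces to a bilinear form in the two probability vectors, which is what makes the whole computation a pair of `einsum` calls rather than a sum over all q^L sequences.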
TrentBrick / PyTorch_bucket_by_sequence_length.py
Last active January 26, 2024 19:34
PyTorch BatchSampler for bucketing sequences by length
"""
PyTorch has pack_padded_sequence this doesn’t work with dense layers. For sequence data with high variance in its length
the best way to minimize padding and masking within a batch is by feeding in data that is already grouped by sequence length
(while still shuffling it somewhat). Here is my current solution in numpy.
I will need to convert every function over to torch to allow it to run on the GPU and am sure there are many other
ways to optimize it further. Hope this helps others and that maybe it can become a new PyTorch Batch Sampler someday.
General approach to how it works:
Decide on the bucket boundaries for your data.
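The approach described above (pick bucket boundaries, group indices by length, shuffle within each bucket, then batch) can be sketched as a plain-Python batch sampler. `BucketBatchSampler` and its grouping logic are my own illustrative version, not the gist's actual code; an object like this can be handed to `torch.utils.data.DataLoader` via its `batch_sampler` argument, which accepts any iterable of index batches.

```python
import random

class BucketBatchSampler:
    """Yields batches of dataset indices whose sequence lengths fall in the
    same bucket, so padding within a batch stays small. Hypothetical sketch:
    bucket boundaries and batch size are user-chosen."""

    def __init__(self, lengths, boundaries, batch_size, seed=0):
        self.lengths = lengths            # length of each dataset example
        self.boundaries = sorted(boundaries)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def _bucket(self, length):
        # First bucket whose upper boundary covers this length
        for i, b in enumerate(self.boundaries):
            if length <= b:
                return i
        return len(self.boundaries)       # overflow bucket for long sequences

    def __iter__(self):
        # Group indices by bucket
        buckets = {}
        for idx, n in enumerate(self.lengths):
            buckets.setdefault(self._bucket(n), []).append(idx)
        # Shuffle within each bucket, then slice into batches
        batches = []
        for ids in buckets.values():
            self.rng.shuffle(ids)
            for i in range(0, len(ids), self.batch_size):
                batches.append(ids[i:i + self.batch_size])
        # Shuffle batch order so buckets are interleaved across the epoch
        self.rng.shuffle(batches)
        return iter(batches)

    def __len__(self):
        counts = {}
        for n in self.lengths:
            b = self._bucket(n)
            counts[b] = counts.get(b, 0) + 1
        # ceil division per bucket, since each bucket batches separately
        return sum(-(-c // self.batch_size) for c in counts.values())
```

The trade-off is the "shuffling it somewhat" mentioned above: ordering is random within a bucket and across batches, but examples from different buckets never share a batch, so the data is not fully shuffled.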