Two-hot encoding was introduced in 2017 by Marc G. Bellemare et al. in "A Distributional Perspective on Reinforcement Learning", but the clearest description is in the 2023 DreamerV3 paper ("Mastering Diverse Domains through World Models" by Danijar Hafner et al.), where it is used for reward and value distributions.
Two-hot encoding is a generalization of one-hot encoding to continuous values. It produces a vector of length |B|, where B is the set of bins, in which all elements are 0 except for the two entries closest to the encoded continuous number, at positions k and k + 1. These two entries sum to 1, with more weight given to the entry that is closer to the encoded number.
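As a rough illustration of the description above, here is a minimal sketch in plain Python. It assumes the |B| entries correspond to sorted, strictly increasing bin centers and that the two weights come from linear interpolation between the neighboring centers, which is one natural way to give more weight to the closer entry. The names `two_hot`, `decode`, and `bins`, the five equally spaced bins, and the clamping of out-of-range values are illustrative choices, not taken from either paper.

```python
from bisect import bisect_right


def two_hot(x, bins):
    """Encode scalar x as a two-hot vector over sorted bin centers `bins`.

    Assumes `bins` is strictly increasing and has at least two entries.
    """
    # Assumption: values outside the bin range are clamped to the edge bins.
    x = min(max(x, bins[0]), bins[-1])
    # k is the index of the last bin center <= x, capped so k + 1 stays valid.
    k = min(bisect_right(bins, x) - 1, len(bins) - 2)
    # Linear interpolation: the closer bin receives the larger weight.
    upper_weight = (x - bins[k]) / (bins[k + 1] - bins[k])
    encoding = [0.0] * len(bins)
    encoding[k] = 1.0 - upper_weight
    encoding[k + 1] = upper_weight
    return encoding


def decode(encoding, bins):
    """Recover the scalar as the weighted sum of the bin centers."""
    return sum(w * b for w, b in zip(encoding, bins))


# Hypothetical bins chosen for illustration only.
bins = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(two_hot(1.25, bins))                # [0.0, 0.0, 0.0, 0.75, 0.25]
print(decode(two_hot(1.25, bins), bins))  # 1.25
```

Here 1.25 lies between the centers 1.0 and 2.0 (positions k = 3 and k + 1 = 4), so the nearer center 1.0 gets weight 0.75 and the farther one gets 0.25. With interpolation weights, taking the weighted sum of the bin centers recovers the original value whenever it lies inside the bin range.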