
@AlpinDale
Last active September 21, 2024 23:08
Valid exponents and mantissas

Valid Data Types

Weight Bits: 2

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 0 |

Weight Bits: 3

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 1 |
| 2 | 0 |

Weight Bits: 4

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 2 |
| 2 | 1 |
| 3 | 0 |

Weight Bits: 5

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 3 |
| 2 | 2 |
| 3 | 1 |
| 4 | 0 |

Weight Bits: 6

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 4 |
| 2 | 3 |
| 3 | 2 |
| 4 | 1 |
| 5 | 0 |

Weight Bits: 7

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 5 |
| 2 | 4 |
| 3 | 3 |
| 4 | 2 |
| 5 | 1 |

Weight Bits: 8

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 6 |
| 2 | 5 |
| 3 | 4 |
| 4 | 3 |
| 5 | 2 |

Weight Bits: 9

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 7 |
| 2 | 6 |
| 3 | 5 |
| 4 | 4 |
| 5 | 3 |

Weight Bits: 10

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 8 |
| 2 | 7 |
| 3 | 6 |
| 4 | 5 |
| 5 | 4 |

Weight Bits: 11

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 9 |
| 2 | 8 |
| 3 | 7 |
| 4 | 6 |
| 5 | 5 |

Weight Bits: 12

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 10 |
| 2 | 9 |
| 3 | 8 |
| 4 | 7 |
| 5 | 6 |

Weight Bits: 13

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 11 |
| 2 | 10 |
| 3 | 9 |
| 4 | 8 |
| 5 | 7 |

Weight Bits: 14

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 12 |
| 2 | 11 |
| 3 | 10 |
| 4 | 9 |
| 5 | 8 |

Weight Bits: 15

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 1 | 13 |
| 2 | 12 |
| 3 | 11 |
| 4 | 10 |
| 5 | 9 |
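Every table above follows the same rule: one bit is the sign, so exponent bits + mantissa bits = weight bits - 1, and the exponent width runs from 1 up to 5 (or up to weight bits - 1 for the narrower weights). The host-side C++ sketch below regenerates the tables under that reading. `valid_formats` and `print_valid_tables` are hypothetical names used for illustration, not part of the kernel code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <utility>
#include <vector>

// Hypothetical helper: the (exponent bits, mantissa bits) pairs listed as valid
// for a given weight width, assuming one sign bit and at most 5 exponent bits.
std::vector<std::pair<int, int>> valid_formats(int weight_bits) {
  assert(weight_bits >= 2 && "need a sign bit and at least one exponent bit");
  constexpr int kMaxExponentBits = 5;   // FP16 exponent width
  const int payload = weight_bits - 1;  // bits left after the sign bit
  std::vector<std::pair<int, int>> formats;
  for (int e = 1; e <= std::min(kMaxExponentBits, payload); ++e) {
    formats.emplace_back(e, payload - e);  // the remaining bits are mantissa
  }
  return formats;
}

// Prints the valid combinations for weight widths 2 through 15.
void print_valid_tables() {
  for (int w = 2; w <= 15; ++w) {
    std::printf("Weight Bits: %d\n", w);
    for (const auto& f : valid_formats(w)) {
      std::printf("  %d exponent / %d mantissa\n", f.first, f.second);
    }
  }
}
```

Calling `print_valid_tables()` from `main` prints one block per weight width containing the same pairs as the tables.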

Invalid Combinations

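Exponent widths above 5 bits are invalid because they produce a negative shift amount in `MultScale`. The weights are rescaled as FP16, which has only 5 exponent bits, and the exponent-bias correction is a multiplication by 2^BIAS_OFFSET, where BIAS_OFFSET = 2^4 - 2^(E-1) is the FP16 bias (15) minus the bias of an E-bit exponent (2^(E-1) - 1). For E >= 6, 2^(E-1) >= 32, so BIAS_OFFSET is negative and `int(1) << BIAS_OFFSET` is undefined: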

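```cuda
#include <cstdint>
#include <cuda_fp16.h>

// Rescales two FP16 values packed into one 32-bit register: each value is
// multiplied by 2^BIAS_OFFSET (the exponent-bias correction for a format with
// EXPONENT exponent bits) and then by the quantization Scale.
template <int EXPONENT, int MANTISSA>
__device__ __forceinline__ uint32_t MultScale(uint32_t PackedFP16Pair,
                                              half Scale) {
  // FP16 bias (2^4 - 1 = 15) minus the narrow format's bias (2^(EXPONENT-1) - 1).
  // For EXPONENT > 5 this is negative, so the shift below is undefined and the
  // constant expression is rejected.
  constexpr int BIAS_OFFSET = (int(1) << (5 - 1)) - (int(1) << (EXPONENT - 1));
  constexpr int BIAS = int(1) << BIAS_OFFSET;
  // View the 32-bit register as two consecutive halves.
  half* FP16_1 = reinterpret_cast<half*>(&PackedFP16Pair);
  half* FP16_2 = FP16_1 + 1;
  uint32_t output;
  half* output_half_ptr = reinterpret_cast<half*>(&output);
  output_half_ptr[0] =
      __hmul(__hmul(*FP16_1, __float2half(1.0f * BIAS)), Scale);
  output_half_ptr[1] =
      __hmul(__hmul(*FP16_2, __float2half(1.0f * BIAS)), Scale);
  return output;
}
```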
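To make the arithmetic concrete, the host-side C++ sketch below (not part of the kernel) evaluates the same BIAS_OFFSET expression and models why multiplying by 2^BIAS_OFFSET recovers the weight. The bit placement in `place_in_fp16` (sign to FP16 bit 15, the E exponent bits to the low end of FP16's exponent field, the M mantissa bits to the top of its mantissa field) is an assumption about how the packed FP16 values are produced, and it only covers formats whose mantissa fits in FP16's 10 mantissa bits. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// FP16 exponent bias: 2^(5-1) - 1 = 15.
constexpr int kFp16Bias = (1 << (5 - 1)) - 1;

// Same expression as BIAS_OFFSET in MultScale, as a plain function of E.
constexpr int bias_offset(int exponent_bits) {
  return (1 << (5 - 1)) - (1 << (exponent_bits - 1));
}

// IEEE-style bias of a narrow float with `exponent_bits` exponent bits.
constexpr int narrow_bias(int exponent_bits) {
  return (1 << (exponent_bits - 1)) - 1;
}

// BIAS_OFFSET is exactly the FP16 bias minus the narrow-format bias.
constexpr bool offsets_match() {
  for (int e = 1; e <= 14; ++e) {
    if (bias_offset(e) != kFp16Bias - narrow_bias(e)) return false;
  }
  return true;
}
static_assert(offsets_match(), "BIAS_OFFSET = FP16 bias - narrow bias");
static_assert(bias_offset(5) == 0 && bias_offset(6) == -16,
              "E = 5 is the last width with a non-negative shift");

// Decodes an IEEE binary16 bit pattern. Exponent 31 is treated as a normal
// exponent here; inf/NaN are not modeled.
float decode_fp16(uint16_t bits) {
  const int exp = (bits >> 10) & 0x1F;
  const int man = bits & 0x3FF;
  const float mag = exp == 0 ? std::ldexp(static_cast<float>(man), -24)
                             : std::ldexp(1.0f + man / 1024.0f, exp - 15);
  return (bits >> 15) ? -mag : mag;
}

// Hypothetical placement of a (1 + E + M)-bit code into FP16 bit positions.
uint16_t place_in_fp16(uint32_t code, int E, int M) {
  assert(E >= 1 && E <= 5 && M >= 0 && M <= 10);  // every field must fit
  const uint32_t sign = (code >> (E + M)) & 1u;
  const uint32_t exp = (code >> M) & ((1u << E) - 1u);
  const uint32_t man = code & ((1u << M) - 1u);
  return static_cast<uint16_t>((sign << 15) | (exp << 10) | (man << (10 - M)));
}

// Reference value of the same code read directly in the narrow format
// (subnormals supported, no inf/NaN).
float decode_narrow(uint32_t code, int E, int M) {
  const int exp = static_cast<int>((code >> M) & ((1u << E) - 1u));
  const float frac = static_cast<float>(code & ((1u << M) - 1u)) /
                     static_cast<float>(1u << M);
  const int bias = narrow_bias(E);
  const float mag = exp == 0 ? std::ldexp(frac, 1 - bias)
                             : std::ldexp(1.0f + frac, exp - bias);
  return ((code >> (E + M)) & 1u) ? -mag : mag;
}
```

For example, with E = 3 and M = 2 the code `0b001101` is placed at FP16 bits `0x0D00`, which FP16 reads as 1.25 × 2^-12; scaling by 2^bias_offset(3) = 2^12 gives back 1.25, the value of the 6-bit code. The second `static_assert` is the compile-time form of the rule that stops the valid tables at 5 exponent bits.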

| Exponent Bits | Mantissa Bits |
| --- | --- |
| 6 | 0 to 8 |
| 7 | 0 to 7 |
| 8 | 0 to 6 |
| 9 | 0 to 5 |
| 10 | 0 to 4 |
| 11 | 0 to 3 |
| 12 | 0 to 2 |
| 13 | 0 to 1 |
| 14 | 0 |
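The invalid list covers every exponent width from 6 to 14 for total widths up to 15 bits, with the mantissa running from 0 up to 14 - E, 45 combinations in all. A hypothetical host-side helper (`invalid_formats` and `all_offsets_negative` are illustrative names, assuming one sign bit as in the valid tables) that regenerates the list and confirms every entry would need a negative BIAS_OFFSET:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Same expression as BIAS_OFFSET in MultScale.
constexpr int bias_offset(int exponent_bits) {
  return (1 << (5 - 1)) - (1 << (exponent_bits - 1));
}

// Hypothetical helper: every (E, M) with E >= 6 whose total width 1 + E + M
// (one sign bit) fits in `max_weight_bits`.
std::vector<std::pair<int, int>> invalid_formats(int max_weight_bits = 15) {
  assert(max_weight_bits >= 2);
  std::vector<std::pair<int, int>> formats;
  for (int e = 6; 1 + e <= max_weight_bits; ++e) {
    for (int m = 0; 1 + e + m <= max_weight_bits; ++m) {
      formats.emplace_back(e, m);
    }
  }
  return formats;
}

// True if every listed format would need a negative shift in MultScale.
bool all_offsets_negative(const std::vector<std::pair<int, int>>& formats) {
  for (const auto& f : formats) {
    if (bias_offset(f.first) >= 0) return false;
  }
  return true;
}
```

Passing a larger `max_weight_bits` extends the list in the same pattern.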