The implementation complexity of vcompress.vm for large vector lengths (VLEN) and higher LMUL has been the subject of some debate.
Existing implementations scale very poorly as the operands grow:

Core | VLEN | e8m1 | e8m2 | e8m4 | e8m8 |
---|---|---|---|---|---|
c906 | 128 | 4 | 10 | 32 | 136 |
c908 | 128 | 4 | 10 | 32 | 139.4 |
c920 | 128 | 0.5 | 2.4 | 5.4 | 20.0 |
bobcat | 256 | 32 | 64 | 132 | 260 |
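To show why the operation is hard to scale, here is a minimal scalar model of what vcompress.vm vd, vs2, vs1 computes. This is a sketch for illustration only. The function names `vcompress` and `vlmax` are hypothetical, and the mask register vs1 is modeled as a plain list of 0/1 values. The destination index of each selected element is the number of set mask bits below it, so it depends on every earlier element in the register group. That group holds VLMAX = VLEN * LMUL / SEW elements, which is 16 for e8m1 and 128 for e8m8 at VLEN=128.

```python
def vcompress(vs2, mask, vd, vl):
    """Hypothetical scalar model of RVV vcompress.vm vd, vs2, vs1.

    `mask` stands in for the mask register vs1 (one 0/1 value per element).
    Selected elements of vs2 are packed into the lowest-numbered elements
    of vd. Positions past the packed count are tail elements; this model
    leaves them unchanged (tail-undisturbed behaviour).
    Returns (new_vd, number_of_packed_elements).
    """
    out = list(vd)
    dst = 0  # running prefix sum of set mask bits seen so far
    for i in range(vl):
        if mask[i]:
            # Element i lands at index dst, which can be anywhere in [0, i],
            # depending on all earlier mask bits.
            out[dst] = vs2[i]
            dst += 1
    return out, dst


def vlmax(vlen, lmul, sew):
    """Elements in one register group: VLMAX = VLEN * LMUL / SEW."""
    return vlen * lmul // sew


if __name__ == "__main__":
    src = [10, 11, 12, 13, 14, 15, 16, 17]
    mask = [1, 0, 1, 1, 0, 0, 0, 1]
    packed, n = vcompress(src, mask, [9] * 8, 8)
    print(n, packed)
    for lmul in (1, 2, 4, 8):
        print(f"VLEN=128 e8m{lmul}: {vlmax(128, lmul, 8)} elements")
```

Running it prints `4 [10, 12, 13, 17, 9, 9, 9, 9]`, followed by the element counts 16, 32, 64 and 128. Since any source element may end up at any lower destination index, the routing a hardware implementation must support grows with VLMAX. Each step up in LMUL doubles the number of elements involved.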