We store class information in Klass, and resolving Klass from oop is a hot path. One example is GC: during a GC, we churn through huge numbers of objects and frequently need at least the object size (from the layout helper) and the OopMap from Klass. Therefore, the way we resolve the narrow Klass pointer (nKlass) to a Klass matters for performance.
Today (non-Lilliput), we go from object to Klass (with compressed class pointers) this way: we pluck the nKlass from the word adjacent to the mark word (MW). We then calculate Klass* from nKlass by, typically, just adding the encoding base as an immediate. We may or may not omit that add, and we may or may not shift the nKlass, but the most typical case (CDS enabled) is just the addition.
Today's decoding does not need a single memory access; it can happen entirely in registers.
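To make the arithmetic concrete, the decoding described above can be sketched as follows. This is a minimal illustration, not the HotSpot implementation: `KlassEncoding`, `decode_nklass`, and `encode_klass` are hypothetical names, and addresses are modeled as plain 64-bit integers so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the JVM's encoding parameters (illustrative names).
struct KlassEncoding {
    std::uint64_t base;   // encoding base; zero if the add is omitted
    unsigned      shift;  // zero in the typical (CDS enabled) case
};

// Decode a 32-bit nKlass into a full Klass address: an optional shift
// followed by an optional add. No memory is touched; on real hardware
// this is pure register arithmetic.
inline std::uint64_t decode_nklass(const KlassEncoding& enc, std::uint32_t nklass) {
    assert(enc.shift < 32);
    return enc.base + (static_cast<std::uint64_t>(nklass) << enc.shift);
}

// Inverse operation, assuming klass_addr lies within the encodable range
// and is aligned to (1 << shift).
inline std::uint32_t encode_klass(const KlassEncoding& enc, std::uint64_t klass_addr) {
    assert(enc.shift < 32);
    assert(klass_addr >= enc.base);
    return static_cast<std::uint32_t>((klass_addr - enc.base) >> enc.shift);
}
```

With a non-zero base and a zero shift, `decode_nklass` reduces to a single addition, which matches the typical case described above.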
In Lilliput, the nKlass lives in the MW (which allows us to load nKlass with the same load that loads the MW). Therefore, nKlass needs to shrink. The problem with the classic 32-bit nKlass is that its value range is not used effectively. Klas