gnzlbg/bfloat16.md

## bfloat16.md

      
Add f16b floating-point type for native support of bfloat16

I was asked to properly write my “concerns” here, so these are my concerns in the most constructive way I found to phrase them.

concern: llvm-bfloat16

This text claims that LLVM will add a native type for bfloat16. The claim is not supported in the text (e.g. no links to mailing list threads of people working on or discussing this). The text should either support the claim or justify the native type in a universe where that never happens.

concern: unspecified-abi-issues

LLVM's native binary16 type, f16, is exposed by clang as __fp16. This is not a standard C type, ABIs (e.g. SysV) do not cover it, and clang rejects it with a compilation error when it is used as a return type or an argument type.
I don't see any information or arguments suggesting that the ABI story for bfloat16 might be different. If such information exists, the text should include it, along with prior art on the ABI of similar types.

concern: incomplete-prior-art

The text does not mention how these problems are already solved in Rust (e.g. no mention of the half crate which provides a binary16 type or the TensorFlow crates which provide a bfloat16 type), in other languages like C++ or Julia (which provide bfloat16 as a user-defined type), and it does not link to previous relevant discussions (about C++, Julia, Rust, or LLVM).
The text's prior-art section should be comprehensive.

concern: unsupported-claim-library-solution-impossible

The text claims that a library solution is impossible and proposes a solution based on the assumption that the claim is true.
Prior art in Rust (e.g. tensorflow-rs) and other languages (C++ Tensorflow, Julia) shows that it is indeed possible to implement bfloat16 in a library just fine.
If LLVM ever adds native intrinsics for bfloat16, we can always expose Rust intrinsics, e.g., in a stable core::bfloat16 module, that libraries can use, e.g.,  core::bfloat16::add(i16, i16) -> i16 or core::bfloat16::as_f32(i16) -> f32, etc. in a backward-compatible way.
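To make the shape of such an API concrete, here is a minimal software sketch, assuming the functions operate on raw `i16` bit patterns as suggested above. The local `bfloat16` module is only a stand-in for the hypothetical `core::bfloat16`; `from_f32` and the round-to-nearest-even behavior are my assumptions, not something the RFC or LLVM specifies.

```rust
/// Local stand-in for the hypothetical `core::bfloat16` module (an assumption,
/// no such module exists in `core` today).
pub mod bfloat16 {
    /// Widen a bfloat16 bit pattern to f32. bfloat16 is the upper 16 bits of
    /// an IEEE 754 binary32 value, so this conversion is exact.
    pub fn as_f32(bits: i16) -> f32 {
        f32::from_bits((bits as u16 as u32) << 16)
    }

    /// Narrow an f32 to a bfloat16 bit pattern, rounding to nearest, ties to
    /// even (assumed rounding mode for this sketch).
    pub fn from_f32(x: f32) -> i16 {
        let bits = x.to_bits();
        if x.is_nan() {
            // Keep the sign and force a quiet-NaN mantissa bit so that the
            // result cannot collapse into an infinity.
            return ((bits >> 16) as u16 | 0x0040) as i16;
        }
        // Adding 0x7FFF plus the lowest kept bit implements ties-to-even.
        let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
        ((bits + rounding_bias) >> 16) as u16 as i16
    }

    /// Software fallback for addition: widen, add in f32, narrow again.
    /// A native intrinsic could replace this body without changing the signature.
    pub fn add(a: i16, b: i16) -> i16 {
        from_f32(as_f32(a) + as_f32(b))
    }
}
```

For example, `bfloat16::add(0x3F80, 0x3F80)` adds the bit patterns for 1.0 and 1.0, and `bfloat16::as_f32` of the result is 2.0.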
If LLVM also adds a native type, some of these Rust intrinsics might need to, internally, and transparently to users, bitcast from i16 to bfloat16 and back. In my experience LLVM is good enough at removing useless bitcasts. This change would be backward-compatible and transparent to users.
LLVM might also add target-specific intrinsics for this type (e.g. AVX-512 extensions). Those would be exposed via core::arch anyway, and a library could use them directly.
Prior-art suggests that the ABI of such a type might just be unspecified. In Rust, this means that we can do whatever we want, as long as we make the type an improper C type - for struct bfloat16(i16); this is automatically the case. If we ever want to give the type a more defined ABI, we can always add a flag #[repr(bfloat16)] struct bfloat16(i16); in a backward-compatible way.
It has also been claimed that such a library implementation cannot be efficient, but that claim is not supported. AFAICT, the claim is incorrect. A library and the native type can be made to generate identical LLVM IR (e.g. by making the library call core::bfloat16 intrinsics). From the POV of optimizations, precision, and performance, I don't see why there should be a difference.
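A minimal sketch of the library side, assuming the `struct bfloat16(i16);` layout mentioned above: the operators only forward to helper functions. The helpers `bf16_as_f32`, `bf16_from_f32`, and `bf16_add` are hypothetical software stand-ins for the intrinsics discussed here; if real intrinsics existed, only the helper bodies would change.

```rust
use std::ops::Add;

/// Library-defined bfloat16, stored as its raw bit pattern.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug)]
pub struct bfloat16(i16);

// Hypothetical stand-ins for `core::bfloat16` intrinsics (software fallbacks).
#[inline]
fn bf16_as_f32(bits: i16) -> f32 {
    // bfloat16 is the upper half of an f32, so widening is a shift.
    f32::from_bits((bits as u16 as u32) << 16)
}

#[inline]
fn bf16_from_f32(x: f32) -> i16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Preserve NaN-ness by setting a quiet-NaN mantissa bit.
        return ((bits >> 16) as u16 | 0x0040) as i16;
    }
    // Round to nearest, ties to even (assumed rounding mode).
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    ((bits + rounding_bias) >> 16) as u16 as i16
}

#[inline]
fn bf16_add(a: i16, b: i16) -> i16 {
    bf16_from_f32(bf16_as_f32(a) + bf16_as_f32(b))
}

impl bfloat16 {
    pub fn from_bits(bits: i16) -> Self { bfloat16(bits) }
    pub fn to_bits(self) -> i16 { self.0 }
    pub fn from_f32(x: f32) -> Self { bfloat16(bf16_from_f32(x)) }
    pub fn to_f32(self) -> f32 { bf16_as_f32(self.0) }
}

impl Add for bfloat16 {
    type Output = bfloat16;

    // The operator is a thin wrapper, so after inlining the generated code is
    // whatever the helper (or a future intrinsic) lowers to.
    #[inline]
    fn add(self, rhs: bfloat16) -> bfloat16 {
        bfloat16(bf16_add(self.0, rhs.0))
    }
}
```

With this in place, `bfloat16::from_f32(1.0) + bfloat16::from_f32(0.5)` reads like arithmetic on a primitive type while remaining an ordinary user-defined type.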

concern: lack-of-comparison-with-library-solution

Even if a library solution has no codegen drawbacks over a language-level solution, there are many trade-offs at play that might make one approach better than the other.
For example, a library solution can be more flexible (e.g. support different behaviors via cargo features or different libraries), can iterate more quickly, and has a smaller impact on the implementation than a language-level solution. A native type solution would properly support literals, casts, ... and would have a "marketing advantage", e.g., Rust has first-class support for bfloat16.
The text does not explore these trade-offs, and it should do so critically. For example, the better-marketing claim can easily be misunderstood as Rust not being expressive enough to implement the type as a user-defined type (like Julia and C++ do). If the library ends up being worse than the language-level solution, which would be fine, the text should make the extra effort of identifying whether other language features could make the library solution good enough (e.g. user-defined literals and user-defined casts), since pursuing those instead might have a larger impact on the ecosystem than adding new primitive types every time we hit those roadblocks.
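As a point of comparison for the literal and cast trade-off, here is a rough sketch of what a library can approximate in today's Rust: a `macro_rules!` macro in place of a literal suffix and a `From` impl in place of an `as` cast. The `bf16!` macro and the type's method names are hypothetical and chosen only for illustration.

```rust
/// Library-defined bfloat16 stored as its raw bit pattern (hypothetical type).
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug)]
pub struct bfloat16(i16);

impl bfloat16 {
    /// Narrowing conversion, rounding to nearest with ties to even
    /// (assumed rounding mode for this sketch).
    pub fn from_f32(x: f32) -> Self {
        let bits = x.to_bits();
        if x.is_nan() {
            // Preserve NaN-ness by setting a quiet-NaN mantissa bit.
            return bfloat16(((bits >> 16) as u16 | 0x0040) as i16);
        }
        let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
        bfloat16(((bits + rounding_bias) >> 16) as u16 as i16)
    }
}

// Widening to f32 is lossless, so `From` is a natural replacement for a cast.
impl From<bfloat16> for f32 {
    fn from(x: bfloat16) -> f32 {
        f32::from_bits((x.0 as u16 as u32) << 16)
    }
}

/// Stand-in for a literal suffix: `bf16!(1.5)` instead of something like `1.5f16b`.
macro_rules! bf16 {
    ($lit:literal) => {
        bfloat16::from_f32($lit as f32)
    };
}
```

Here `f32::from(bf16!(1.5))` round-trips to 1.5. The gap this leaves is visible in the sketch: the macro is noisier than a real literal, and `as` cannot be overloaded, which is exactly the kind of limitation that user-defined literals and casts would address.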

concern: insufficiently-baked

All in all, this RFC feels insufficiently baked and would benefit from iteration on internals or in an RFC issue to collect different use cases, alternatives, prior art, etc. If the tensorflow-rs implementation cannot be refactored into a standalone library, it might make sense to implement this as a library first to gain experience, and wait to see what LLVM does. If the library is not good enough due to other limitations in the language, one can attempt to address those.
I'm surprised that it was submitted in this form into the process.
Some comments have argued that the intent was to discuss the idea. IIUC the RFC process is not the right venue for that, but that might be changing.
If the intent of the text was only to discuss the idea, the text should have instead focused on discussing the problem that needs solving (e.g. how Rust programs that already use bfloat16 are solving this problem), included a comprehensive review of prior art, enumerated the alternatives discussed here, and tried to analyze their trade-offs. Such a text might lend itself better to exploring the design space than just proposing a concrete solution without much rationale, prior art, or alternatives.