It is by no means accurate and compares apples and oranges in many dimensions, e.g.
- Targeting LLVM IR, C, and assembly differ in difficulty.
- Some counts include the runtime while others do not.
- Different compilers are written in different languages, and languages and paradigms vary in expressiveness.
- LLVM has many non-default passes which are not used by regular compilation. I tried to use fine-grained directories, but some unneeded files are still included.
Nevertheless, here are the results (auxiliary files such as shell scripts, Makefiles, and documentation are ignored):
- gcc (2207000+):
tokei gcc libcpp -e ada -e d -e go -e objc -e objcp -e '*test*'
- openjdk (1194000+):
tokei src/hotspot src/java.compiler
- llvm (1786000+):
tokei *{include/llvm,lib}/{Analysis,AsmParser,BinaryFormat,Bitcode,Bitstream,CodeGen,IR,IRReader,Linker,LTO,MC,Object,Passes,Support,Target,Transforms}
- clang (1002000+):
tokei {include/clang/,lib}/{Analysis,AST,Basic,CodeGen,Driver,Frontend,Lex,Parse,Sema,Serialization}
- rustc (518000+):
tokei compiler -e '*test*'
- ghc (423000+):
tokei compiler
- dmd (357000+):
tokei src/dmd
- ruby (306000+):
tokei *.c *.h include internal
- cpython (263000+):
tokei Include Objects Parser Python
- go (225000+):
tokei src/cmd/{compile,internal} -e '*_test.go' -e 'src/cmd/compile/internal/ssa/rewrite*.go' -e 'src/cmd/compile/internal/ssa/opGen.go'
- zig (218000+):
tokei src
- ocaml (129000+ (asmcomp is surprisingly small: 23000+)):
tokei driver parsing typing middle_end asmcomp bytecomp
- julia (101000+):
tokei src
- Nim (85000+, plus part of the 20000+ LOC lib/system, which is coupled with the compiler):
tokei compiler
- crystal (79000+):
tokei spec/{compiler,llvm-ir}
- Idris2 (76000+):
tokei src
- koka (45000+):
tokei src
- lua (29000+):
tokei -e '*test*'
- Myrddin (25000+):
tokei -e bench -e examples -e test
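One way to drop the auxiliary file types from a count is to post-process tokei's JSON report instead of reading the table by eye. The following is a minimal sketch, assuming the report is a JSON object mapping each language name to an object with a `code` field (the shape recent tokei releases appear to emit with `--output json`). The `SKIPPED` set and the `code_lines` helper are hypothetical names introduced here, and the sample numbers are made up.

```python
import json

# Hypothetical skip list: tokei's labels for the auxiliary file types, plus the
# aggregate "Total" entry that some tokei versions add to the report.
SKIPPED = frozenset({"Total", "Shell", "BASH", "Makefile", "Markdown", "Plain Text"})


def code_lines(report, skipped=SKIPPED):
    """Sum the `code` field of every language entry not in `skipped`."""
    return sum(stats["code"] for lang, stats in report.items() if lang not in skipped)


# Made-up report for illustration, not a real measurement.
SAMPLE_REPORT = json.loads("""
{
  "C":        {"blanks": 10, "code": 100, "comments": 20},
  "C Header": {"blanks": 2,  "code": 30,  "comments": 5},
  "Makefile": {"blanks": 1,  "code": 8,   "comments": 0},
  "Total":    {"blanks": 13, "code": 138, "comments": 25}
}
""")

print(code_lines(SAMPLE_REPORT))  # 130
```

For a real run, the sample would be replaced by the parsed output of a tokei invocation, which requires a tokei build with JSON output enabled.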
The estimate for rustc may be really coarse. There is no easy way to skip #[test] code. Some libraries should be counted as well.
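A rough workaround for the test-code problem is to skip whole items annotated with #[cfg(test)] before counting. The sketch below is a heuristic under stated assumptions, not what tokei does: it assumes tests live in `#[cfg(test)]` items with the attribute on its own line, it matches braces naively (braces inside strings or comments will confuse it), and it only recognizes `//` line comments. The function name `count_non_test_lines` is hypothetical.

```python
import re

# Matches a line consisting only of the #[cfg(test)] attribute.
CFG_TEST = re.compile(r"^\s*#\[cfg\(test\)\]\s*$")


def count_non_test_lines(source):
    """Count non-blank, non-`//` lines, skipping items marked #[cfg(test)]."""
    lines = source.splitlines()
    count = 0
    i = 0
    while i < len(lines):
        if CFG_TEST.match(lines[i]):
            # Skip the attribute, then the item it annotates: either a
            # brace-delimited block or a one-line declaration ending in `;`.
            i += 1
            depth = 0
            opened = False
            while i < len(lines):
                text = lines[i]
                depth += text.count("{") - text.count("}")
                opened = opened or "{" in text
                i += 1
                if (opened and depth <= 0) or (not opened and text.rstrip().endswith(";")):
                    break
            continue
        stripped = lines[i].strip()
        if stripped and not stripped.startswith("//"):
            count += 1
        i += 1
    return count


SAMPLE = """\
// A comment.
fn add(a: i32, b: i32) -> i32 {
    a + b
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn adds() {
        assert_eq!(add(1, 2), 3);
    }
}
"""

print(count_non_test_lines(SAMPLE))  # 3
```

Standalone #[test] functions outside a #[cfg(test)] item are still counted, so this only narrows the error rather than removing it.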
cpython should probably include some part of Include.
My opinionated list:
- llvm, clang: I will keep hacking on them, as C++ is important.
- ocaml: Surprisingly small (considering the languages it is often compared against), but the syntax is slightly ugly.
- cpython: Sigh.
- zig: My estimate may be too coarse, but the complexity is concerning given the feature list.
- julia: Many nice features, but 1-based indexing and dynamic typing keep me away.
- Nim: So many features and perfect C interop in so few lines? Advanced features with C code generation have value.
- Idris2: Definitely cool but quantitative type theory is too scary.
- koka: Effect system and effect handlers.
- lua: If you still have time to learn one dynamically typed language, pick it.
My opinionated list of compilers worth learning with adequate complexity: Idris2, koka, lua, Nim.