@gauravssnl
Forked from MaskRay/loc.md
Created October 25, 2021 15:48
Very coarse estimate of compiler complexity/LOC

It is by no means accurate and compares apples and oranges along many dimensions, e.g.

  • Targeting LLVM IR, C, or assembly differs in difficulty.
  • Some counts include the runtime while others do not.
  • Different compilers are written in different languages, and languages and paradigms differ in expressiveness.
  • LLVM has many non-default passes which are not used by regular compilation. I try to use fine-grained directories, but some unneeded files are still included.

Nevertheless, here is the result (auxiliary files like shell/Makefile/documentation are ignored):

  • gcc (2207000+): tokei gcc libcpp -e ada -e d -e go -e objc -e objcp -e '*test*'
  • openjdk (1194000+): tokei src/hotspot src/java.compiler
  • llvm (1786000+): tokei *{include/llvm,lib}/{Analysis,AsmParser,BinaryFormat,Bitcode,Bitstream,CodeGen,IR,IRReader,Linker,LTO,MC,Object,Passes,Support,Target,Transforms}
  • clang (1002000+): tokei {include/clang/,lib}/{Analysis,AST,Basic,CodeGen,Driver,Frontend,Lex,Parse,Sema,Serialization}
  • rustc (518000+): tokei compiler -e '*test*'
  • ghc (423000+): tokei compiler
  • dmd (357000+): tokei src/dmd
  • ruby (306000+): tokei *.c *.h include internal
  • cpython (263000+): tokei Include Objects Parser Python
  • go (225000+): tokei src/cmd/{compile,internal} -e '*_test.go' -e 'src/cmd/compile/internal/ssa/rewrite*.go' -e 'src/cmd/compile/internal/ssa/opGen.go'
  • zig (218000+): tokei src
  • ocaml (129000+ (asmcomp is surprisingly small: 23000+)): tokei driver parsing typing middle_end asmcomp bytecomp
  • julia (101000+): tokei src
  • Nim (85000+ (part of the 20000+ LOC lib/system is coupled with the compiler)): tokei compiler
  • crystal (79000+): tokei spec/{compiler,llvm-ir}
  • Idris2 (76000+): tokei src
  • koka (45000+): tokei src
  • lua (29000+): tokei -e '*test*'
  • Myrddin (25000+): tokei -e bench -e examples -e test
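
To make the directory selection and `-e` exclusions above concrete, here is a minimal Python sketch of the same idea. It is a much cruder stand-in for tokei, not a reimplementation: it only counts non-blank lines and does not separate code from comments or group by language. The function name `count_nonblank_lines` and its `exclude` parameter are hypothetical names chosen for illustration.

```python
import fnmatch
import os


def count_nonblank_lines(roots, exclude=()):
    """Sum non-blank lines of all files under `roots`, skipping excluded paths.

    Hypothetical helper: a rough approximation of `tokei DIR... -e PATTERN`,
    without tokei's per-language comment and blank-line handling.
    """
    total = 0
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Match patterns against path components relative to the root,
                # so the name of the root directory itself never excludes a file.
                parts = os.path.relpath(path, root).split(os.sep)
                if any(fnmatch.fnmatch(p, pat) for p in parts for pat in exclude):
                    continue
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        total += sum(1 for line in f if line.strip())
                except OSError:
                    continue  # unreadable file: skip it
    return total
```

Under these assumptions, `count_nonblank_lines(["compiler"], exclude=["*test*"])` roughly mirrors the rustc entry, although the number will be higher than tokei's because comment lines are counted too.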

The estimate for rustc may be really coarse: there is no easy way to skip #[test] code, and some libraries should be counted as well. The cpython count should probably include only some part of Include.

My opinionated list:

  • llvm, clang: I will keep hacking as C++ is important.
  • ocaml: Surprisingly small (considering the languages it is often compared against) but slightly ugly syntax.
  • cpython: Sigh.
  • zig: My estimate may be too coarse, but the complexity is concerning given the feature list.
  • julia: Many nice features, but 1-based indexing and dynamic typing keep me away.
  • Nim: So many features and perfect C interop in so few lines? Advanced features with C code generation have value.
  • Idris2: Definitely cool but quantitative type theory is too scary.
  • koka: effect system and effect handlers
  • lua: If you still have time to learn one dynamically typed language, pick it.

My opinionated list of compilers worth learning with adequate complexity: Idris2, koka, lua, Nim.
