@rylev
Last active February 2, 2023 10:08
Rust in Large Organizations Notes

Rust in Large Organizations

Initially taken by Niko Matsakis and lightly edited by Ryan Levick

Agenda

  • Introductions
  • Cargo inside large build systems
  • FFI
  • Foundations and financial support

Attending

  • Joe, Microsoft, Seattle Rust Meetup
  • Tom at Mozilla, using Rust for sync
  • Lena at Mozilla, sync storage etc
  • Jack Moffit at FB, Libra team
  • Brian Anderson at PingCAP
  • acrichto
  • erickt
  • dtolnay, David Tolnay
  • Raj Vengalil, Azure IoT
  • cuviper, Red Hat
  • Rain, FB
  • Jeremy F, FB
  • Manish
  • Ben, Google
  • Philip, Qumulo, Rust dev tools + infra
  • Remi, Qumulo
  • Sebastian, MS, pushing for Rust adoption from sec pov
  • Thomas Ekerd, MS, site reliability engineer
  • James, MS
  • Brandon Williams, FB
  • JR, Mozilla backend services
  • Phil
  • Will, crash ingestion mozilla
  • Stjepan, Ferrous Systems

cargo

  • FB dev env -- backend services repo -- is mostly C++ and Java. Very polyglot environment. Glued together with Buck, FB's Bazel.

    • Buck: Language agnostic. Supports Rust.
    • rustc drops in quite nicely, basically equivalent to C++ compiler.
    • wanted to use cargo but it just does too much to fit in
    • need to delineate the parts of cargo that are desired from those that conflict with Buck
    • ecosys is big advantage for Rust but hard to separate from cargo
    • current scheme:
      • big cargo.toml including all the things used in internal repo
      • cargo builds artifacts that are presented to buck
      • buck can link against those
      • reasonably successful
      • but approaching 700 crates in transitive dep graph, getting very cumbersome to rebuild etc
      • plus pinned to a specific version of compiler (prebuilt artifacts)
      • works ok but build.rs build scripts are a big complication
    • specific cargo pain points:
      • build scripts
      • "features" feature
      • a lot of crates don't use features the way they're intended -- they're used for exclusive A or B choices
        • this creates the possibility to break the build
        • need some sort of "cfg" feature that represents forks of a crate
  • Google does a similar thing for Fuchsia

    • cargo builds 3rd party artifacts, normal build consumes those
    • problems:
      • handful of 3rd party artifacts depend on things built in tree
      • want to be able to do partial builds, e.g. w/o a feature, or just for some targets
      • developing for a new OS, so we compile some code for host, some for target
      • presently do 2 full builds, but it's a pain
      • don't have as much control over the flags getting passed to rustc as we'd like
      • dep flags + linker flags aren't as specific as we need to distribute deps that are needed for indiv targets
      • prototype using "cargo raze" (use "gn" (from Chrome) to generate ninja files)
      • based on a modification of cargo raze, which generates bazel build files
        • has its own handling of build.rs stuff
        • rather than outputting build files, it outputs a json format that could be the basis for the proposed "cargo build plans" feature
          • would be good to know what inputs etc are needed, how this would fit for Buck
          • can Buck consume internal files?
    • gn is aware of the concept of a Rust target
  • Qumulo build system:

    • doesn't use cargo, invokes rustc directly
    • cargo just builds json
    • build all deps as shared libraries, whether or not they want that
      • .so libraries, .rmeta files
      • hits a lot of problems
    • ran into problems, notably lack of support for build.rs -- have to reimpl cargo
    • building for 2 different targets
      • have own platform
      • linux target for procedural macros
    • need sometimes to pass flags that are target specific, build a target config map
    • would prefer to use cargo
  • does cargo raze support build.rs?

    • has some builtin support for build.rs?
    • not automatic: you declare purpose of build.rs
    • things that do rustc version detection?
    • sometimes you want to (e.g.) disable build.rs that supply native deps which come from bazel
  • why can't you run build.rs as part of the build tool?

    • fundamental problems:
      • no declared inputs, no declared outputs
      • buck/bazel etc has to know what files the build script is consuming, producing etc
    • also, they are arbitrary execution, which can be a security concern
      • proc macros have some similar concerns.
        • e.g., pest which looks at cargo source dir env variable and finds your grammar def'n file
          • doesn't fit well
  • one thing that was discussed years ago:

    • capability system for build.rs that restrict what scripts can do
    • e.g., read from this directory, write to that one
    • cargo can then audit/sandbox to enforce said rules
      • run build script in a sandbox
        • e.g. crosvm has an impl of this inside of chrome; all crosvm devices run in their own jail
      • nontrivial engineering effort
    • could do at a higher level, sandbox
  • jeremy: build scripts classified into 3 or 4 distinct types, is this complete?

    • doing codegen. read a file, bindgen, etc
    • gateway to some other library, using pkgconfig or something to find the library, or they build it from source
    • feature detection on rustc
    • "scary ones" -- database reads, timestamps
  • plausibly could address those use cases in other ways

    • feature detection is an obvious one, e.g. we had an rfc for compiler versions
  • version compat is a common thing

  • what version of rust are people using?

    • stable
    • "stableish" -- bootstrap
    • nightly
  • who here is using toolchains distributed by rust?

    • ms (partially), mozilla, libra
  • why a custom toolchain?

    • config.toml tweaks
      • use clang's version of some unwinding code
      • custom linker
      • panic=abort
    • custom targets
    • compliance reasons (wanting to build from source for security reasons)
  • bootstrapping + compliance

    • where to get initial rust version?
    • several attempts:
      • most successful is using mrustc at version 1.22 and building from there
      • ms, google did that
    • is there a possibility of long term drift?
      • builds are not quite reproducible at present, but almost
    • was a point where build w/ mrustc + build with toolchain had non-matching hashes
      • might have to tweak the paths
      • in principle it can be done, should maybe prioritize it
  • maybe have an approved "how to bootstrap from C" documentation

  • specific reason fb builds from source:

    • want to always have the option to apply a local patch
    • don't want to get stuck with a "we must have this patch yesterday" scenario and have to figure out how to apply patch then
  • in most cases, also building llvm, want to share llvm for cross-lang LTO

    • must have a newer LLVM than what rust ships with
  • some folks have cross-lang LTO working

    • but rustc doesn't want to produce bitcode files
    • pass the linker /bin/echo
  • pgo -- coming soon

  • fb uses after the fact binary rewriting

  • splitting out linker was a potential change to rustc or cargo that google wants

  • would be interesting to know "here is what must be passed to gcc to successfully link"

  • another option: give a python script as the linker

    • turns out servo does it, too
  • show of hands survey:

    • "who is interested in a common backend for 'those things'"
      • nobody knows what that means
  • buck needs a "fully specified dep dag", seems like a common thing for other build systems

    • seems like we have to do a few cases to work out the general rules first
  • rudimentary cargo build plan support:

    • gives a dag of rustc executions
    • but it's too low level for buck, also bazel
  • pressure: every once in a while people propose "rewriting cargo.toml" into the tree

    • so far resisted that
    • a possible outcome buck has thought of:
      • buck support for cargo.toml
      • ton of code that's open source, and people (natch) don't want to build w/ buck out of tree
      • want ability to simultaneously maintain buck/cargo support
      • currently done by hand and horrible
      • internally even people want this for mac/win builds which buck doesn't support
      • google w/ gn does something similar, keeps cargo.toml in order to upstream it
        • in some cases can generate a cargo.toml file programmatically
        • also imp't for IDE support
  • IDE support

    • RLS kind of working with buck
    • knowing laughter :)
    • problematic assumptions: e.g., searching the filesystem for cargo.toml, but it's millions of files
    • symptom of a larger thing
      • cargo is designed for managing rust code
      • assumes source tree is mostly rust code
      • but often rust is embedded in a large source tree with tons of non-rust
        • so having some "root for all rust code" where you search below is problematic
      • top-level directory not gonna work
        • always having to create artificial "root" directories
    • rust-analyzer avoids this by not baking cargo in as deeply
      • but still has this "top level directory" model that contains all the rust code which means a small amount of rust amongst everything else
  • generating a cargo.toml for 1 project works well, but when you have multiple targets that interact

  • Qumulo has a ton of C and Rust code that must be all combined into one big final artifact

    • IDE support that avoids cargo is a must
    • current state of the art: ctags
  • cramertj: cargo.toml is basically the intermediate repr for specifying deps

    • are there other things one might want?
    • build system has its own custom language to do that description
      • can use that to generate cargo.toml files though for IDE etc
        • what changes might one want in a "non-cargo IDE language"?
          • maybe cargo would work fine
  • manish: does this also cause problems for clippy and rustfmt?

    • cargo.toml is also useful for this
  • who uses clippy? most folks

  • rustfmt? most folks

    • fb invokes it on individual files for that
  • libra uses cargo to build

    • "cacheability" (sccache) has gotten worse over time
    • procedural macros aren't getting cached (dylibs)
    • are other people doing anything with this?
    • ff has a distributed cache in the office
    • (buck does caching of everything)
      • native deps? also integrated into buck
      • assume that if a C dep changes, rust must be rebuilt?
      • -lnative is not very well-scoped (just to a directory, not specific libs)
      • problem: can't cache link steps as a result
      • maybe also part of the problem with sccache
      • in buck, each lib gets its own directory, sidestepping this problem
  • linker want:

    • ability to specify a specific mapping from link name to the native library
    • option to ignore link directories or transform
    • in buck case, if you have a dep on a native library, you get two options (-lfoo and full path to foo)
  • crate features, misuse thereof:

    • people seem to want option to have mutually exclusive features
    • want to have impls of Clone etc. for testing but not in a release build
    • hacked up something using cargo features but doesn't work all the time
    • problems:
      • dev dependency foo with feature "testing"
      • sometimes testing gets turned on semi-randomly (???)
      • but you can also accidentally use "testing" in a normal tree
    • deps for build scripts leak through to the real graph, perhaps part of the "semi-random" behavior
  • designing from the wrong direction, perhaps?

    • a lot of requirements coming up that are "above and beyond" existing cargo spec and design
    • contra: goal is to have cargo co-exist with buck/bazel/etc, these are the features needed for that?
  • do we want to build another tool that is not cargo?

    • but everybody already has a tool and wants to use it
    • but how can we do minimal work so that integration of cargo + these other tools is smoother
      • working with rest of rust ecosys
  • de facto standard that crates.io + cargo have created

    • defined entirely by impl of cargo
    • only access at present is through cargo's impl
    • refactoring cargo into indep chunks with better interfaces might be the sol'n (and has been discussed)
      • cargo build plans, but they're not there yet
    • key thing: version resolution, very much in cargo's domain, would be good to specify
  • external dependencies + FFI?

    • can we use FFI to talk to rust?
    • want module boundary between rust things, using ffi
    • today: build scripts in cargo exist, common thing is to build+link to native libraries
      • one of the things that cargo raze does, you can describe the purpose of a build.rs (e.g., primarily to produce that 3rd party lib)
      • but you can translate that to a dep for that native library in your build system
  • summarize + action items?

    • cramertj wants to know what
    • dtolnay is working on potential design ideas for a successor to build.rs
      • cargo metadata description to specify what it is doing, maybe replace build.rs?
      • just listing inputs would be a huge improvement
        • yes but we want something that's easier than build.rs today, to incentivize it
    • caching, can we improve it
      • some of it may be low-hanging fruit, e.g. on mac .a file has timestamps
      • but part of it is the growing popularity of procedural macros (.so are uncachable by sccache)
        • if linker were more predictable, sccache could handle it, but it's not
        • might be able to handle by separating out linking
  • how to translate cargo.toml etc?

    • buck today runs cargo, takes output with dep info + rlib files
    • but new tool goal is to determine from cargo metadata
      • no way of "definitively connecting" resolved deps with unresolved deps
  • cargo vendor tends to be a bit overaggressive

    • lots of things people want, seems to vary between groups
  • when developing procedural macros, could do better job of noticing token stream output hasn't changed..

    • incremental
    • sccache sometimes handles that well (e.g. w/ build.rs)
  • related topic: distributed builds

    • sccache has support for that
      • but maybe sends whole dep folder, not always ok
      • would need more precise dep information to handle that (passing precise info for transitive dependencies)
        • --extern is precise, but transitive deps are still figured out by rustc
    • related: would be nice if, for rustc, could pass all the sources explicitly
      • in buck do you list all sources?
        • yes but a lot of globs :)
  • would be nice to have a tool that handled all the easy cases, with room for "extra" cases here and there

  • alex: interested in solving a lot of these issues and have thoughts

    • open to talking later about this stuff
    • a lot of small details, bug fixes, etc -- long road, no silver bullet
  • some kind of "enterprise cargo" place to hold this discussion(s)

  • a lot of needs boil down to:

    • quick fix combined with longer re-architecture
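A recurring point in the cargo discussion is that build scripts declare neither their inputs nor their outputs. As a rough illustration only, here is a minimal code-generating build.rs that at least tells cargo which files it reads and writes only into OUT_DIR. The file name `schema.txt`, the `generate` helper, and the output name `generated.rs` are hypothetical. Note that `cargo:rerun-if-changed` only informs cargo; it does not sandbox the script, and it does not by itself give Buck or Bazel a declared output list.

```rust
use std::env;
use std::fs;
use std::path::PathBuf;

/// Hypothetical input file, relative to the crate root (an assumption for this sketch).
const INPUT: &str = "schema.txt";

/// Turn each non-empty line of the input into a numbered Rust constant.
fn generate(schema: &str) -> String {
    let mut out = String::new();
    let names = schema.lines().map(str::trim).filter(|l| !l.is_empty());
    for (i, name) in names.enumerate() {
        out.push_str(&format!("pub const {}: u32 = {};\n", name.to_uppercase(), i));
    }
    out
}

fn main() {
    // Declare the inputs: cargo reruns the script only when these paths change.
    println!("cargo:rerun-if-changed={}", INPUT);
    println!("cargo:rerun-if-changed=build.rs");

    // Cargo sets OUT_DIR for build scripts. The temp-dir fallback exists only
    // so that this sketch also runs when compiled and executed by hand.
    let out_dir = env::var_os("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(env::temp_dir);

    // A missing input is treated as empty here; a real script would likely fail loudly.
    let schema = fs::read_to_string(INPUT).unwrap_or_default();

    // Write only inside the output directory.
    let dest = out_dir.join("generated.rs");
    fs::write(&dest, generate(&schema)).expect("failed to write generated code");
}
```

A crate would typically pull the result in with `include!(concat!(env!("OUT_DIR"), "/generated.rs"));`. An external build system would still need the same input and output information stated in its own rule language.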

FFI

  • two distinct languages invoking one another
    • sometimes linked into one process, sometimes cross process (RPC)
    • COM requires symbols to be ABI compatible
  • inline assembly, direct syscalls
  • "C parity"
  • FFI with C and C++
  • FB is doing C++ interop, as is Google
  • FFI beyond C or C++?
    • Java
    • syscalls
    • C# perhaps
    • (Ruby, Python)
  • Bindings to other languages are often mediated through a C layer
  • Increasing number of users -- C and C++ wanting to consume Rust APIs
  • Concerns:
    • unwinding
  • Qumulo: basically spent most of the last year preparing to do bidir FFI between Rust and C
    • fairly large codebase in a dialect of C
    • rules you can impose on C side which helps sometimes
    • in one direction (Rust calling C) we have been able to use bindgen
    • but in the other direction (C calling Rust) we wrote a compiler plugin (uh oh) to generate C headers
  • Specification questions
    • concerned about cross-lang lto revealing a lot of interactions
  • Cross-lang thin lto
  • Dynamic testing and static testing
  • Have aliasing rules proven to be a problem?
    • FB: not so much. Mostly mediating rules through bindgen and trying to set things up to get compilation failures
    • Google: currently checking for changes
  • Google: looking a bit into ways to annotate C and C++ headers so that safe Rust signatures can be generated from them
    • might be an interesting thing to standardize on
    • bindgen has a cumbersome mechanism for that (do)
    • would be nice to include small shim layers e.g. to translate to Result
  • FB:
    • C++ codebase in FB uses exceptions, have wrappers that captures and converts exceptions, this becomes a Result on the Rust side
      • manually annotating noexcept functions? basically all of them can
      • C headers are manually created with a try { } catch block in C++
    • the code being interop'd is mostly C++ but have to manually write C APIs for it
    • build with panic=abort? no, unwind
      • also catching Rust exceptions at boundary?
        • C code doesn't call into Rust code that often
        • happy to make it abort though
          • but mozilla wants to handle panics, though it does it by translating it into a swift/java exception
            • usually the purpose is wanting to capture the call stack and report it
            • in theory could panic=abort if could capture java stack
    • FB sets a custom panic handler to report errors, then exits (could use panic=abort)
  • For COM FFI case? how handling virtual dispatch
    • manual adaptation with vtables and things
    • on Rust side, does that "look like" a trait?
      • active area of investigation
      • believe that (with proc macro support) can expose a trait that is actually a struct + vtable
      • similar to what GNOME projects are doing for glib bindings
      • mozilla does it for XPCOM, which is basically same thing
    • various bits of existing crates, but it's mostly nasty
  • Jeremy: one thing I've been thinking about:
    • standard set of library functions corresponding to C++ types
    • e.g. some way to use std-string from within rust code
    • good to have for templated types (unique-ptr, shared-ptr, and so on)
    • all types that can be directly used from Rust in some way
    • quite clunky today to have a C++ function that returns something Rust can use
    • on C side, it'd use the plain C++ types
    • but on Rust side, it'd invoke and do the right things
    • one of the pieces needed for C++ interop
      • instantiate the vec/string/other impls
    • should this be part of bindgen?
      • missing part: manually instantiating separate things for each specialization
  • major topics of FFI
    • being able to "use header files" and get a "reasonably safe" FFI in Rust
  • what are building blocks we'd need to move things to user space?
    • template instantiation list is one building block -- somebody has to write the tool, nothing needed from rustc
  • expectation is that there is always some work to manually bind
    • but what is minimal work we can do to make it easy to translate
  • annotations might be company specific -- fb vs google?
    • maybe? but can we collaborate?
    • different C++ dialects and patterns in use
  • what about from other languages, esp. around C++?
    • closest inspiration might come from Swift
  • rich bindings from Rust to C++ for hashmaps etc
    • because FB uses thrift for RPC mechanism (and sometimes FFI)
    • would be useful to be able to do tricks like that for hashmap and sets perhaps
    • some kind of tool for consuming a C++ header file to automatically produce an interface in Rust
  • complication in some environments: multiple allocators
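The notes mention catching C++ exceptions at the boundary and turning them into a Result on the Rust side, and ask about doing the same for Rust panics. The mirror-image case can be sketched roughly as follows: a Rust function exposed with a C ABI that catches a panic and reports it as a status code instead of unwinding into the caller. The function names and status codes are made up for illustration, and this is not a description of what any attendee actually ships.

```rust
use std::panic;

// Hypothetical status codes for this sketch.
const STATUS_OK: i32 = 0;
const STATUS_PANIC: i32 = -1;
const STATUS_NULL: i32 = -2;

/// Ordinary Rust code that may panic.
fn double_positive(input: i32) -> i32 {
    if input < 0 {
        panic!("negative input: {}", input);
    }
    input * 2
}

/// C-ABI entry point. A real export would also carry a `no_mangle`
/// attribute; it is omitted here to keep the sketch edition-independent.
pub extern "C" fn example_double(input: i32, out: *mut i32) -> i32 {
    if out.is_null() {
        return STATUS_NULL;
    }
    // catch_unwind stops the panic here so it never crosses into the caller.
    match panic::catch_unwind(|| double_positive(input)) {
        Ok(value) => {
            // SAFETY: `out` was checked for null above; the caller must pass
            // a pointer that is valid for writing an i32.
            unsafe { *out = value };
            STATUS_OK
        }
        Err(_) => STATUS_PANIC,
    }
}

fn main() {
    let mut out = 0;
    let status = example_double(21, &mut out);
    println!("status={} out={}", status, out);
}
```

This only works when the crate is built with unwinding enabled; with panic=abort the process exits before `catch_unwind` can return, which matches the "happy to make it abort" option discussed above.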

use of unsafe

  • ms: would like to know how to control use of unsafe in codebase
  • google: grep
  • servo used the compiler directives to disallow unsafe where possible
    • in some cases, allow unsafe within a specific file
    • integrate with review tool to draw attention
  • unsafe is really many things: sometimes simple, sometimes not
  • C++ code: all unsafe? not reviewed under the same standard?
  • more interesting question is unsafe in dependencies
  • auditing in crate graph in general is a problem
    • geometric growth of deps
  • how do you audit safe code?
  • would be great if there were some central place doing auditing (and getting paid to do it)
    • but we'd also need some mechanism to declare what's been audited etc
    • blessed crates and versions
    • let crates.io metadata include auditing
  • presumably want to know also things like 2fa, review policy, etc
  • attacks these days are very targeted in other ecosystems -- e.g., replacing specific versions of crates to attack specific targets
  • number of deps are in the hundreds, ranging from a few hundred to ~800 depending on project
    • in some cases, can pull in a frozen diff and not update
    • but not all
  • auditing of the compiler itself?
    • would prefer to have two implementations maybe
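The "compiler directives" approach mentioned for servo can be approximated with the built-in `unsafe_code` lint: deny it broadly, then allow it only in the places that are meant to be reviewed. The sketch below applies the lint to a module so that it stays self-contained; at crate level the same idea is usually written as `#![deny(unsafe_code)]` at the top of the crate root. The module and function names are hypothetical.

```rust
// Deny unsafe everywhere in this module tree by default.
#[deny(unsafe_code)]
mod app {
    /// Safe code path: would fail to compile if it contained `unsafe`.
    pub fn first_byte_checked(bytes: &[u8]) -> Option<u8> {
        bytes.first().copied()
    }

    // The one place where unsafe is explicitly allowed, which makes it easy
    // to find with grep or to flag in a review tool.
    #[allow(unsafe_code)]
    pub mod raw {
        pub fn first_byte(bytes: &[u8]) -> Option<u8> {
            if bytes.is_empty() {
                None
            } else {
                // SAFETY: the slice is non-empty, so the pointer to its first
                // element is valid for reads.
                Some(unsafe { *bytes.as_ptr() })
            }
        }
    }
}

fn main() {
    println!("{:?}", app::first_byte_checked(b"hi"));
    println!("{:?}", app::raw::first_byte(b"hi"));
}
```

Using `forbid` instead of `deny` would reject the inner `allow`, so `forbid` suits crates that should contain no unsafe at all. Neither lint says anything about unsafe code in dependencies, which is the harder problem raised above.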

"governance"

  • MS: do we know what's going into the compiler?
    • do we know what changes are going in?
  • FB: not been a big concern of ours
    • in some cases, had issues where things got stabilized or bug fixes that broke code
    • would like to be canarying the nightly compiler regularly
    • but having more impl's would increase confidence
  • ways to support?
    • contracting
    • full time hires
    • how can we give $$ to rust org?
      • need a foundation
    • money/resources for Rust CI
  • participating in crater?
    • working on a way to run crater and send back pass/fail
  • ecosystem support
    • filling gaps in ecosystem
    • supporting key crates
    • helping to file GSoC proposals?

will we do this again? how to continue these conversations?

  • don't need super frequent updates
  • most helpful thing is to identify topics and spin off topics
  • try to provide feedback for roadmap
  • organize a regular meeting on zulip to talk about issues
    • quarterly maybe
  • we might want to consider f2f meetings in other conferences or at least in europe
    • maybe rustfest
  • key point:
    • don't want to alienate and separate enterprise from the Rust community at large
    • focusing on working groups and zulip for communication is a win