Skip to content

Instantly share code, notes, and snippets.

@brson
Created December 11, 2019 05:56
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save brson/678afa13ba4572fafee5384f92e3140e to your computer and use it in GitHub Desktop.
Save brson/678afa13ba4572fafee5384f92e3140e to your computer and use it in GitHub Desktop.

Misc

The documentation doesn't consistently use "must" / "should", etc. language. Often it just says "use".

It would be good to have guidance about naming when there are multiple types that implement the same concept at different abstraction layers. TiKV has some pretty tough readability problems due to the common use of import renaming because of name clashes. Personally, in the engine implementations, I've established the pattern of using a naming prefix, like "RocksFoo", even though it might not be a standard practice, because I think the end result is going to be much easier to understand - see a "RocksFoo" and know it is part of the Rocks engine without having to scan the module imports. (on the other hand, with the end result being to have all these Rocks types hidden behind a layer of traits, in practice most readers are going to end up dealing with the traits only...).

For crates, I might include language about managing internal mutual dependencies between modules. Spaghetti internal dependency graphs may be the biggest obstacles to refactoring, especiall to breaking up crates. FWIW I really want a module annotation that enforces internal daggishness, and even wish dag-structured modules were the default.

A big community project could be started to bring TiKV up to these standards.

Discussion of breaking API changes is currently irrelevant for most tikv code.

The TiKV codebase is filled with curious fully-qualified Arc::clone calls, instead of .clone. I imagine, but don't know, that at some point somebody accidentally expensive-cloned a non-Arc that happened to type-check, and so this became a defensive pattern. If it's a good pattern it should probably be in the style guide. I don't love it though.

Some of the things in this guide can probably be enforced automatically, so should.

I might add a guideline about not using _ in unsafe pointer casts. In the common unsafe reference casting pattern, as suggested by clippy:

&*(foo as *const A as *const B)

(I'd love to have a ptr/mem function for this pattern since clippy says not to use transmute for it, but the open coded version is pretty unsightly. Haveing a name for this pattern would be good).

It's super tempting to use _ to have the compile infer the types, but its less type safe and mysterious to read.

General

Every FIXME should have a reference to a GitHub issue describing the issue and an explanation, e.g.,

While I agree with the end result, I find this guideline frustrating because it tends to require filing an issue about a bug that doesn't yet exist until the FIXME code is committed.

I don't know if the FIXME/TODO advice is generally accepted outside of rust.

Formatting

Use a single nested import... for each crate

I find this style hard to read and ugly once they contain multiple brace pairs. Rationale?

Prefer to use absolute paths

I agree with this, but could use rationale.

Exception, every test module should have a use super::*; import.

This is enticing but I also find it problematic since it also ends up bringing in the parents' private imports (and private items), which leads to surprising, though not particularly onerous, coupling during refactoring.

Example:

use crate::{
    pd::client::{Cluster, RetryClient},
    util::GLOBAL_TIMER_HANDLE,
    Result,
};
use futures::{
    compat::Compat01As03,
    prelude::*,
    ready,
    task::{Context, Poll},
};
use std::{
    pin::Pin,
    sync::Arc,
    time::{Duration, Instant},
};

Yeah I don't think I like this style though I would be curious to see it applied to some of the huge import sets inside tikv or other real codebases. It requires maintaining more mental context than necessary to read. I prefer only importing items from a single module per import. These are simple examples but the nesting can get much deeper. I don't see this style in the rust tree.

Naming

Crate names: kebab-case in Cargo and snake_case in Rust code

This is such a dumb pain-point in Rust. I think it's my fault for asking to be able to use kebab-case. Today I think it is simpler and consistent to use snake case for crate names. At least for libraries. For binaries kebab case makes more sense, since binaries usually use kebab-case. So maybe kebab-case for all crates is the simpler rule. It's such a frustrating experience to type in the wrong casing to some search tool and not find it. It would be nice for Rust 2021 to fix this papercut somehow.

When a name is forbidden because it is a reserved word (e.g., crate), use raw identifiers (e.g., r#crate).

I disagree with this. It's super ugly, and I prefer a postfix underscore, which I also think is more idiomatic in the code I read. Rationale?

Treat acronyms as words, e.g., GrpcType or grpc_variable, not GRPCType or g_r_p_c_variable/GRPC_variable

Good, though I think this is an uphill battle. It's been part of the API guidelines forever but people still use the GRPCType style, and I think more commonly. And sometimes I find it more readable, particularly with short acronyms.

If the constructor takes no arguments, prefer to implement Default instead.

I'm sympathetic but find this results in bad ergonomics since Default isn't in the prelude. Further, I don't think it's that common to parameterize functions over Default. So implementing Default is a premature abstraction in most cases, and nothing but a less convenient convention than an inherent new method.

This rule might be more reasonable for libraries, in which case I would still implement an empty new method.

There is also the consideration of whether an empty 'new' makes sense in the presense of multiple constructors, or whether 'new' is the "base", "biggest" constructor.

There's sometimes a distinction between types that should be empty-constructed and those that are expected to be provided with arguments.

which customise values which take defaults in new

this hints that "new" should be the baseline constructor that takes all the arguments but doesn't explicitly say it.

I think this ctor naming issue is pretty nuanced and am not convinced of a firm set of rules that apply to all situations.

E.g., foo for the default variation (whether that is owned or by-ref), foo_ref for a by-reference version, foo_move for a version which gives ownership, and foo_mut for a by-mutable-reference version

I don't recall ever seeing foo_move. take_foo seems more idiomatic, though the appeal of following the foo_* suffix pattern is obvious.

Comments

Wrap long lines of comments at 80 or 100 characters.

Seems like TiKV should pick one and enforce it with a tool.

https://doc.rust-lang.org/rustdoc/unstable-features.html#linking-to-items-by-type

TiKV seems to already use rust paths for internal links, per https://doc.rust-lang.org/rustdoc/unstable-features.html#linking-to-items-by-type, which seems like a better practice than encoding HTML urls directly in the docs.

Modules

Put attributes (e.g., #[cfg(...)]) on a module declaration (mod foo; or mod foo { ... }); do not use inner attributes (e.g., #![cfg(...)]).

This seems to say not to put "#!" attributes on files at all. I disagree with this since it separates the file-module's implementation from its attributes. The only attributes I would put on a mod foo; statement are cfg and path, things that influence the loading or existence of the module at a particular location.

And of course inner attributes have to be used on crates.

use re-exports (pub use) to expose API from the submodules.

I think this is overused in TiKV. Tracking down where types actually live is so boring. I'm so conflicted about this pattern. I think maybe it's good for some published crate APIs, but mostly obfuscating for internal-only crates.

Rationale: this worked badly for the Rust standard library, and some other libraries.

I am not convinced by this argument in the general case or in the specific case of std.

Where a module is used for a single impl and many names in that impl would need to be imported, import only the module and use qualified names. A common example is implementing the std::fmt::Debug or Display traits, import std::fmt and use fmt::Formatter, fmt::Result, etc.

I'm skeptical of this as a general rule. It's common for fmt but I think that's especially because it's commonly required to deal with fmts name-clashing Result type.

To disambiguate names, prefer to use qualification, rather than renaming. E.g., use fmt::Result rather than use std::fmt::Result as FmtResult.

This might help a lot in TiKV.

Never import enum variants, qualify use with the enum name (except for very, very common variants such as Some and Err).

I think there are other exceptions to this, particularly when implementing Display, Error, other times when I need to enumerate every variant, or when writing the variants a lot in e.g. a table, I will *-import the variants into function scope to avoid redundant noise.

Prefer to import a prelude module if available (using a * import).

I think preludes are an anti-pattern and never use them when provided. I want to know exactly what's in scope. I'm even a bit skeptical of the std prelude.

Prefer to import macros rather than using extern crate and/or macro_import.

I agree with this, but I note that sometimes I find myself not being able to use macros for reasons I don't understand. Particularly with things like derive and other decorators. The rules are not obvious to me.

Crates

Every crate should have some crate-level documentation in their top-level module and in a README.md.

README.md seems excessive. Do we really want that in every internal TiKV crate?

Use a Cargo workspace where there are multiple crates in one logical project.

If using a workspace, either have one top-level crate, or no to-level crates and one crate with the same name as the workspace.

Is the second recommending a "virtual workspace"? Personally I think TiKV should try moving to a virtual workspace - at the moment there is no clear "top" crate, but the main crate is the tikv crate, for historical reasons I think. A virtual workspace would declutter the top-level manifest and make it clear that the tivk crate is really just another crate in the middle of the dag.

If publishing a workspace crate, minimise the number of crates in the workspace.

The rationale is not obvious to me.

One-off actions should usually be OS scripts unless they need some Cargo functionality.

Not clear the definition of "one-off" action. It is though commonly necessary for big projects to create a thin Makefile wrapper around cargo, or to add shell scripts.

Export a minimal and coherent API. See the debuggability, dependability, flexibility, interoperability, and predictability in the Rust API guidelines.

This reads pretty vague and inactionable to me, especially the word "coherent", though maybe reading all the suggested links would give me a clear idea of what it means.

Prefer to avoid unstable features.

Might want language about best practices for feature-gating unstable features in published crates.

Procedural macro crates should be named 'foo-macros', where 'foo' is the main crate they support.

What about "foo-derive"?

Small crates often improve compile times by improving parallelism. However, this effect is often small and will be smaller in the future, so we shouldn't prioritise it over good design.

I cringe at the dichotomy between crate decomposition and good design. I tend to think that a well-factored crate dag is a good design and that monolithic crates with lots of circular internal dependencies are an indication of unclear or sloppy design considerations (also, circular dependencies are I think always going to be bad for compilation performance). I know there's push-back about small crates and facades in the Rust zeitgeist right now, but I'm not convinced.

Publishing workspace crates is frustrating.

This needs more explanation. I can't guess the meaning.

Security-sensitive dependencies should be properly audited.

More could be said about security-sensitive code, like prefer using crates which are already established as best-in-class, don't write your own, audit the crate feature flags for the most secure settings (e.g. asm vs rust impls).

Use the simplest version possible.

Needs more explanation.

It's not clear what this means, but I think it means, at least in part, to prefer not specifying the point number.

Ignoring point numbers in manifests appears to be widely considered a best practice, but one I strongly disagree with. I specify a point number of my dependencies because that is the one I've validated. I assume more recent point releases will still work correctly, but not that older point releases will - previous point releases may have contained bugs that I will never see.

If you must use a Git dependency, always include a rev

Thank you.

Sections should be separated by a single blank line.

This is quite prescriptive. I don't have an opinion here, but I know in Markdown my habit is to break up sections with 2 to 4 lines, depending on the document structure and complexity, to make scanning easier.

Limit lines to 100 characters.

100 chars for toml files but 80 OR 100 for rust files?

Rationale: they can cause complications with complex dependency graphs, and Cargo has some bugs with default features.

It can also cause downstream to pull in features they don't use, increasing compile times.

Data structures

Prefer small structs. Build several small structs into one larger one.

This seems pretty situation-specific since the motivation is to make borrowing internal fields easier. If I don't need any complex borrowing then it doesn't matter.

Empty structs should usually be defined as struct Foo;, not struct Foo {} or struct Foo();.

Is there an exception?

If a struct is repr(C) then the order of the fields is significant. Usually, the most efficient representation is to list the fields in descending order of size. If a different ordering is required or more performant, document it.

This is kind of odd without context about why you would put repr(C) on a struct in the first place, while also having a choice in the field layout.

Most newtypes and similar wrappers should be repr(transparent).

This is not obvious to me though I can imagine it being more efficient for wrappers over scalar types in particular. In that case though, using a more efficient ABI seems like something rustc should just do on its own.

Enums with many variants are fine.

This is a curious thing to state pre-emptively. I guess there's a common perception that variants are bad?

Prefer using structs to tuples in types (e.g., of fields, return types, etc.). I.e., most cases where tuples are good to use are where the type is anonymous (e.g., in closures).

2-tuple return types are fine as long as they have two different types, especially if the are usually immediately destructured, like with iterators. The comment about closures isn't clear to me. Closure types are anonymous, but not their arguments or return types. I don't see the connection between good-tuples and closures.

It is recommended to use the builder pattern in preference to multiple constructor functions.

I'm reluctant to recommend this, particularly for internal APIs like in TiKV. Builders involve a lot of boilerplate and are most effective when there's a large combination of possible options.

If there are a managable set of constructors that each have an obvious use case, then they are easier to write, read, and call.

Builders are nice for "fancy" public APIs where readability is especially desired.

They also have the obvious design tradeoffs in Rust between by-value and by-mut, where neither is completely satisfying.

(Also, I don't remember where it comes from, but there's a notion that "patterns" represent missing language features, and I think the builder pattern is the big obvious example of this in Rust - so much ceremony, and the end result still isn't great).

Prefer non-consuming builders (i.e., builder functions should take and return &mut self, not self).

I disagree with this because I often find that it forces me to create an initial local to root the reference, which hurts the "fluency" of the API. (Maybe this is something that is overcome in current rust - I don't remember the exact problem).

Also, this lacks the nuance of addressing whether the final "build" function should by by-mut or by-value, and managing invalidated builders.

Traits

Use the appropriate trait, don't use a trait beyond its intent.

This reminds me that tikv uses "{:?}" for weird production purposes, like splatting errors into other errors. Could be worth calling it out specifically.

Implement Debug wherever possible.

Yes for published libraries, no for unpublished code. Yagni.

manual destructors should never be used

I disagree, though Drop needs to still be implemented as a fallback. There are cases where you want the caller to explicitly destroy a type, particularly if the destruction is fallable or blocking, e.g. a server shutdown. Explicit and fallible destructors are on my Rust wishlist.

Drop implementations should not fail in other way

Not clear what this means. You've already said never panic. What other kind of failure is there? Some failures are impossible to avoid, like failed I/O flushes.

If a destructor might panic, fail, or block, provide a method which performs the shutdown behaviour and which returns a result (or, if necessary, panic), e.g., a close method. The user may then call the 'shutdown' method and check for errors. Then the destructor is guaranteed to be well-behaved. This should be documented.

Needs more guidance about how a destructor should behave here if the user doesn't cal the shutdown method. Panic, or shutdown anyway?

Fn, FnMut, and FnOnce should be implemented by function-like objects which can be called (e.g., callbacks).

I would just say never implement these traits. Is there an example where doing so is a good idea?

Functions methods traits

Prefer to use generics and trait bounds (or impl Tr) rather than concrete types.

This is a pretty strong blanket statement. There are a few common patterns where people tend to prefer generics, like with AsRef, but personally I'm happy with explicit types and less generic bloat. APIs filled with clever generics are hard to read.

I particularly dislike the pattern that uses generics to accept either Option or T...

If a function is only used by methods of a single data type, it should be a static method of that data type.

It's not obvious to me what this means, but I think it's saying to write private functions as static methods. I don't really agree with this, since I think it clutters up the type - helper functions that don't have a receiver, but are written as static methods require extra syntax to call and create an impression of greater coupling than actually existse. Modules are the primary abstraction boundary in Rust, so private standalone functions are more suitable for these cases. I standalone function written after an impl block suitably deemphasizes these kind of functions.

Generics are usually, but not always, better.

Citation would help.

Expressions

Avoid using let _ = to address an unused result warning. Prefer to use ? or unwrap.

What about expect? TiKV is filled with bad unwraps...

Prefer to use turbofish rather than type ascription

I was not even aware rust had type ascription yet! Seems like the feature is still in flux.

Avoid pattern-matching on all fields in a struct - this is a backwards compatibility hazard.

What's the hazard?

Errors

If a function can panic, that should be conveyed in the name of the function - usually by including unwrap or assert.

Never seen this guideline before. Maybe word more like, if a function is expected to panic.

Performance

Prefer to use push and push_str to build strings rather than the format macro.

This seems pretty untenable. format is too convenient and readable, and conveys intent clearly. I don't want to open code a string builder every time I need to compose a string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment