brson/style-guide-review.md Secret

## style-guide-review.md

      
    Raw
  

              style-guide-review.md
            
          
    Misc

The documentation doesn't consistently use "must" / "should", etc. language.
Often it just says "use".
It would be good to have guidance about naming when there are multiple types
that implement the same concept at different abstraction layers. TiKV has some
pretty tough readability problems due to the common use of import renaming
because of name clashes. Personally, in the engine implementations, I've
established the pattern of using a naming prefix, like "RocksFoo", even though
it might not be a standard practice, because I think the end result is going to be
much easier to understand - see a "RocksFoo" and know it is part of the Rocks
engine without having to scan the module imports. (on the other hand, with the
end result being to have all these Rocks types hidden behind a layer of traits,
in practice most readers are going to end up dealing with the traits only...).
For crates, I might include language about managing internal mutual dependencies
between modules. Spaghetti internal dependency graphs may be the biggest
obstacles to refactoring, especiall to breaking up crates. FWIW I really want a
module annotation that enforces internal daggishness, and even wish
dag-structured modules were the default.
A big community project could be started to bring TiKV up to these standards.
Discussion of breaking API changes is currently irrelevant for most tikv code.
The TiKV codebase is filled with curious fully-qualified Arc::clone calls,
instead of .clone. I imagine, but don't know, that at some point somebody
accidentally expensive-cloned a non-Arc that happened to type-check, and so this
became a defensive pattern. If it's a good pattern it should probably be in the
style guide. I don't love it though.
Some of the things in this guide can probably be enforced automatically, so
should.
I might add a guideline about not using _ in unsafe pointer casts. In the
common unsafe reference casting pattern, as suggested by clippy:
&*(foo as *const A as *const B)

(I'd love to have a ptr/mem function for this pattern since clippy says not to
use transmute for it, but the open coded version is pretty unsightly. Haveing a
name for this pattern would be good).
It's super tempting to use _ to have the compile infer the types, but its
less type safe and mysterious to read.
General


Every FIXME should have a reference to a GitHub issue describing the issue and
an explanation, e.g.,

While I agree with the end result, I find this guideline frustrating because it
tends to require filing an issue about a bug that doesn't yet exist until the
FIXME code is committed.
I don't know if the FIXME/TODO advice is generally accepted outside of rust.
Formatting


Use a single nested import... for each crate

I find this style hard to read and ugly once they contain multiple brace pairs.
Rationale?

Prefer to use absolute paths

I agree with this, but could use rationale.

Exception, every test module should have a use super::*; import.

This is enticing but I also find it problematic since it also ends up bringing
in the parents' private imports (and private items), which leads to surprising,
though not particularly onerous, coupling during refactoring.
Example:
use crate::{
    pd::client::{Cluster, RetryClient},
    util::GLOBAL_TIMER_HANDLE,
    Result,
};
use futures::{
    compat::Compat01As03,
    prelude::*,
    ready,
    task::{Context, Poll},
};
use std::{
    pin::Pin,
    sync::Arc,
    time::{Duration, Instant},
};

Yeah I don't think I like this style though I would be curious to see it applied
to some of the huge import sets inside tikv or other real codebases. It requires
maintaining more mental context than necessary to read. I prefer only importing
items from a single module per import. These are simple examples but the nesting
can get much deeper. I don't see this style in the rust tree.
Naming


Crate names: kebab-case in Cargo and snake_case in Rust code

This is such a dumb pain-point in Rust. I think it's my fault for asking to be
able to use kebab-case. Today I think it is simpler and consistent to use snake
case for crate names. At least for libraries. For binaries kebab case makes more
sense, since binaries usually use kebab-case. So maybe kebab-case for all crates
is the simpler rule. It's such a frustrating experience to type in the wrong
casing to some search tool and not find it. It would be nice for Rust 2021 to
fix this papercut somehow.

When a name is forbidden because it is a reserved word (e.g., crate), use raw
identifiers (e.g., r#crate).

I disagree with this. It's super ugly, and I prefer a postfix underscore, which
I also think is more idiomatic in the code I read. Rationale?

Treat acronyms as words, e.g., GrpcType or grpc_variable, not GRPCType or
g_r_p_c_variable/GRPC_variable

Good, though I think this is an uphill battle. It's been part of the API
guidelines forever but people still use the GRPCType style, and I think more
commonly. And sometimes I find it more readable, particularly with short
acronyms.

If the constructor takes no arguments, prefer to implement Default instead.

I'm sympathetic but find this results in bad ergonomics since Default isn't in
the prelude. Further, I don't think it's that common to parameterize functions
over Default. So implementing Default is a premature abstraction in most cases,
and nothing but a less convenient convention than an inherent new method.
This rule might be more reasonable for libraries, in which case I would still
implement an empty new method.
There is also the consideration of whether an empty 'new' makes sense in the
presense of multiple constructors, or whether 'new' is the "base", "biggest"
constructor.
There's sometimes a distinction between types that should be empty-constructed
and those that are expected to be provided with arguments.

which customise values which take defaults in new

this hints that "new" should be the baseline constructor that takes all the
arguments but doesn't explicitly say it.
I think this ctor naming issue is pretty nuanced and am not convinced of a firm
set of rules that apply to all situations.

E.g., foo for the default variation (whether that is owned or by-ref), foo_ref
for a by-reference version, foo_move for a version which gives ownership, and
foo_mut for a by-mutable-reference version

I don't recall ever seeing foo_move. take_foo seems more idiomatic, though
the appeal of following the foo_* suffix pattern is obvious.
Comments


Wrap long lines of comments at 80 or 100 characters.

Seems like TiKV should pick one and enforce it with a tool.

https://doc.rust-lang.org/rustdoc/unstable-features.html#linking-to-items-by-type

TiKV seems to already use rust paths for internal links, per https://doc.rust-lang.org/rustdoc/unstable-features.html#linking-to-items-by-type,
which seems like a better practice than encoding HTML urls directly in the docs.
Modules


Put attributes (e.g., #[cfg(...)]) on a module declaration (mod foo; or
mod foo { ... }); do not use inner attributes (e.g., #![cfg(...)]).

This seems to say not to put "#!" attributes on files at all. I disagree with
this since it separates the file-module's implementation from its attributes.
The only attributes I would put on a mod foo; statement are cfg and path,
things that influence the loading or existence of the module at a particular
location.
And of course inner attributes have to be used on crates.

use re-exports (pub use) to expose API from the submodules.

I think this is overused in TiKV. Tracking down where types actually live is so
boring. I'm so conflicted about this pattern. I think maybe it's good for some
published crate APIs, but mostly obfuscating for internal-only crates.

Rationale: this worked badly for the Rust standard library, and some other
libraries.

I am not convinced by this argument in the general case or in the specific
case of std.

Where a module is used for a single impl and many names in that impl would
need to be imported, import only the module and use qualified names. A common
example is implementing the std::fmt::Debug or Display traits, import std::fmt
and use fmt::Formatter, fmt::Result, etc.

I'm skeptical of this as a general rule. It's common for fmt but I think
that's especially because it's commonly required to deal with fmts
name-clashing Result type.

To disambiguate names, prefer to use qualification, rather than renaming.
E.g., use fmt::Result rather than use std::fmt::Result as FmtResult.

This might help a lot in TiKV.

Never import enum variants, qualify use with the enum name (except for very,
very common variants such as Some and Err).

I think there are other exceptions to this, particularly when implementing
Display, Error, other times when I need to enumerate every variant, or when
writing the variants a lot in e.g. a table, I will *-import the variants into
function scope to avoid redundant noise.

Prefer to import a prelude module if available (using a * import).

I think preludes are an anti-pattern and never use them when provided. I want to
know exactly what's in scope. I'm even a bit skeptical of the std prelude.

Prefer to import macros rather than using extern crate and/or macro_import.

I agree with this, but I note that sometimes I find myself not being able to
use macros for reasons I don't understand. Particularly with things like
derive and other decorators. The rules are not obvious to me.
Crates


Every crate should have some crate-level documentation in their top-level
module and in a README.md.

README.md seems excessive. Do we really want that in every internal TiKV crate?

Use a Cargo workspace where there are multiple crates in one logical project.


If using a workspace, either have one top-level crate, or no to-level crates and one crate with the same name as the workspace.

Is the second recommending a "virtual workspace"? Personally I think TiKV should
try moving to a virtual workspace - at the moment there is no clear "top" crate,
but the main crate is the tikv crate, for historical reasons I think. A virtual
workspace would declutter the top-level manifest and make it clear that the tivk
crate is really just another crate in the middle of the dag.

If publishing a workspace crate, minimise the number of crates in the workspace.

The rationale is not obvious to me.

One-off actions should usually be OS scripts unless they need some Cargo functionality.

Not clear the definition of "one-off" action. It is though commonly necessary
for big projects to create a thin Makefile wrapper around cargo, or to add shell
scripts.

Export a minimal and coherent API. See the debuggability, dependability,
flexibility, interoperability, and predictability in the Rust API guidelines.

This reads pretty vague and inactionable to me, especially the word "coherent",
though maybe reading all the suggested links would give me a clear idea of what
it means.

Prefer to avoid unstable features.

Might want language about best practices for feature-gating unstable features in
published crates.

Procedural macro crates should be named 'foo-macros', where 'foo' is the main
crate they support.

What about "foo-derive"?

Small crates often improve compile times by improving parallelism. However,
this effect is often small and will be smaller in the future, so we shouldn't
prioritise it over good design.

I cringe at the dichotomy between crate decomposition and good design. I tend to
think that a well-factored crate dag is a good design and that monolithic crates
with lots of circular internal dependencies are an indication of unclear or
sloppy design considerations (also, circular dependencies are I think always
going to be bad for compilation performance). I know there's push-back about
small crates and facades in the Rust zeitgeist right now, but I'm not convinced.

Publishing workspace crates is frustrating.

This needs more explanation. I can't guess the meaning.

Security-sensitive dependencies should be properly audited.

More could be said about security-sensitive code, like prefer using crates which
are already established as best-in-class, don't write your own, audit the crate
feature flags for the most secure settings (e.g. asm vs rust impls).

Use the simplest version possible.

Needs more explanation.
It's not clear what this means, but I think it means, at least in part, to
prefer not specifying the point number.
Ignoring point numbers in manifests appears to be widely considered a best
practice, but one I strongly disagree with. I specify a point number of my
dependencies because that is the one I've validated. I assume more recent point
releases will still work correctly, but not that older point releases will -
previous point releases may have contained bugs that I will never see.

If you must use a Git dependency, always include a rev

Thank you.

Sections should be separated by a single blank line.

This is quite prescriptive. I don't have an opinion here, but I know in Markdown
my habit is to break up sections with 2 to 4 lines, depending on the document
structure and complexity, to make scanning easier.

Limit lines to 100 characters.

100 chars for toml files but 80 OR 100 for rust files?

Rationale: they can cause complications with complex dependency graphs, and
Cargo has some bugs with default features.

It can also cause downstream to pull in features they don't use, increasing
compile times.
Data structures


Prefer small structs. Build several small structs into one larger one.

This seems pretty situation-specific since the motivation is to make borrowing
internal fields easier. If I don't need any complex borrowing then it doesn't
matter.

Empty structs should usually be defined as struct Foo;, not struct Foo {} or struct Foo();.

Is there an exception?

If a struct is repr(C) then the order of the fields is significant. Usually,
the most efficient representation is to list the fields in descending order of
size. If a different ordering is required or more performant, document it.

This is kind of odd without context about why you would put repr(C) on a struct
in the first place, while also having a choice in the field layout.

Most newtypes and similar wrappers should be repr(transparent).

This is not obvious to me though I can imagine it being more efficient for
wrappers over scalar types in particular. In that case though, using a more
efficient ABI seems like something rustc should just do on its own.

Enums with many variants are fine.

This is a curious thing to state pre-emptively. I guess there's a common
perception that variants are bad?

Prefer using structs to tuples in types (e.g., of fields, return types, etc.).
I.e., most cases where tuples are good to use are where the type is anonymous
(e.g., in closures).

2-tuple return types are fine as long as they have two different types,
especially if the are usually immediately destructured, like with iterators. The
comment about closures isn't clear to me. Closure types are anonymous, but not
their arguments or return types. I don't see the connection between good-tuples
and closures.

It is recommended to use the builder pattern in preference to multiple
constructor functions.

I'm reluctant to recommend this, particularly for internal APIs like in TiKV.
Builders involve a lot of boilerplate and are most effective when there's a
large combination of possible options.
If there are a managable set of constructors that each have an obvious use case,
then they are easier to write, read, and call.
Builders are nice for "fancy" public APIs where readability is especially
desired.
They also have the obvious design tradeoffs in Rust between by-value and by-mut,
where neither is completely satisfying.
(Also, I don't remember where it comes from, but there's a notion that "patterns"
represent missing language features, and I think the builder pattern is the big
obvious example of this in Rust - so much ceremony, and the end result still
isn't great).

Prefer non-consuming builders (i.e., builder functions should take and return
&mut self, not self).

I disagree with this because I often find that it forces me to create an initial
local to root the reference, which hurts the "fluency" of the API. (Maybe this
is something that is overcome in current rust - I don't remember the exact problem).
Also, this lacks the nuance of addressing whether the final "build" function
should by by-mut or by-value, and managing invalidated builders.
Traits


Use the appropriate trait, don't use a trait beyond its intent.

This reminds me that tikv uses "{:?}" for weird production purposes, like
splatting errors into other errors. Could be worth calling it out specifically.

Implement Debug wherever possible.

Yes for published libraries, no for unpublished code. Yagni.

manual destructors should never be used

I disagree, though Drop needs to still be implemented as a fallback. There are
cases where you want the caller to explicitly destroy a type, particularly if
the destruction is fallable or blocking, e.g. a server shutdown. Explicit and
fallible destructors are on my Rust wishlist.

Drop implementations should not fail in other way

Not clear what this means. You've already said never panic. What other kind of
failure is there? Some failures are impossible to avoid, like failed I/O
flushes.

If a destructor might panic, fail, or block, provide a method which performs
the shutdown behaviour and which returns a result (or, if necessary, panic),
e.g., a close method. The user may then call the 'shutdown' method and check
for errors. Then the destructor is guaranteed to be well-behaved. This should
be documented.

Needs more guidance about how a destructor should behave here if the user
doesn't cal the shutdown method. Panic, or shutdown anyway?

Fn, FnMut, and FnOnce should be implemented by function-like objects which can
be called (e.g., callbacks).

I would just say never implement these traits. Is there an example where doing
so is a good idea?
Functions methods traits


Prefer to use generics and trait bounds (or impl Tr) rather than concrete types.

This is a pretty strong blanket statement. There are a few common patterns where
people tend to prefer generics, like with AsRef, but personally I'm happy with explicit
types and less generic bloat. APIs filled with clever generics are hard to read.
I particularly dislike the pattern that uses generics to accept either Option or T...

If a function is only used by methods of a single data type, it should be a static method of that data type.

It's not obvious to me what this means, but I think it's saying to write private
functions as static methods. I don't really agree with this, since I think it
clutters up the type - helper functions that don't have a receiver, but are
written as static methods require extra syntax to call and create an impression
of greater coupling than actually existse. Modules are the primary abstraction
boundary in Rust, so private standalone functions are more suitable for these
cases. I standalone function written after an impl block suitably deemphasizes
these kind of functions.

Generics are usually, but not always, better.

Citation would help.
Expressions


Avoid using let _ = to address an unused result warning. Prefer to use ? or unwrap.

What about expect? TiKV is filled with bad unwraps...

Prefer to use turbofish rather than type ascription

I was not even aware rust had type ascription yet! Seems like the feature is
still in flux.

Avoid pattern-matching on all fields in a struct - this is a backwards
compatibility hazard.

What's the hazard?
Errors


If a function can panic, that should be conveyed in the name of the function -
usually by including unwrap or assert.

Never seen this guideline before. Maybe word more like, if a function is
expected to panic.
Performance


Prefer to use push and push_str to build strings rather than the format macro.

This seems pretty untenable. format is too convenient and readable, and conveys
intent clearly. I don't want to open code a string builder every time I need to
compose a string.