Skip to content

Instantly share code, notes, and snippets.

@18alantom
Last active July 15, 2024 06:12
Show Gist options
  • Save 18alantom/d9f0565c0f42d6a71311d4a3093a1331 to your computer and use it in GitHub Desktop.
Save 18alantom/d9f0565c0f42d6a71311d4a3093a1331 to your computer and use it in GitHub Desktop.
A document on code quality and readability based off of the bad practices I've encounted in the last few weeks.

I often find myself getting frustrated with code—be it code written by me or by others. This, I find, is because that particular code disregards readability.

This frustration of mine often bubbles up in the form of colorfully disparaging commit messages. I decided that it would be more productive to instead type-down what I feel are some decent coding practices.

I've added examples where applicable and tried to explain the why behind the practices.

This document is prone to recency bias. The practices here come to mind because of code that I've encountered (in the last handful of weeks) that does the opposite.

Index:


Why prioritize readability

When your code is readable, the intent behind why you wrote what you did is easy to gauge by someone who is reading your code. It has to do with comprehension. The crux of readable code is to express your intent in the simplest way possible.

If your code is not readable it is incomprehensible.

This ends up being a problem because when someone wants to add to your code they have to make assumptions about what you intended.

These assumptions might not be right.

Due to this, your code ends up becoming a slow but steady generator of bugs.

Which is why readable code has the second order effect of being robust. It does not break in mysterious ways when people find the need to extend it.

For the listophiles, here's a list or reasons:

  • Readable code is easy to understand
  • Readable code makes people hate you less
  • Readable code doesn't make people question your programming literacy
  • Unreadable code remains like an ugly mole in your codebase that no one wants to touch.
  • Readable code is less prone to being refactored
  • Readable code makes people respect you more
  • Readable code is easier to extend
  • Unreadable code is tech-debt

Functions

Regular function names

Function names should be verb followed by a noun:

# Bad
def process():
    ...

# Good
def start_process():
    ...

Functions that return something should start with get_ followed by what they return:

# Bad
def apps() -> list[Apps]:
    ...

# Good
def get_apps() -> list[Apps]:
    ...

Similarly functions the update something should start with set_ followed by the name of what they update:

# Bad
def apps(apps: list[Apps]):
    ...

# Good
def set_apps(apps: list[Apps]):
    ...

Why?

The verb (eg: start) tells you what the function is going to do. The noun (eg: process) gives you context of the functions action.

Method names

Method names can just be the verb:

class Process:
    def start():
        ...

Same kind of naming can apply to namespaced functions if they are addressed by their namespace:

from utils import process

process.start()

Why?

Since the method is called by the object name (eg: process.start()), the context is already present.

Predicate function names

Predicate functions are those that return a boolean value. The name of such a function should be phrased as a yes-no question:

# Bad
def even(i: int) -> bool:
    ...

# Good
def is_even(i: int) -> bool:
    ...

Conversely, a function that is phrased as a yes-no question should return a boolean value (example of comically bad violation).

Why?

Better documentation; it tells us that the function is predicate function i.e. it tests the state of some input and returns a boolean value.

Function length

Your functions length should be constrained by the implications of its name. I.e. a function should do exactly what the function's name implies.

# Bad
def is_even(i: int) -> bool:
    if i % 2 == 0:
        update_db() # This is an unexpected side effect
        return True
    return False

def process_val(val: int):
    if is_even(val):
        ...
# Good
def is_even(i: int) -> bool:
    # No unexpected side effects
    return i % 2 == 0

def process_val(val: int):
    if is_even(val):
        update_db() 
        ...

Rule of thumb: your function is too long if:

  1. The name cannot exactly describe what it does.
  2. There are too many nested blocks.

Why?

By doing this the function's name doubles as accurate documentation . One need not dive into the function's body to figure out what it does.

Note: don't focus on the length of your functions. Premature shortening on functions leads to clever code. Clever code is an oxymoron as it's often code that is written in a flash of stupidity, or laziness, or at times even due to deep seated insecurity.

Code blocks

A code block is any section of code that can have it's own scope, examples:

def do_thing():
    # This is a block
    ...

def do_another_thing():
    # This too is a block
    if conditional:
        # This is a block inside another block, i.e. a nested block
        ...

General rule of thumb: avoid nesting more than a single level. This section mostly consists of practices on how to reduce nesting while maintaining readability.

Example of alarmingly deep nesting.

Refactor nested blocks as functions

If you find that your code has several levels of nesting, an inner code block can be refactored into a function:

# Bad
def validate_benches(self):
    for bench in self.benches:
        for app in bench.apps:
            if not is_app_valid(app):
                raise Exception("invalid app")
# Good
def validate_benches(self):
    for bench in self.benches:
        validate_apps(bench.apps)

def validate_apps(apps: list[App]):
    if is_app_valid(app):
        return
    raise Exception("invalid app")

Why?

Refactoring nested blocks into functions causes the inner block functionality to be documented by the function's name.

Use early exit

An early exit is when a function checks args and returns on receiving invalid ones:

# Bad
def add_app(team: Team, app_name: str, bench_name: str):
    if team.enabled:
        app = get_app(app_name)
    if app.team == team.name:
        bench = get_bench(bench_name)
        bench.add_app(app)

# Good
def add_app(team: Team, app_name: str, bench_name: str):
    if not team.enabled:
        return

    app = get_app(app_name)
    if app.team != team.name:
        return

    bench = get_bench(bench_name)
    bench.add_app(app)

This applies to any code block whose evaluation can be terminated early on receiving invalid values:

for value in some_list:
    if is_invalid(value):
        continue

    process_value(value)

Why?

It's easier to mentally parse and evaluate the state of a codeblock after the invalid cases have been filtered out by early returns. Here's a StackOverflow entry with several good points.

Note: the above example also illustrates why you shouldn't blindly optimize for function length. The second code block is longer but more readable.

Usage of one-liner blocks

This applies more to languages where code blocks are delimited by braces. Here's an example of what I mean by a one-liner block:

// One-liner
for (const value of list) evaluate(value);

// Regular
for (const value of list) {
  evaluate(value);
}

One-liners are alright if what's being done is simple, but for complex cases it's better to write out the entire block:

// Bad
for (const value of list) !isInvalid(value) && evaluate(value);

// Good
for (const value of list) {
  if (isInvalid(value)) continue;

  evaluate(value);
}

This also applies to higher order function calls:

// Bad
list.forEach((a) => !isInvalid(value) && evaluate(value));

// Good
for (const value of list) {
  if (isInvalid(value)) continue;

  evaluate(value);
}

Why?

Besides eschewing clever code, it's more comfortable to scan text vertically than horizontally.

Which is why constraining columns to 80 characters makes sense and webpages don't use the entire width of the screen to display text.

Avoid globals

Don't use global variables unless they are constants. An example of what not to do:

// Variables below are in the global scope.
let prev_assets_json;
let curr_assets_json;

async function write_assets_json(metafile) {
  // Function body that alters global variables.
}

Only constants, functions, and other static components of your code should be allowed in the global scope.

Even if you are writing a script, wrap everything in function calls and only have the main function call in the global scope:

# Bad script.py
while True:
    input_value = read_input()
    output_value = process(input_value)
    print(output_value)
# Good script.py
def main():
    while True:
        input_value = read_input()
        output_value = process(input_value)
        print(output_value)

main()

Use a class if you want to share state between function calls but don't want to pass it around as args.

# Bad script.py
input_value = ""
output_value = ""

while True:
    read_input()
    process_output()
    print(output_value)

def read_input():
    input_value = input(...)

def process_output():
    output_value = process(input_value)
# Good script.py
class App:
    input_value: str = ""
    output_value: str = ""

    def read_input(self):
        self.input_value = input(...)

    def process_output(self):
        self.output_value = process(self.input_value)

    def run(self):
        while True:
            self.read_input()
            self.process_output()
            print(self.output_value)

App().run()

Why?

In general global variables are hard to reason about. State of a global variable is not apparent at a glance. In the absence of types, even the type of a global variable is not apparent.

Here's a blog post that dives into why global variables are considered a bad pattern.

Miscellaneous

Handle all cases

I have seen a lot of code that looks something like this (eg):

def do_something(value):
    if value is not None:
        return do_something_with_value(value)

The issue here might not be obvious:

def do_something(value):
    if value is not None:
        return do_something_with_value(value)
    # What happens in this case?

But it's one that leads to undefined behavior. This leads non-robust code that breaks infrequently but in a way that is not easy to debug.

Hence, the crux of this is to handle all cases. When you do so you don't leave room for undefined behavior:

def do_something(value):
    if value is None:
        raise Exception("Cannot do something with nothing") # Don't be witty with exceptions
    return do_something_with_value(value)

And now you know that if an invalid value is passed the function will error out. Another way to reduce undefined behavior is to use types.

Use const when applicable

If you're using JavaScript/TypeScript (or any language with const) and your variable's value is not going to tbe changed then use const as an initializer.

This seems pretty obvious, but I'd asked one of our "top" developers why they use let everywhere instead of const even when the variable is not being re-declared, they told me that it's cause let is shorter 🤡.

Here's an eslint rule to enforce this.

Why?

Using const for variables meant to be constants prevents someone from re-declaring it because the execution will error-out.

It also documents that the variable is not meant to be changed.

Prefer async-await over callbacks

While async-await suffer from function coloring the tradeoff is worth it when compared to callbacks.

Which have (at least) these problems:

  • State mutation in the callback causes final state to not be apparent.
  • Aesthetically abhorrent.
// Bad
this.value = 'a';
doThingOne().then((value) => {
  this.value = value;
}); // "b"
doThingTwo().then((value) => {
  this.value = value;
}); // "c"
this.value === 'c'; // true or false?

// Good
this.value = 'a';
this.value = await doThingOne(); // "b"
this.value = await doThingTwo(); // "c"
this.value === 'c'; // true

Of course in the above example, the two calls aren't running asynchronously. If that is the requirement, could use something like Promise.all

this.value = 'a';
const values = await Promise.all([doThingOne(), doThingTwo()]);
this.value = values.at(-1);
this.value === 'c'; // true

By doing so you restrict the region in which asynchronous code runs. Awaiting the Promise.all call terminates the asynchrony. But when you use use a callback asynchronous execution is not bounded in a way that is apparent from the code itself.

Using esoteric features

Esoteric is almost a negative epithet for less used features of a language. Sometimes you need to use these features. More often than not, you don't. For instance consider the following code:

async function* fetchData(urls) {
  for (const url in urls) {
    return await fetch(url);
  }
}

for await (const response of await fetchData(urls)) {
  processResponse(response);
}

It's an async generator, using it allows you to lazy evaluate asynchronous tasks. This allows for pipelining operations to process one piece of data at a time. This could of-course be rewritten like so:

for (const url in urls) {
  const response = await fetch(url);
  processResponse(response);
}

And so for most cases it doesn't make sense to use it, because most developers aren't familiar with it.

But there might be a case wherein you are not in control of how the data is processed, i.e. that code isn't written by you. It might be written by someone else who is calling your code. In such a situation using an async generator may be apt.

If you do find your self in this situation, you may use such a feature, but since it it's unfamiliar territory for most, document it. Add a comment explaining what is going on.

Typing

Setting up and using types for a codebase (using a dynamically-typed language) is one of the biggest wins w.r.t code documentation and developer experience.

Note

This section refers to the usage of TypeScript and Python typing. If you do not wish to use TypeScript, types may be added to regular JavaScript by use of JSDoc.

Why use types

Types greatly reduce ambiguity w.r.t what state some code is in. Consider this simple example:

function handleStatusChange(status) {
  if (status === 'Draft') {
    return;
  }

  updateStatus(status);
}

The function takes a value of an unknown type, and so all invocations of it are valid:

// All valid invocations
handleStatusChange(null);
handleStatusChange(undefined);
handleStatusChange(NaN);
handleStatusChange('Not-a-status');

Due to this, even though you have guarded the function by checking for an unwanted value "Draft", rest of the invalid values from all valid invocations get through.

The variable now has an infinite number of possible states that you as a developer have to handle:

any

Sure you can write defensive code, such as this:

function handleStatusChange(status) {
  if (status === 'Draft') {
    return;
  }

  if (status === 'Failure' || status === 'Success') {
    updateStatus(status);
  }

  // throw error? return? take your pick
}

But how often will you resort to this, is every function you write going to be preambled with type guards and validations?

Instead you can define the possible status using types:

type Status = 'Draft' | 'Success' | 'Failure';

function handleStatusChange(status: Status) {
  if (status === 'Draft') {
    return;
  }

  updateStatus(status);
}

All the previously valid invocations get invalidated by the type checker:

invalidated

The number of possible states that your variable can be in reduces from infinity to 2:

what-type

While this might be the crux of "why types?", it just scratches the surface of the benefits of using them.

Coming back to the theme of readability, another point to consider is that types are almost like tightly coupled documentation.

From the previous example, one doesn't have to leave a comment mentioning the values that status can be it the type definition suffices.

Note: the above example stands for Python too (here unlike for JavaScript it isn't a separate language):

pytypes

Where to start

It can be daunting to add types to an old project, I've found the best way to go about it is to start incrementally, first with new code that doesn't use untyped inputs, then refactoring the simpler parts of the codebase to use types, then the more complex parts.

Code without untyped inputs

What I mean by this, since types are contagious, if you are using untyped code with typed code you end up with untyped code. Consider the following code for example:

typed-untyped

Here the function update_status has a typed signature, but the calls are made to untyped functions get_old_status and set_new_status and their return values are passed to the caller.

This might not seem like a problem, but if the definition of either of these functions change at some point in the future, for example:

def set_new_status(app_name, new_status):
    App(app_name).status = new_status
    return

Then the caller of update_status gets an incorrect type:

bad-typing

The function signature says bool, and so the caller expects a bool but what they get instead is a None.

Enable inlay hints

One way to be made aware of this is to enable inlay hints (ref). This gives you type information even if it is not explicitly mentioned:

inlay-hints

The above example is with all inlay hints enabled for PyLance, you can configure it to be less verbose too.

UI Code

UI Code, specifically the HTML parts tend to be the ugliest and least readable.

Note

The examples below consists of markup found in the template section of Vue.js single file components. That being said, the first section applies to regular regular HTML code too.

Semantic elements and structuring markup

A few things can be done to make this code more structured and readable:

  1. Use semantic elements:
  • Using <div/> for everything renders the code a homogeneous incomprehensible blob.
  • Using semantic elements document what a node (the element and it's children) is meant to be.
  1. Section components in a file:
  • Use blank lines and comments for this.
  • Makes it easy to parse and differentiate between different components.
  • Makes it easy to associate what's on the screen with what's in the code editor.

This is better explained with an example. Here's part of a Vue single file component, with a few sub-components within them:

<template #body-content>
  <div class="space-y-4">
    <div v-if="step === 'select-apps'">
      <div class="mb-4 text-lg font-medium">Select apps to update</div>
      <GenericList
        v-if="benchDocResource.doc.deploy_information.update_available"
        :options="updatableAppOptions"
      />
      <div v-else class="text-center text-base text-gray-600">
        No apps to update
      </div>
    </div>
    <div v-else-if="step === 'removed-apps'">
      <div class="mb-4 text-lg font-medium">These apps will be removed</div>
      <GenericList :options="removedAppOptions" />
    </div>
    <div v-else-if="step === 'select-sites'">
      <div class="mb-4 text-lg font-medium">Select sites to update</div>
      <GenericList
        v-if="benchDocResource.doc.deploy_information.sites.length"
        :options="siteOptions"
      />
      <div
        class="text-center text-base font-medium text-gray-600"
        v-else-if="!benchDocResource.doc.deploy_information.sites.length"
      >
        No active sites to update
      </div>
    </div>
    <ErrorMessage :message="$resources.deploy.error" />
  </div>
</template>

Unless you read the code, it's hard to distinguish the boundaries of the sub-components. After refactoring it, it quickly becomes apparent:

<template #body-content>
  <div class="space-y-4">
    <!-- Select apps step -->
    <div v-if="step === 'select-apps'">
      <h2 class="mb-4 text-lg font-medium">Select apps to update</h2>
      <GenericList
        v-if="benchDocResource.doc.deploy_information.update_available"
        :options="updatableAppOptions"
      />
      <p v-else class="text-center text-base text-gray-600">
        No apps to update
      </p>
    </div>

    <!-- Remove apps step -->
    <div v-else-if="step === 'removed-apps'">
      <h2 class="mb-4 text-lg font-medium">These apps will be removed</h2>
      <GenericList :options="removedAppOptions" />
    </div>

    <!-- Select site step -->
    <div v-else-if="step === 'select-sites'">
      <h2 class="mb-4 text-lg font-medium">Select sites to update</h2>
      <GenericList
        v-if="benchDocResource.doc.deploy_information.sites.length"
        :options="siteOptions"
      />
      <p
        class="text-center text-base font-medium text-gray-600"
        v-else-if="!benchDocResource.doc.deploy_information.sites.length"
      >
        No active sites to update
      </p>
    </div>

    <ErrorMessage :message="$resources.deploy.error" />
  </div>
</template>

In the above example, even though sub-components had titles they were hard to distinguish from the rest of the markup because:

  1. It used a div as everything else
  2. It is highlighted as everything else
  3. There wasn't any spacing between the sub-components. Adding comments to space out your sub-components makes it's differentiated syntax highlighting work in the favour of readability.

Note: the above example has been lightly abbreviated, here's actual example: before, after. Here refactor wasn't the only change, additional code was added too.

Usage of inline logic

Vue (and other frontend frameworks) allows one to embed logic inside markup. This is powerful, but also very easy to get carried away with (rem). Here's an example src of what not to do:

<button
  v-if="
    step !== 'select-apps' &&
    !(
      step === 'removed-apps' &&
      !benchDocResource.doc.deploy_information.apps.some(
        app => app.update_available === true
      )
    )
  "
  label="Back"
  @click="
    benchDocResource.doc.deploy_information.removed_apps.length &&
    step === 'select-sites'
      ? (step = 'removed-apps')
      : (step = 'select-apps');
    selectedApps = new Set();
  "
/>

In the above example, rendering logic (v-if) and onclick handler logic (@click) have both been embedded in the markup. I don't need to tell how abhorrent this is, you can just look at it.

Here the rendering logic should be a computed value and the onclick handler should be a method (src):

<button v-if="canShowBack" label="Back" @click="back" />

Rule of thumb: if your inline logic can't fit in a single line, it shouldn't be in the markup.

@barredterra
Copy link

barredterra commented Jul 8, 2024

Hi, thanks for the great points! I found a few things while reading through this that could use some tweaking. The examples in the "Function length" and "Refactor nested blocks as functions" sections don't seem to be quite right. For the typing example, it might be helpful to clarify that this is referring to TypeScript, not regular JavaScript. Also, it'd be good to note that the HTML is specific to "Vue," not regular HTML. Maybe the HTML examples could use vanilla HTML instead, while for the Vue-specific HTML examples you could point out that these are only relevant for Vue users.

@18alantom
Copy link
Author

Thanks Raffael! I've updated those sections and added a couple of notes addressing your points.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment