Skip to content

Instantly share code, notes, and snippets.

@lmmx
Created February 9, 2025 11:44
Show Gist options
  • Save lmmx/f028cd187171b8a59bf1ad53c0f44398 to your computer and use it in GitHub Desktop.
Save lmmx/f028cd187171b8a59bf1ad53c0f44398 to your computer and use it in GitHub Desktop.
Advice to avoid overzealous comment deletion/variable renaming in scientific software

Below is a detailed analysis of why these changes can be problematic and how we can modify our heuristics (the “rules” we follow when editing code) to avoid deleting or renaming code elements that carry important meaning:


1. Why the changes can be problematic

a. Removing helpful comments loses context

  • Original comments:

    # [px0,px1,px1,px0] for first bit of p00,p10,p11,p01
    # [py0,py0,py1,py1] for second bit of p00,p10,p11,p01

    These lines explain how the arrays [px0, px1, px1, px0] and [py0, py0, py1, py1] correspond to corner coordinates in the order (p00, p10, p11, p01). That’s a subtle detail which isn’t obvious just by looking at np.pad(...) or np.repeat(...).

  • Why it matters:
    Anyone reading the code in the future would have to re-derive or guess how these arrays map to corners. By deleting these comments, we sacrifice clarity. Comments like these are especially crucial when the code is “weird” or tricky—there’s a reason the developer wrote them.

b. Renaming variables to “shorter” versions isn’t always better

  • Original:

    px_coords, py_coords = np.dot(pts, np.transpose([x_dir, y_dir])).T
    xcoords.append(px_coords - px0)
    ycoords.append(py_coords.mean() - py0)
  • Changed:

    px_c, py_c = np.dot(pts, np.transpose([x_dir, y_dir])).T
    xcoords.append(px_c - px0)
    ycoords.append(py_c.mean() - py0)

    While shorter names like px_c and py_c might seem harmless, they’re also less explicit than px_coords and py_coords. The original names emphasize these are coordinate arrays, not just scalars.

  • Why it matters:
    Variable names are part of code’s “self-documentation.” Renaming them to be shorter or more uniform sometimes makes sense (e.g., if the variable is used once, or is very obviously local), but in this case, “px_coords” is quite descriptive: it says “this is an array of x-coordinates.” px_c loses that clarity.

c. These edits give no performance benefit

  • Deleting these comments or renaming variables doesn’t optimize code speed or memory usage. The cost is purely in clarity. So from a cost/benefit standpoint, it’s a net negative unless there’s an explicit style mandate or repeated feedback that the code is too verbose.

d. Overzealous cleanup can obscure “weird” or domain-specific code

  • In domain-specific transformations (like assembling corner coordinates in a certain order), unusual array manipulations often require extra commentary.
  • If a linter or code review suggests removing comments or renaming variables, it might be applying a generic approach (“less code is simpler” or “rename everything consistently!”) when in fact these lines are documenting specialized math or geometry logic. You end up losing domain insight.

2. How to modify your editing heuristics

  1. Distinguish docstring style from inline code clarity

    • Tools like pydocstyle (via Ruff) focus on docstrings (the “what/why” of modules, classes, and functions).
    • Inline comments that explain tricky array operations or transformations don’t typically conflict with pydocstyle. So removing them “to fix docstring errors” is misguided.
  2. When encountering an inline comment, ask “does this clarify a non-obvious step?”

    • If yes, keep it (or even improve it). Don’t remove or shorten it.
    • If the comment is obviously outdated or duplicative of the docstring, maybe it can be removed—but that’s rare for specialized math or geometry code.
  3. Retain domain-specific variable names

    • If you have px_coords, py_coords, x_dir, y_dir in geometry code, that’s probably meaningful.
    • Heuristic: “Does the rename actually reduce confusion?” If not, keep it as-is.
  4. Keep or improve “weird” indexing comments

    • Comments explaining [px0, px1, px1, px0] or [py0, py0, py1, py1] are crucial for future maintainers.
    • If a comment is too terse or cryptic, you can rewrite it for clarity, but never remove the logic it’s describing.
  5. Don’t assume shorter = better

    • If the code or comment references an obscure formula or transformation, more detail is often helpful.
    • If a variable name like px_coords is slightly longer, but self-explanatory, that is usually a good trade-off.
  6. Balance minimal changes with new docstring requirements

    • The linter wants docstrings on functions/classes (D100, D103, etc.). That’s separate from how you comment your code.
    • Satisfy docstring rules without removing valuable inline commentary or rewriting variable names unnecessarily.

Summary

  • Why you find these changes inappropriate: They delete or rename essential commentary and variable names that convey critical context, which is not required by any docstring or style rule. That results in a net loss of clarity and no real benefit.
  • How to fix: Modify your heuristics so you focus on adding/fixing docstrings (to appease pydocstyle), but do not remove inline comments that explain specialized logic or rename meaningful variables to shorter forms unless there is a compelling reason (e.g., they truly cause confusion).

In short, keep your docstrings and your inline code commentary both: docstrings give high-level explanation of a function’s purpose, while inline comments/variable names illustrate the local steps or domain-specific transformations. That synergy is what makes the code maintainable and comprehensible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment