Skip to content

Instantly share code, notes, and snippets.

@MatMoore
Last active June 10, 2021 18:58
Show Gist options
  • Save MatMoore/19248a69051e09dc3aa5100ca263f5e1 to your computer and use it in GitHub Desktop.
Save MatMoore/19248a69051e09dc3aa5100ca263f5e1 to your computer and use it in GitHub Desktop.
Code review

Questions

This is a big bit of script, what's the best way to break it up? perhaps have lots of different scripts that are thematically linked and import them?

I guess I would start by making the parts of the script less dependent on each other.

For example, it's usually a bad idea to use global variables, as when you have global variables they can be written to from anywhere in your code, so it will quickly become hard to understand and even if you seperate the code things in one part of the code can have far reaching effects. This includes any variables you read from within a function, but declare in the main body of the script. On the other hand anything that is only assigned to within a function will be local to that function so you know it can't be modified by anything else.

Some ways to deal with global variables:

Turn things into function parameters

e.g. instead of

def selectObject():
    
    global objectName

have

def selectObject(objectName):

Use return values to return data

For example, createObject returns data in two ways

  1. via the explicit return value
  2. by assigning to the global variable

This is redundant so you can just get rid of the global variable

Create objects that encapsulate related data

If you have a bunch of related functions that operate on similar data, you can create a class, e.g.

class Person:
   def __init__(self, name, date_of_birth):
      self._name = name
      self._date_of_birth = date_of_birth
   
   def greeting(self):
      return f'oh hi {self._name'

  def age(self):
    ....

This hides data and code from the code that uses it so its easier to reason about things. But be careful of just sticking everything in a class. Every class should have a single obvious responsibility and represent some concept within your application.

Isolate side effects

See https://en.wikipedia.org/wiki/Pure_function

Normally when you call a function you expect it to take some input and return some output. Functions like that are easy to reason about and you can move function calls around without breaking anything.

An example of a side effect would be reading input or reading/writing to the database. Code like that is non-deterministic and it makes a difference when you call the function, so it's going to be harder to reason about. If you're doing stuff like that from deep within a helper function then you also have to dig deeper into the code if you want to find where it happens (problematic for larger codebases)

It's good to seperate out things like IO into a seperate module or class. For example this is a thing I worked on last year where we had a seperate class responsible for reading/writing records to a db: https://github.com/andreagrandi/covid-api/blob/master/covidapi/services/crud.py

For database stuff SQLAlchemy is a good library that makes this easier

Modules and packages

If you are happy with how your code is structured in terms of classes and functions, then you can start splitting things into files.

You could start with just having modules for any things that feel quite self contained.

Sometimes people organise things into layers (where lower layers know less about their context and higher layers know less about the details) or based around how the domain is modelled.

https://docs.python-guide.org/writing/structure/ has some tips.

Am I supposed to be using init or something anywhere to kick of the script?

There are different ways to do it.

If you write if __name__ == '__main__:' in your code then everything nested in that if will only execute if you run it directly with python yourscript.py

Otherwise, if you import it as a module from another script, all your code will execute automatically, which is usually not what you want.

There are other ways to kick things off though. If you are using a framework to structure an application then there may be different ways of kicking it off. Since you are using a lot of inputs, I would recommend looking into a command line library like click (see https://medium.com/geekculture/build-interactive-cli-tools-in-python-47303c50d75)

If you follow the examples it should also help you seperate the user interface from the rest of the app.

Some of my logic is broken around selecting objects. Any advice or suggestions welcomed.

How are you testing it at the moment?

You might find it easier if you can set up some kind of test harness where you can quickly test small functions, e.g. with pytest

For testing database stuff you should have some way of getting back to a clean state after each test run. This can be done by creating the db from scratch each time, or running the queries inside a transaction and rolling back at the end of the test. With sqlite I guess you could also use an in memory db for testing or make a copy of the database file from one in a known state.

You should also be wary of catching all exceptions like this, as it can mask other errors:

    try:
        print(data[0])
        
    except:
        print('no object found')

You should instead have something like except KeyError or some specific kind of error you want to catch. These are all the built in exceptions.

If you stick breakpoint() calls in your code then you can use the debugger to see what values are set to at any point in the program. Or you can add print statements to log out what it's doing.

General comments

Old files

When using git it's redundant to version things using filenames, since every version is stored for you. You can just delete the old ones and you will be able to get back to old code if you need it. Editing files rather than creating new ones will makes it easier to go back and see you what did. If you commit every time you make a small change then you will be able to see the diff of each change like this, or the diff between any two versions of the code like this. There's an example here of how to do inspect changes on the command line as well https://git-scm.com/book/en/v2/Git-Basics-Viewing-the-Commit-History

Code style

It's worth looking at some kind of tooling that looks for easily detectable errors and enforces coding style. Some examples for python are https://pypi.org/project/pyflakes/ or https://pypi.org/project/black/

You can get plugins for IDEs/editors so these will just run automatically. Some benefits:

  • they can catch typos and small errors before you run the code
  • they will warn you about coding things that are error prone or not idiomatic
  • they will force you to use consistent naming conventions, whitespace etc

If you are the only one working on the code it doesn't really matter what style you follow, but if you want other developers to work on later then being and consistent will help them not hate you.

For example if you mix camel case and underscores for naming variables, then it's easier to make typos in the variable name.

The standard style for python is pep8: https://pep8.org/ If you configure your editor right then it will basically show you where you are diverging from conventions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment