Skip to content

Instantly share code, notes, and snippets.

@julien-h
Created December 9, 2019 10:02
Show Gist options
  • Save julien-h/6df8951d0c4da5c8720969f5d6c0658b to your computer and use it in GitHub Desktop.
Save julien-h/6df8951d0c4da5c8720969f5d6c0658b to your computer and use it in GitHub Desktop.
Preliminary report for my semester project about the refactoring of `deepfly`, a python application.
title category date tags summary
Refactoring a graphical application: DeepFly3D
technical blog
2019.12.03
python

Abstract

Deepfly is a python application used to conduct research at EPFL's Ramdya Lab. Coded over a short period of time, the application's python codebase accumulated technical debt that made it hard to implement new features. This report is a summary of my work on Deepfly during the fall semester 2019: (1) how I reduced the technical-debt by rewriting part of the code to improve its overall design and readability, a process commonly known as code refactoring, (2) how I implemented a new command-line based user-interface to interact with the application and (3) various minor improvements to the software's features.

Introduction

Deepfly is a python application created by Semih Gunel to perform 2D & 3D tethered drosophila pose estimation. From multiple videos of a drosophila, taken at different angles, the application first estimates the position of the drosophila's legs on each 2D-video before creating a 3D-model.

At its core, Deepfly uses deep-learning to perform the pose estimation. The application was built with a Graphical User Interface (GUI) that: (1) runs the pose estimation, (2) displays the results to the end user and (3) allows the user to correct manually the estimated positions.

Deepfly's code was released under the GPL-v3 licence and is available on Github [1]. Below is a screen capture of its GUI.

DeepFly's GUI

The application was coded over a short period of time with an emphasis on functionality but not code layout. As a result, there is a lot of duplicated code throughout the codebase, an no clear boundary between the GUI code and the core estimation logic.

Code duplication makes implementing new features or changing existing features hard because for each feature change, the code has to be edited in multiple places. This creates more work than required and is also error-prone, since oftentimes one of the places is forgotten, yielding a mismatch between parts of the code that should have been the same, and therefore a bug.

Since the pose estimation is done directly in the GUI code, implementing another interface would require either to duplicate the pose estimation logic, or to refactor the GUI to be able to share the pose estimation code between the two interfaces.

This report is a summary of my work on Deepfly. In the next sections, I describe how I simplified the internal layout of the code by flattening the file hierarchy and refactoring the GUI's code. Then, I describe new features that I implemented in the GUI and a new command-line based interface.

Referenced in this section:

[1] https://github.com/NeLy-EPFL/DeepFly3D

Table of content

[TOC]

Who do we write code for?

Most of my work on Deepfly pertained to simplifying its layout and code. As such, I have "alter[ed] the internal layout of software without making the changes in the external behavior" [1].

The guiding principles for these modifications was to make the code simpler for humans to grasp and understand, and as such they are all rooted in the way we understand, process and compress informations.

As the author of Python, Guido van Rossum said in his talk at PyCon 2016 [2]:

"Typically when you ask a programmer to explain to a lay person what a programming language is, they will say that it is how you tell a computer what to do. But if that was all, why would they be so passionate about programming languages when they talk among themselves?

In reality, programming languages are how programmers express and communicate ideas — and the audience for those ideas is other programmers, not computers."

— Guido van Rossum, creator of Python, talk at PyCon 2016

Code structure and layout makes (almost) no difference for the computer executing the code, but makes a huge difference for humans reading and attempting to edit it.

Failure to keep the code tidy and easy to understand for humans is often part of what is called technical debt in software development, explained as follows on Wikipedia: "As a change is started on a codebase, there is often the need to make other coordinated changes at the same time in other parts of the codebase or documentation. Required changes that are not completed are considered debt that must be paid at some point in the future." [3]

In those terms, my work on Deepfly was primarily to reduce its technical debt.

Referenced in this section:

[1] The definition of "refactoring" by Martin Fowler, in his book: Refactoring: Improving the Design of Existing Code, 2nd ed., Addison-Wesley Signature Series (Fowler), 2018.

[2] Guido Van Rossum, Python Language, talk at PyCon 2016. Available on Youtube.

[3] Technical Debt, article on Wikipedia, https://en.wikipedia.org/wiki/Technical_debt

A flatter file hierarchy

Deepfly's file hierarchy contains multiple sub-packages and sub-sub-packages. Because there is also no clear decoupling between the GUI code and the core pose estimation logic, this makes looking for a file tedious: are the optimization files in the package GUI.util or pose2d.utils? Contrary to my intuition, they are in GUI.util.

Nested hierarchies have several drawbacks, including the following non-exhaustive list:

  1. They introduce programmer overhead to (a) remember where a file is located, and (b) choose the proper sub-package to add new files. As is the case in the current version of Deepfly, when a file is placed in the wrong sub-package, this creates confusion and wastes reader's time.

  2. They make relative import statement tedious to write. One has to keep in mind the files hierarchy to be able to import a module. For instance, should I write from . import utils or from .. import utils? The problem is even worse if there's an utils module in both the current and parent directories.

  3. Nested hierarchy tend to be reorganized, which makes all import statement fragile, since each reorganization requires to change all import statements.

  4. Each sub-folder requires an __init__.py file. These files are often empty, but programmers can't be sure until they open and read them. This wastes programmers time.

  5. Users of the deepfly package have to know the internal layout of the package to be able to import its functions. For instance, rather than:

    from deepfly.pose3d.utils.procrustes import procrustes_seperate

    It is easier to write:

    from deepfly.procrustes import procrustes_seperate

    Or even:

    from deepfly import procrustes_seperate

A flat files hierarchy avoids these drawbacks: programmers know where to look for modules (there is only one or two folders), the import statements are all relative to the root package, and finding a file using modern search functions is even easier when there is no nested folders.

Traditional python style guides favor flatter hierarchies, as can be read in The Zen of Python [1]: "Flat is better than nested". And as python's core developer Raymond Hettinger said: "Namespaces are for avoiding name clashes, not creating taxonomies" [2].

Flat is better than nested. — The Zen of Python

Jack Diederich also gave a fascinating talk at PyCon US 2012, named Stop Writing Classes where he advocates fewer classes, fewer files and fewer packages to save programmers' time. His main argument is that we should write the least possible amount of code and put the least possible amount of structure in our software, which reminds me of the design principle KISS: "keep it simple, stupid", famous in the software engineering sphere.

I have thus decided to flatten Deepfly's file hierarchy by putting all modules in the root package. Deepfly's current hierarchy is described below:

deepfly
├── GUI
│   ├── BP.py
│   ├── CameraNetwork.py
│   ├── Camera.py
│   ├── Config.py
│   ├── DB.py
│   ├── __init__.py
│   ├── main.py
│   ├── skeleton
│   │   ├── __init__.py
│   │   ├── skeleton_fly.py
│   │   └── skeleton_h36m.py
│   ├── State.py
│   └── util
│       ├── ba_util.py
│       ├── cv_util.py
│       ├── im_util.py
│       ├── main_util.py
│       ├── optim_util.py
│       ├── os_util.py
│       ├── plot_util.py
│       └── signal_util.py
├── __init__.py
├── pose2d
│   ├── ... 
│   └── progress
│       ├── demo.gif
│       ├── LICENSE
│       ├── MANIFEST.in
│       ├── progress
│       │   ├── bar.py
│       │   ├── counter.py
│       │   ├── helpers.py
│       │   ├── __init__.py
│       │   └── spinner.py
│       ├── README.rst
│       ├── setup.py
│       └── test_progress.py
└── pose3d
    ├── __init__.py
    └── procrustes
        ├── __init__.py
        └── procrustes.py

11 directories, 49 files

The new file hierarchy is:

deepfly
├── ba_util.py
├── BP.py
├── CameraNetwork.py
├── Camera.py
├── cli.py
├── Config.py
├── core.py
├── cv_util.py
├── DB.py
├── gui.py
├── im_util.py
├── __init__.py
├── logger.py
├── optim_util.py
├── os_util.py
├── plot_util.py
├── pose2d
│   └── ...
├── procrustes.py
├── signal_util.py
├── skeleton_fly.py
├── skeleton_h36m.py
├── utils_ramdya_lab.py
└── video.py

4 directories, 37 files

Due to lack of time, I haven't flattened the pose2d submodule. I have also kept the original modules' names, but a better naming convention is required for the utility modules. For instance, cv_utils.py should be named using full words, such as computer_vision.py. This would allow more meaningful import statements such as from deepfly import computer_vision as cv rather than from deepfly import cv_utils as cv

With more time, I would make the file hierarchy even flatter, maybe along the lines of:

deepfly
├── cli.py
├── gui.py
├── core.py
├── utils.py
└── vendor
    ├── external module 1
    └── external module 2

Referenced in this section:

[1] Tim Peters, The Zen of Python, PEP 20, https://www.python.org/dev/peps/pep-0020/

[2] Raymond Hettinger in Transforming Code into Beautiful, Idiomatic Python, PyCon US 2013. Available on Youtube.

[3] Jake Diederich, Stop Writing Classes, PyCon US 2012. Available on Youtube.

Refactoring of the GUI

Before the refactoring, Deepfly's GUI was coded using the following classes:

  • the class DrosophAnnot was the main Qt window (it inherited from QWidget) ;
  • the class ImagePose was a sub-widget to display an image (since 6 images are displayed on the main window, DrosophAnnot had 6 instances of ImagePose) ;
  • the class DynamicPose was a subcomponent of ImagePose to store user input for manual corrections (one instance per ImagePose) ;
  • the class State stored common state variables between ImagePose and DrosophAnnot such as the current image's id, the input folder, the output folder, etc.
  • the enum Mode with alternatives IMAGE, POSE, CORRECTION and HEATMAP was used to denote each mode of the GUI. In IMAGE mode the GUI would display the raw images. In POSE mode, the pose estimation results. In CORRECTION mode, mouse input was handled to allow for manual corrections and in HEATMAP mode, a 2D probability map of each joints location was overlayed on the images ;
  • the class CameraNetwork represented a sequence of cameras used for the pose estimation. There were three instances of this class: the camNetLeft (and camNetRight) for cameras on the left (resp. right) of the drosophila and camNetAll for all cameras ;
  • the class PoseDB, a database to read/write manual corrections to/from disk.

In the following subsections I will review the parts of the code were the refactoring had the most impact.

1. DrosophAnnot and coupling

Looking at the main window's interface, we can easily identify methods related to the GUI's behavior (e.g. a button was clicked or a key was pressed) and methods related to the pose estimation, as shown in the listing below.

class DrosophAnnot(QWidget):  # the main GUI class
        
    # methods related to the GUI's behaviour
    def __init__(self):  # 64 lines
    def set_layout(self):  # 150 lines
    def set_mode(self, mode):  # 27 lines
    def checkbox_automatic_changed(self, state):  # 6 lines
    def checkbox_correction_clicked(self, state):  # 4 lines
    def keyPressEvent(self, event):  # 24 lines
    def combo_activated(self, text):  # 5 lines
    def set_first_image(self):  # 3 lines
    def set_last_image(self):  # 3 lines
    def set_prev_image(self):  # 16 lines
    def set_next_image(self):  # 16 lines
    def set_pose(self, img_id):  # 55 lines
    def set_heatmap_joint_id(self, joint_id):  # 2 lines
    def set_img_id_tb(self):  # 8 lines
    def set_joint_id_tb(self):  # 1 lines
    def update_frame(self):  # 6 lines
    
    # methods related to pose estimation that don't belong in the GUI
    def set_cameras(self):  # 44 lines
    def rename_images(self):  # 22 lines
    def pose2d_estimation(self):  # 26 lines
    def already_corrected(self, view, img_id):  # 14 lines
    def get_joint_reprojection_error(self, img_id, joint_id, camNet):  # 16 lines
    def next_error(self, img_id):  # 4 lines
    def next_error_cam(self, img_id, camNet):  # 10 lines
    def prev_error(self, img_id):  # 4 lines
    def prev_error_cam(self, curr_img_id, camNet):  # 10 lines
    def solve_bp(self, save_correction=False):  # 49 lines
    def calibrate_calc(self):  # 26 lines
    def save_calibration(self):  # 5 lines
    def save_pose(self):  # 80 lines

This is an example of coupling between the GUI's code and the core logic's code. Coupling them makes it impossible to reuse the core logic without the GUI. For instance, implementing a command-line interface for deepfly will require to run the pose2d_estimation and calibrate_calc methods.

One of the goal of the refactoring will be to extract these methods from the GUI and put them in a newly create module to hold the core pose-estimation logic.

2. ImagePose and class proliferation

The main widget, DrosophAnnot, delegates the display of each camera's image and manual corrections to the classes ImagePose and DynamicPose.

The ImagePose class is responsible for fetching the image from the camera, displaying it, handling user input related to manual corrections and saving the manual corrections. It stores the state of the manual corrections in an instance of DynamicPose.

These two classes are listed below.

class ImagePose(QWidget):
    def __init__(self, config, cam, f_solve_bp):  # 10 lines
    def clear_mc(self):                 # 1 lines
    def mouseMoveEvent(self, e):        # 34 lines
    def mouseReleaseEvent(self, e):     # 14 lines
    def update_image_pose(self):        # 64 lines
    def get_image_name(self, img_id):   # 3 lines
    def paintEvent(self, paint_event):  # 6 lines
    
    def get_joint_reprojection_error(self, img_id, joint_id, camNet):  # 16 lines
    def save_correction(self, thr=30):  # 41 lines
        
class DynamicPose:
    def __init__(self, points2d, img_id, joint_id, manual_correction=None):  # 6 lines
    def set_joint(self, joint_id, pt2d):  # 5 lines

Handling the mouse events at the level of ImagePose allows to use mouse coordinates relative to the widget's frame, and thus makes coordinates translation (screen to image) easier than handling mouse event at the level of the whole window in DrosophAnnot. I speculate that this was the primary reason for introducing the class ImagePose.

Because the ImagePose needs access to the same information about the core (current image id, communication with a camera object, output folder for writing manual corrections, etc.) as the main widget, several variables were shared between the two classes via a shared data-object, State.

  1. Organizing the code in these 4 classes creates too many "moving pieces". For instance, when first reading the code I was wondering if the State instance was changed during execution to create some sort of undo/redo behavior, or if there could be a mismatch between DrosophAnnot's state and ImagePose's state. This adds unnecessary complexity.

  2. The State class was introduced to avoid duplicating several variables between DrosophAnnot and ImagePose, but doing so, the state variable itself is still duplicated between both.

One of the aim of the refactoring will be to remove the State class, and handle the duplication between DrosophAnnot and ImagePose in a different manner.

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. — Antoine de Saint-Exupery

3. Mode and if-complexity

The GUI is modal and can be in one of four modes: IMAGE, POSE RESULTS, CORRECTION and HEATMAPS. In the code, this translates to a Mode enum and several if-elif chain throughout the code. For instance, the method update_image_pose(self) of ImagePose had these lines:

def set_mode(self, mode):
    self.state.mode = mode
    
def update_image_pose(self):
    #...
	if self.state.mode == Mode.IMAGE:
        im = self.cam.get_image(self.state.img_id)
    elif self.state.mode == Mode.HEATMAP:
        draw_joints = #...
        im = self.cam.plot_heatmap(self.state.img_id, False, scale=2, draw_joints)
    elif self.state.mode == Mode.POSE:
        #...

One of the problem here is that this if-elif chain is replicated in multiple places in the code. Another argument against it involves the concept of data-coupling, which would be too long to expose here.

One aim of the refactoring will thus be to remove the proliferation of this if-elif chain.

4. Duplication

Last but not least, at a more local level, the code contains a lot of duplication. I identified three types of duplication:

  1. obvious copy and paste ;
  2. structural duplication, such as duplicated nested for loops ;
  3. duplication due to using the wrong abstractions (such as the State instance).

Duplication makes the code harder to grasp because more text needs to be read and the lack of naming for a concept prevents the cognitive process of chunking [1]. It also makes the code harder to edit because changing one feature of the software, requires editing the code in multiple places. This is also error-prone since one of the duplicated parts is often forgotten and not updated, thus creating a bug.

I will thus aim to remove all duplication during the refactoring, and use named functions to abstract common code.

Referenced in this section:

[1] Raymond Hettinger, The Mental Game of Python, talk at PyBay 2019. Available on YouTube.

Refactoring methodology

Rewriting part of a codebase to solve design issues is commonly known as code refactoring, which is described by Martin Fowler as "a process of altering the internal layout of software without making the changes in the external behavior" [1].

The reference book about refactoring is Refactoring: Improving the Design of Existing Code by Martin Fowler [1]. And Satwinder and Sharanpreet published A Sytematic Literature Review on this topic [2].

I decided to refactor the code as a series of small code transformation. The overall refactoring can thus be seen as a greedy optimization process where each transformation slightly improve the code until there are no improvement left to operate.

My methodology was thus as follows:

  1. identify some part of the code that needs improvements ;
  2. write tests for it to make sure the code before and after the refactoring does the same thing ;
  3. refactor the code, and ensure no functionality was broken.

My tooling was made of:

  • pylint for linting and early bug detection,
  • pytest to write and run tests, with the plugin pytest-qt to test the GUI,
  • coverage.py to get detailed coverage report after running the tests

Raymond Hettinger gave a very useful keynote about tooling at PyBay 2018: Preventing, Finding, and Fixing Bugs On a Time Budget [3].

Referenced in this section:

[1] Martin Fowler & Kent Beck, Refactoring: Improving the Design of Existing Code, 2nd ed., Addison-Wesley Signature Series (Fowler), 2018.

[2] Satwinder Singh, Sharanpreet Kaurb, A systematic literature review: Refactoring for disclosing code smells in object oriented software Ain Shams Engineering Journal Volume 9, Issue 4, December 2018, Pages 2129-2151

[3] Raymond Hettinger, Preventing, Finding, and Fixing Bugs On a Time Budget, PyBay 2018. Available on Youtube.

1. New GUI Interface

The code related to pose estimation was extracted inside the core module and the code for the GUI was reduced to one class: DeepflyGUI. The new GUI interface is as follows:

class DeepflyGUI(QW.QWidget):
    
    # methods to initiate and setup the GUI
    def __init__(self):  # 3 lines
    def setup(self, input_folder=None, num_images_max=None):  # 4 lines
    def set_width(self, width):  # 3 lines
    def setup_layout(self):  # 111 lines

    # methods called when the user sends input
    def onclick_camera_order(self):       # 9 lines
    def onclick_pose2d_estimation(self):  # 2 lines
    def onclick_first_image(self):        # 1 lines
    def onclick_last_image(self):         # 1 lines
    def onclick_prev_image(self):         # 1 lines
    def onclick_next_image(self):         # 1 lines
    def onclick_prev_error(self):         # 6 lines
    def onclick_next_error(self):         # 6 lines
    def onclick_calibrate(self):          # 3 lines
    def onclick_save_pose(self):          # 2 lines
    def onclick_goto_img(self):           # 8 lines
    def onclick_image_mode(self):         # 7 lines
    def onclick_pose_mode(self):          # 9 lines
    def onclick_correction_mode(self):    # 14 lines
    def onclick_heatmap_mode(self):       # 12 lines
	def keyPressEvent(self, key_ev):      # 12 lines 
    def eventFilter(self, iv, mouse_ev):  # 26 lines

    # methods to communicate messages to the user
    def prompt_for_directory(self):           # 6 lines
    def prompt_for_camera_ordering(self):     # 9 lines
    def prompt_for_calibration_range(self):   # 1 lines3
    def display_error_message(self, message): # 2 lines

    # helper methods to change the GUI controls' state
    def uncheck_mode_buttons(self):  # 4 lines
    def enable_correction_controls(self, enabled):  # 3 lines
    def display_img(self, img_id):  # 3 lines
    def update_frame(self):  # 5 lines
    def update_image_view(self, iv):  # 7 lines

The comments should make clear what each group of method does. We can see that each method contributes to either displaying something to the user (a newly enabled button, an error message, a prompt) or handle some user input (onclick... callbacks, eventFilter, etc.). A few helper methods ensure that no duplication happens and provide names to make the rest of the code simpler.

Compared to the previous GUI interface, no methods refers to pose estimation. Removing a method from the new interface would remove a functionality of the GUI. This is a good sign that the new interface is not bigger than it should.

The role of the GUI class is now clearly defined: handle user interactions and forward them to the newly created core module.

2. New Core module

A module core was created. The role of this module is to provide an interface to interact with the pose estimation logic. For instance, while the previous GUI interface had an 80 lines method save_pose, the new GUI has an onclick_save_pose callback which is called when the user clicks the button "save". This new callback is only 2 lines long:

def onclick_save_pose(self):
    self.core.save_pose()
    self.core.save_corrections()

The new core interface is:

class Core:
    def __init__(self, input_folder, num_images_max):  # 9 lines
	def setup_cameras(self):  # 38 lines
    
    # attribute access
    @property def input_folder(self):  # 2 lines
    @property def output_folder(self):  # 2 lines
    @property def image_shape(self):  # 2 lines
    @property def number_of_joints(self):  # 3 lines
    def has_pose(self):  # 1 lines
    def has_heatmap(self):  # 1 lines
    def has_calibration(self):  # 4 lines
    
    # interactions with pose-estimation
    def update_camera_ordering(self, cidread2cid):  # 12 lines
    def pose2d_estimation(self):  # 14 lines
    def next_error(self, img_id):  # 1 lines
    def prev_error(self, img_id):  # 1 lines
    def calibrate_calc(self, min_img_id, max_img_id):  # 35 lines
    def solve_bp(self, img_id):  # 4 lines
    def nearest_joint(self, cam_id, img_id, x, y):  # 10 lines
    def move_joint(self, cam_id, img_id, joint_id, x, y):  # 10 lines

    def save_calibration(self):  # 3 lines
    def save_pose(self):  # 63 lines
    def save_corrections(self):  # 1 line
    
    # visualizations
    def plot_2d(self, cam_id, img_id, with_corrections=False, joints=[]):  # 33 lines
    def plot_heatmap(self, cam_id, img_id, joints=[]):  # 5 lines
    def get_image(self, cam_id, img_id):  # 4 lines
    
    # private helper methods
    def next_error_in_range(self, range_of_ids):  # 6 lines
    def get_joint_reprojection_error(self, img_id, joint_id, camNet):  # 11 lines
    def joint_has_error(self, img_id, joint_id):  # 4 lines
    def solve_bp_for_camnet(self, img_id, camNet):  # 29 lines
    

The new core interface completely encapsulate the old CameraNetwork and PoseDB classes. It is the unique entry point to interact with the pose estimation logic.

Having decoupled the core interface from the GUI, it is now possible to:

  1. test the core functionalities without a GUI interaction framework ;
  2. implement a command-line interface to allow batch processing in the terminal ;
  3. import and use the core in a Jupyter notebook or a python REPL (Read-Eval-Print-Loop).

The current core interface is a direct translation of how the core is used by the GUI. More work is required to refactor its code and improve its API. Side-effects such as I/O should be isolated and the core logic made fully functional. But there's only so much I could do with the given time.

3. Mode and if-elif removal

The Mode enumeration class was entirely removed. Instead of having an if-elif chain in each method, I use callbacks that are changed when the mode changes. For instance, here is the code of onclick_image_mode :

def onclick_image_mode(self):
    self.uncheck_mode_buttons()
    self.button_image_mode.setChecked(True)
    self.combo_joint_id.setEnabled(False)
    self.enable_correction_controls(False)
    self.belief_propagation_enabled = lambda: False
    self.display_method = lambda c,i,j: self.core.get_image(c, i)
    self.update_frame()

We can see that self.display_method is a callback whose behavior is changed when the mode changes.

Also, this new approach to switching mode opened to possibility to enable or disable some GUI controls. For instance, the combo-box combo_joint_id and the corrections controls are disabled when in image mode.

To be clear about the process that I used, here is an example in pseudo code. In the first code snipped below, we have three methods, two of which use an if-elif chain:

class Example1:
    def set_mode(self, mode):
        self.mode = mode

    def do_X(self):
        if self.mode == Mode.A:
            self.do_XA()
        elif self.mode == Mode.B:
            self.do_XB()

    def do_Y(self):
        if self.mode == Mode.A:
            self.do_YA()
        elif self.mode == Mode.B:
            self.do_YB()
            
obj = Example1()
obj.set_mode(Mode.A)
obj.do_X()

Using the callback approach, this snippet would be refactored as follows:

class Example2:
    def set_mode_A(self):
        self.do_X = self.do_XA
        self.do_Y = self.do_YA

    def set_mode_B(self):
        self.do_X = self.do_XB
        self.do_Y = self.do_YB
        
obj = Example2()
obj.set_mode_A()
obj.do_X()

Using the first snippet, the method do_XA() is called. In the second snippet, do_XA() is also called. The resulting functionality is the same but the if-elif chain is completely removed and the resulting code is shorter.

Improved robustness and UX

While working on the GUI, I have also improved the overall user experience:

  1. Better error handling by displaying an error alert in the GUI (instead of killing the application or failing silently).

    For instance, in the method set_img_id_tb, errors were logged to the standard output using print, but users of the GUI wouldn't always have access to this stream. I have used an alert-box instead.

  2. More robust parsing of user's input.

    For instance, by using regular expressions (import re) instead of ast.literal_eval in the method calibrate_calc (code that I have extracted into the method prompt_for_calibration).

  3. Minor bug fixes in the GUI's behavior.

    For instance, manual corrections are saved only if the difference between the original point and the correction is above a given threshold. I have changed the GUI's feedback during manual correction to show that below this threshold, no correction is saved. This create a very perceivable clipping effect.

  4. Slight redesign of the GUI to avoid modal behavior.

    The buttons to go to the next image and previous image were modal and changed behavior based on the state of a checkbox and the GUI's mode. I have created two new buttons "next error" and "previous error" instead.

  5. Better feedback of which controls are enabled.

    I have grammatically disabled buttons related with manual corrections when the GUI is not in correction mode. This makes it clear what their impact is.

A new command-line interface

Having decoupled the core functionalities from the GUI's code, it is now easy to add new interfaces to the pose estimation.

As the lab required to be able to run the pose estimation on several acquisition folders at once, without manual intervention, I was asked to implement a new command-line interface (CLI).

I have designed the CLI code as follows:

  1. The python standard module argparse is used to parse the command-line arguments
  2. The CLI code identifies the required functionality from the parsed arguments and calls the Core's methods accordingly.

The CLI module is made of the following functions:

def main():
def setup_logger(args):   # handles logging levels (debug, info, etc.)
def parse_cli_args():     # argparse
def run_from_file(args):  # calls run(args)
def run_recursive(args):  # calls run(args)
def run_in_folders(args, folders):   # calls run(args)
def run(args):            # instanciates and uses the Core 

Among them, only run(args) uses the Core instance, which ensure a clear decoupling between (1) the parsing of the command-line arguments with argparse and (2) the use of the Core's methods.

The command-line interface has the following features:

  1. Pose estimation can be run on a folder using df3d-cli FOLDER

  2. The flags -v and -vv allows to choose the verbosity level: no flag prints only the warnings and errors, -v prints the info output and -vv prints all debug output.

  3. The results are written in the subfolder df3d by default, but this can be changed using --output-folder flag. For instance:

    df3d-cli ./data/images --output-folder results
    

    will write the results in the folder ./data/images/results.

  4. When a folder contains several sub-folders named images on which to run the pose estimation, the recursive flag can be used. For instance, if the data folder is organized as follows:

    data
    ├── fly1
    │   └── images
    └── fly2
        └── images
    

    Then the following command will run the pose estimation on data/fly1/images and data/fly2/images:

    df3d-cli --recursive ./data  
    
  5. The flag --from-file takes a text file as input and runs the pose estimation on each of the folders listed in it.

  6. The --overwrite flag allows overwriting the existing pose results.

  7. The flag -n specifies a maximal number of images to process. This is useful for testing the output before running a time-consuming computation. For instance, the following command processes only the 100 first images in the data/test folder:

    df3d-cli ./data/test -n 100
    
  8. The --video-2d and --video-3d flags indicate to generate the 2D or 3D videos to visualize the pose results. They can be combined with the --skip-estimation flag to avoid re-running the pose estimation. For instance to run the pose estimation on the whole data/test folder and generate the video with 3D pose results on only the 100 first images, one can do:

    df3d-cli ./data/test
    df3d-cli ./data/test -n 100 --video-3d --skip-estimation
    

The command-line interface's manual can be printed using the --help flag. Its help text is reproduced below:

usage: df3d-cli [-h] [-v] [-vv] [-d] [--output-folder OUTPUT_FOLDER] [-r] [-f]
                [-o] [-n NUM_IMAGES_MAX] [-i [CAMERA_IDS [CAMERA_IDS ...]]]
                [-2d] [-3d] [-skip]
                INPUT

DeepFly3D pose estimation

positional arguments:
  INPUT                 Without additional arguments, a folder containing
                        unlabeled images.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Enable info output (such as progress bars)
  -vv, --verbose2       Enable debug output
  -d, --debug           Displays the argument list for debugging purposes
  --output-folder OUTPUT_FOLDER
                        The name of subfolder where to write results
  -r, --recursive       INPUT is a folder. Successively use its subfolders
                        named 'images/'
  -f, --from-file       INPUT is a text-file, where each line names a folder.
                        Successively use the listed folders.
  -o, --overwrite       Rerun pose estimation and overwrite existing pose
                        results
  -n NUM_IMAGES_MAX, --num-images-max NUM_IMAGES_MAX
                        Maximal number of images to process.
  -i [CAMERA_IDS [CAMERA_IDS ...]], --camera-ids [CAMERA_IDS [CAMERA_IDS ...]]
                        Ordering of the cameras provided as an ordered list of
                        ids. Example: 0 1 4 3 2 5 6.
  -2d, --video-2d       Generate pose2d videos
  -3d, --video-3d       Generate pose3d videos
  -skip, --skip-estimation
                        Skip 2D and 3D pose estimation

Conclusion

Most of my work was done by enforcing stronger boundaries between modules, and removing unnecessary abstractions (i.e. packages and classes). The resulting code is simpler to understand for humans, and the newly implemented command-line interface is both (1) a useful feature for EPFL Ramdya's Lab and (2) a direct proof that decoupling makes new features easier to implement.

More work is needed, however, before I would consider Deepfly an architectural success. In particular, more refactoring is needed in the utility modules to remove duplication and simplify their interface. Also, by modern architectural standards, the dependencies between the Core and I/O should be reversed and the Core should be made fully functional, leaving side-effects to an external layer.

I have been a long term proponent of Alistair Cockburn's Hexagonal Architecture, whose interesting ideas I find fully captured by Robert C. Martin's Clean Architecture. Some key ideas of the clean architecture are summarized for python in Brandon Rhodes' brilliant talk at PyOhio 2014: The Clean Architecture in Python and in various online posts by Robert Martin himself.

The primary purpose of architecture is to support the life cycle of the system. Good architecture makes the system easy to understand, easy to develop, easy to maintain, and easy to deploy. The ultimate goal is to minimize the lifetime cost of the system and to maximize programmer productivity.

— Robert C. Martin in his book The Clean Architecture

Software architecture has many pitfalls, the most dangerous of which is creating wrong abstractions. The rule of thumb is clear on this point: no abstraction is infinitely better than having the wrong abstractions.

No Abstraction > Wrong Abstraction

— Sebastian Markbage in Minimal API Surface Area, talk at JSConf EU 2014

Therefore, my advice to unexperienced designers would be to avoid creating modules and classes, and stick to simple functions with clear inputs, output and no side-effect. To repeat Brandon Rhodes: avoid premature object-orientation.

Premature object orientation: attaching a verb to a noun when you don't need to, yet. [...] If you pass an object to a function that calls only one method on it... pass the method instead.

— Brandon Rhodes, The Clean Architecture (PyOhio 2014)

Referenced in the conclusion:

[1] Alistair Cockburn, The Hexagonal Architecture. Original post available on the web archive.

[2] Robert C. Martin, Clean Architecture: A Craftsman's Guide to Software Structure and Design, Prentice Hall, 2017

[3] Brandon Rhode, The Clean Architecture in Python, PyOhio 2014. Available on Youtube.

[4] Sebastian Markbage, Minimal API Surface Area, JSConf EU 2014. Available on Youtube.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment