title | category | date | tags | summary |
---|---|---|---|---|
Refactoring a graphical application: DeepFly3D |
technical blog |
2019.12.03 |
python |
Deepfly is a python application used to conduct research at EPFL's Ramdya Lab. Coded over a short period of time, the application's python codebase accumulated technical debt that made it hard to implement new features. This report is a summary of my work on Deepfly during the fall semester 2019: (1) how I reduced the technical-debt by rewriting part of the code to improve its overall design and readability, a process commonly known as code refactoring, (2) how I implemented a new command-line based user-interface to interact with the application and (3) various minor improvements to the software's features.
Deepfly is a python application created by Semih Gunel to perform 2D & 3D tethered drosophila pose estimation. From multiple videos of a drosophila, taken at different angles, the application first estimates the position of the drosophila's legs on each 2D-video before creating a 3D-model.
At its core, Deepfly uses deep-learning to perform the pose estimation. The application was built with a Graphical User Interface (GUI) that: (1) runs the pose estimation, (2) displays the results to the end user and (3) allows the user to correct manually the estimated positions.
Deepfly's code was released under the GPL-v3 licence and is available on Github [1]. Below is a screen capture of its GUI.
The application was coded over a short period of time with an emphasis on functionality but not code layout. As a result, there is a lot of duplicated code throughout the codebase, an no clear boundary between the GUI code and the core estimation logic.
Code duplication makes implementing new features or changing existing features hard because for each feature change, the code has to be edited in multiple places. This creates more work than required and is also error-prone, since oftentimes one of the places is forgotten, yielding a mismatch between parts of the code that should have been the same, and therefore a bug.
Since the pose estimation is done directly in the GUI code, implementing another interface would require either to duplicate the pose estimation logic, or to refactor the GUI to be able to share the pose estimation code between the two interfaces.
This report is a summary of my work on Deepfly. In the next sections, I describe how I simplified the internal layout of the code by flattening the file hierarchy and refactoring the GUI's code. Then, I describe new features that I implemented in the GUI and a new command-line based interface.
Referenced in this section:
[1] https://github.com/NeLy-EPFL/DeepFly3D
[TOC]
Most of my work on Deepfly pertained to simplifying its layout and code. As such, I have "alter[ed] the internal layout of software without making the changes in the external behavior" [1].
The guiding principles for these modifications was to make the code simpler for humans to grasp and understand, and as such they are all rooted in the way we understand, process and compress informations.
As the author of Python, Guido van Rossum said in his talk at PyCon 2016 [2]:
"Typically when you ask a programmer to explain to a lay person what a programming language is, they will say that it is how you tell a computer what to do. But if that was all, why would they be so passionate about programming languages when they talk among themselves?
In reality, programming languages are how programmers express and communicate ideas — and the audience for those ideas is other programmers, not computers."
— Guido van Rossum, creator of Python, talk at PyCon 2016
Code structure and layout makes (almost) no difference for the computer executing the code, but makes a huge difference for humans reading and attempting to edit it.
Failure to keep the code tidy and easy to understand for humans is often part of what is called technical debt in software development, explained as follows on Wikipedia: "As a change is started on a codebase, there is often the need to make other coordinated changes at the same time in other parts of the codebase or documentation. Required changes that are not completed are considered debt that must be paid at some point in the future." [3]
In those terms, my work on Deepfly was primarily to reduce its technical debt.
Referenced in this section:
[1] The definition of "refactoring" by Martin Fowler, in his book: Refactoring: Improving the Design of Existing Code, 2nd ed., Addison-Wesley Signature Series (Fowler), 2018.
[2] Guido Van Rossum, Python Language, talk at PyCon 2016. Available on Youtube.
[3] Technical Debt, article on Wikipedia, https://en.wikipedia.org/wiki/Technical_debt
Deepfly's file hierarchy contains multiple sub-packages and sub-sub-packages. Because there is also no clear decoupling between the GUI code and the core pose estimation logic, this makes looking for a file tedious: are the optimization files in the package GUI.util
or pose2d.utils
? Contrary to my intuition, they are in GUI.util
.
Nested hierarchies have several drawbacks, including the following non-exhaustive list:
-
They introduce programmer overhead to (a) remember where a file is located, and (b) choose the proper sub-package to add new files. As is the case in the current version of Deepfly, when a file is placed in the wrong sub-package, this creates confusion and wastes reader's time.
-
They make relative import statement tedious to write. One has to keep in mind the files hierarchy to be able to import a module. For instance, should I write
from . import utils
orfrom .. import utils
? The problem is even worse if there's anutils
module in both the current and parent directories. -
Nested hierarchy tend to be reorganized, which makes all import statement fragile, since each reorganization requires to change all import statements.
-
Each sub-folder requires an
__init__.py
file. These files are often empty, but programmers can't be sure until they open and read them. This wastes programmers time. -
Users of the
deepfly
package have to know the internal layout of the package to be able to import its functions. For instance, rather than:from deepfly.pose3d.utils.procrustes import procrustes_seperate
It is easier to write:
from deepfly.procrustes import procrustes_seperate
Or even:
from deepfly import procrustes_seperate
A flat files hierarchy avoids these drawbacks: programmers know where to look for modules (there is only one or two folders), the import
statements are all relative to the root package, and finding a file using modern search functions is even easier when there is no nested folders.
Traditional python style guides favor flatter hierarchies, as can be read in The Zen of Python [1]: "Flat is better than nested". And as python's core developer Raymond Hettinger said: "Namespaces are for avoiding name clashes, not creating taxonomies" [2].
Flat is better than nested. — The Zen of Python
Jack Diederich also gave a fascinating talk at PyCon US 2012, named Stop Writing Classes where he advocates fewer classes, fewer files and fewer packages to save programmers' time. His main argument is that we should write the least possible amount of code and put the least possible amount of structure in our software, which reminds me of the design principle KISS: "keep it simple, stupid", famous in the software engineering sphere.
I have thus decided to flatten Deepfly's file hierarchy by putting all modules in the root package. Deepfly's current hierarchy is described below:
deepfly
├── GUI
│ ├── BP.py
│ ├── CameraNetwork.py
│ ├── Camera.py
│ ├── Config.py
│ ├── DB.py
│ ├── __init__.py
│ ├── main.py
│ ├── skeleton
│ │ ├── __init__.py
│ │ ├── skeleton_fly.py
│ │ └── skeleton_h36m.py
│ ├── State.py
│ └── util
│ ├── ba_util.py
│ ├── cv_util.py
│ ├── im_util.py
│ ├── main_util.py
│ ├── optim_util.py
│ ├── os_util.py
│ ├── plot_util.py
│ └── signal_util.py
├── __init__.py
├── pose2d
│ ├── ...
│ └── progress
│ ├── demo.gif
│ ├── LICENSE
│ ├── MANIFEST.in
│ ├── progress
│ │ ├── bar.py
│ │ ├── counter.py
│ │ ├── helpers.py
│ │ ├── __init__.py
│ │ └── spinner.py
│ ├── README.rst
│ ├── setup.py
│ └── test_progress.py
└── pose3d
├── __init__.py
└── procrustes
├── __init__.py
└── procrustes.py
11 directories, 49 files
The new file hierarchy is:
deepfly
├── ba_util.py
├── BP.py
├── CameraNetwork.py
├── Camera.py
├── cli.py
├── Config.py
├── core.py
├── cv_util.py
├── DB.py
├── gui.py
├── im_util.py
├── __init__.py
├── logger.py
├── optim_util.py
├── os_util.py
├── plot_util.py
├── pose2d
│ └── ...
├── procrustes.py
├── signal_util.py
├── skeleton_fly.py
├── skeleton_h36m.py
├── utils_ramdya_lab.py
└── video.py
4 directories, 37 files
Due to lack of time, I haven't flattened the pose2d
submodule. I have also kept the original modules' names, but a better naming convention is required for the utility modules. For instance, cv_utils.py
should be named using full words, such as computer_vision.py
. This would allow more meaningful import statements such as from deepfly import computer_vision as cv
rather than from deepfly import cv_utils as cv
With more time, I would make the file hierarchy even flatter, maybe along the lines of:
deepfly
├── cli.py
├── gui.py
├── core.py
├── utils.py
└── vendor
├── external module 1
└── external module 2
Referenced in this section:
[1] Tim Peters, The Zen of Python, PEP 20, https://www.python.org/dev/peps/pep-0020/
[2] Raymond Hettinger in Transforming Code into Beautiful, Idiomatic Python, PyCon US 2013. Available on Youtube.
[3] Jake Diederich, Stop Writing Classes, PyCon US 2012. Available on Youtube.
Before the refactoring, Deepfly's GUI was coded using the following classes:
- the
class DrosophAnnot
was the main Qt window (it inherited fromQWidget
) ; - the
class ImagePose
was a sub-widget to display an image (since 6 images are displayed on the main window,DrosophAnnot
had 6 instances ofImagePose
) ; - the
class DynamicPose
was a subcomponent ofImagePose
to store user input for manual corrections (one instance perImagePose
) ; - the
class State
stored common state variables betweenImagePose
andDrosophAnnot
such as the current image's id, the input folder, the output folder, etc. - the
enum Mode
with alternativesIMAGE
,POSE
,CORRECTION
andHEATMAP
was used to denote each mode of the GUI. InIMAGE
mode the GUI would display the raw images. InPOSE
mode, the pose estimation results. InCORRECTION
mode, mouse input was handled to allow for manual corrections and inHEATMAP
mode, a 2D probability map of each joints location was overlayed on the images ; - the
class CameraNetwork
represented a sequence of cameras used for the pose estimation. There were three instances of this class: thecamNetLeft
(andcamNetRight
) for cameras on the left (resp. right) of the drosophila andcamNetAll
for all cameras ; - the
class PoseDB
, a database to read/write manual corrections to/from disk.
In the following subsections I will review the parts of the code were the refactoring had the most impact.
Looking at the main window's interface, we can easily identify methods related to the GUI's behavior (e.g. a button was clicked or a key was pressed) and methods related to the pose estimation, as shown in the listing below.
class DrosophAnnot(QWidget): # the main GUI class
# methods related to the GUI's behaviour
def __init__(self): # 64 lines
def set_layout(self): # 150 lines
def set_mode(self, mode): # 27 lines
def checkbox_automatic_changed(self, state): # 6 lines
def checkbox_correction_clicked(self, state): # 4 lines
def keyPressEvent(self, event): # 24 lines
def combo_activated(self, text): # 5 lines
def set_first_image(self): # 3 lines
def set_last_image(self): # 3 lines
def set_prev_image(self): # 16 lines
def set_next_image(self): # 16 lines
def set_pose(self, img_id): # 55 lines
def set_heatmap_joint_id(self, joint_id): # 2 lines
def set_img_id_tb(self): # 8 lines
def set_joint_id_tb(self): # 1 lines
def update_frame(self): # 6 lines
# methods related to pose estimation that don't belong in the GUI
def set_cameras(self): # 44 lines
def rename_images(self): # 22 lines
def pose2d_estimation(self): # 26 lines
def already_corrected(self, view, img_id): # 14 lines
def get_joint_reprojection_error(self, img_id, joint_id, camNet): # 16 lines
def next_error(self, img_id): # 4 lines
def next_error_cam(self, img_id, camNet): # 10 lines
def prev_error(self, img_id): # 4 lines
def prev_error_cam(self, curr_img_id, camNet): # 10 lines
def solve_bp(self, save_correction=False): # 49 lines
def calibrate_calc(self): # 26 lines
def save_calibration(self): # 5 lines
def save_pose(self): # 80 lines
This is an example of coupling between the GUI's code and the core logic's code. Coupling them makes it impossible to reuse the core logic without the GUI. For instance, implementing a command-line interface for deepfly
will require to run the pose2d_estimation
and calibrate_calc
methods.
One of the goal of the refactoring will be to extract these methods from the GUI and put them in a newly create module to hold the core pose-estimation logic.
The main widget, DrosophAnnot
, delegates the display of each camera's image and manual corrections to the classes ImagePose
and DynamicPose
.
The ImagePose
class is responsible for fetching the image from the camera, displaying it, handling user input related to manual corrections and saving the manual corrections. It stores the state of the manual corrections in an instance of DynamicPose
.
These two classes are listed below.
class ImagePose(QWidget):
def __init__(self, config, cam, f_solve_bp): # 10 lines
def clear_mc(self): # 1 lines
def mouseMoveEvent(self, e): # 34 lines
def mouseReleaseEvent(self, e): # 14 lines
def update_image_pose(self): # 64 lines
def get_image_name(self, img_id): # 3 lines
def paintEvent(self, paint_event): # 6 lines
def get_joint_reprojection_error(self, img_id, joint_id, camNet): # 16 lines
def save_correction(self, thr=30): # 41 lines
class DynamicPose:
def __init__(self, points2d, img_id, joint_id, manual_correction=None): # 6 lines
def set_joint(self, joint_id, pt2d): # 5 lines
Handling the mouse events at the level of ImagePose
allows to use mouse coordinates relative to the widget's frame, and thus makes coordinates translation (screen to image) easier than handling mouse event at the level of the whole window in DrosophAnnot
. I speculate that this was the primary reason for introducing the class ImagePose
.
Because the ImagePose
needs access to the same information about the core (current image id, communication with a camera object, output folder for writing manual corrections, etc.) as the main widget, several variables were shared between the two classes via a shared data-object, State
.
-
Organizing the code in these 4 classes creates too many "moving pieces". For instance, when first reading the code I was wondering if the
State
instance was changed during execution to create some sort of undo/redo behavior, or if there could be a mismatch betweenDrosophAnnot
's state andImagePose
's state. This adds unnecessary complexity. -
The
State
class was introduced to avoid duplicating several variables betweenDrosophAnnot
andImagePose
, but doing so, the state variable itself is still duplicated between both.
One of the aim of the refactoring will be to remove the State
class, and handle the duplication between DrosophAnnot
and ImagePose
in a different manner.
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. — Antoine de Saint-Exupery
The GUI is modal and can be in one of four modes: IMAGE
, POSE RESULTS
, CORRECTION
and HEATMAPS
. In the code, this translates to a Mode
enum and several if-elif
chain throughout the code. For instance, the method update_image_pose(self)
of ImagePose
had these lines:
def set_mode(self, mode):
self.state.mode = mode
def update_image_pose(self):
#...
if self.state.mode == Mode.IMAGE:
im = self.cam.get_image(self.state.img_id)
elif self.state.mode == Mode.HEATMAP:
draw_joints = #...
im = self.cam.plot_heatmap(self.state.img_id, False, scale=2, draw_joints)
elif self.state.mode == Mode.POSE:
#...
One of the problem here is that this if-elif
chain is replicated in multiple places in the code. Another argument against it involves the concept of data-coupling, which would be too long to expose here.
One aim of the refactoring will thus be to remove the proliferation of this if-elif
chain.
Last but not least, at a more local level, the code contains a lot of duplication. I identified three types of duplication:
- obvious copy and paste ;
- structural duplication, such as duplicated nested for loops ;
- duplication due to using the wrong abstractions (such as the
State
instance).
Duplication makes the code harder to grasp because more text needs to be read and the lack of naming for a concept prevents the cognitive process of chunking [1]. It also makes the code harder to edit because changing one feature of the software, requires editing the code in multiple places. This is also error-prone since one of the duplicated parts is often forgotten and not updated, thus creating a bug.
I will thus aim to remove all duplication during the refactoring, and use named functions to abstract common code.
Referenced in this section:
[1] Raymond Hettinger, The Mental Game of Python, talk at PyBay 2019. Available on YouTube.
Rewriting part of a codebase to solve design issues is commonly known as code refactoring, which is described by Martin Fowler as "a process of altering the internal layout of software without making the changes in the external behavior" [1].
The reference book about refactoring is Refactoring: Improving the Design of Existing Code by Martin Fowler [1]. And Satwinder and Sharanpreet published A Sytematic Literature Review on this topic [2].
I decided to refactor the code as a series of small code transformation. The overall refactoring can thus be seen as a greedy optimization process where each transformation slightly improve the code until there are no improvement left to operate.
My methodology was thus as follows:
- identify some part of the code that needs improvements ;
- write tests for it to make sure the code before and after the refactoring does the same thing ;
- refactor the code, and ensure no functionality was broken.
My tooling was made of:
pylint
for linting and early bug detection,pytest
to write and run tests, with the pluginpytest-qt
to test the GUI,coverage.py
to get detailed coverage report after running the tests
Raymond Hettinger gave a very useful keynote about tooling at PyBay 2018: Preventing, Finding, and Fixing Bugs On a Time Budget [3].
Referenced in this section:
[1] Martin Fowler & Kent Beck, Refactoring: Improving the Design of Existing Code, 2nd ed., Addison-Wesley Signature Series (Fowler), 2018.
[2] Satwinder Singh, Sharanpreet Kaurb, A systematic literature review: Refactoring for disclosing code smells in object oriented software Ain Shams Engineering Journal Volume 9, Issue 4, December 2018, Pages 2129-2151
[3] Raymond Hettinger, Preventing, Finding, and Fixing Bugs On a Time Budget, PyBay 2018. Available on Youtube.
The code related to pose estimation was extracted inside the core
module and the code for the GUI was reduced to one class: DeepflyGUI
. The new GUI interface is as follows:
class DeepflyGUI(QW.QWidget):
# methods to initiate and setup the GUI
def __init__(self): # 3 lines
def setup(self, input_folder=None, num_images_max=None): # 4 lines
def set_width(self, width): # 3 lines
def setup_layout(self): # 111 lines
# methods called when the user sends input
def onclick_camera_order(self): # 9 lines
def onclick_pose2d_estimation(self): # 2 lines
def onclick_first_image(self): # 1 lines
def onclick_last_image(self): # 1 lines
def onclick_prev_image(self): # 1 lines
def onclick_next_image(self): # 1 lines
def onclick_prev_error(self): # 6 lines
def onclick_next_error(self): # 6 lines
def onclick_calibrate(self): # 3 lines
def onclick_save_pose(self): # 2 lines
def onclick_goto_img(self): # 8 lines
def onclick_image_mode(self): # 7 lines
def onclick_pose_mode(self): # 9 lines
def onclick_correction_mode(self): # 14 lines
def onclick_heatmap_mode(self): # 12 lines
def keyPressEvent(self, key_ev): # 12 lines
def eventFilter(self, iv, mouse_ev): # 26 lines
# methods to communicate messages to the user
def prompt_for_directory(self): # 6 lines
def prompt_for_camera_ordering(self): # 9 lines
def prompt_for_calibration_range(self): # 1 lines3
def display_error_message(self, message): # 2 lines
# helper methods to change the GUI controls' state
def uncheck_mode_buttons(self): # 4 lines
def enable_correction_controls(self, enabled): # 3 lines
def display_img(self, img_id): # 3 lines
def update_frame(self): # 5 lines
def update_image_view(self, iv): # 7 lines
The comments should make clear what each group of method does. We can see that each method contributes to either displaying something to the user (a newly enabled button, an error message, a prompt) or handle some user input (onclick...
callbacks, eventFilter
, etc.). A few helper methods ensure that no duplication happens and provide names to make the rest of the code simpler.
Compared to the previous GUI interface, no methods refers to pose estimation. Removing a method from the new interface would remove a functionality of the GUI. This is a good sign that the new interface is not bigger than it should.
The role of the GUI class is now clearly defined: handle user interactions and forward them to the newly created core
module.
A module core
was created. The role of this module is to provide an interface to interact with the pose estimation logic. For instance, while the previous GUI interface had an 80 lines method save_pose
, the new GUI has an onclick_save_pose
callback which is called when the user clicks the button "save". This new callback is only 2 lines long:
def onclick_save_pose(self):
self.core.save_pose()
self.core.save_corrections()
The new core interface is:
class Core:
def __init__(self, input_folder, num_images_max): # 9 lines
def setup_cameras(self): # 38 lines
# attribute access
@property def input_folder(self): # 2 lines
@property def output_folder(self): # 2 lines
@property def image_shape(self): # 2 lines
@property def number_of_joints(self): # 3 lines
def has_pose(self): # 1 lines
def has_heatmap(self): # 1 lines
def has_calibration(self): # 4 lines
# interactions with pose-estimation
def update_camera_ordering(self, cidread2cid): # 12 lines
def pose2d_estimation(self): # 14 lines
def next_error(self, img_id): # 1 lines
def prev_error(self, img_id): # 1 lines
def calibrate_calc(self, min_img_id, max_img_id): # 35 lines
def solve_bp(self, img_id): # 4 lines
def nearest_joint(self, cam_id, img_id, x, y): # 10 lines
def move_joint(self, cam_id, img_id, joint_id, x, y): # 10 lines
def save_calibration(self): # 3 lines
def save_pose(self): # 63 lines
def save_corrections(self): # 1 line
# visualizations
def plot_2d(self, cam_id, img_id, with_corrections=False, joints=[]): # 33 lines
def plot_heatmap(self, cam_id, img_id, joints=[]): # 5 lines
def get_image(self, cam_id, img_id): # 4 lines
# private helper methods
def next_error_in_range(self, range_of_ids): # 6 lines
def get_joint_reprojection_error(self, img_id, joint_id, camNet): # 11 lines
def joint_has_error(self, img_id, joint_id): # 4 lines
def solve_bp_for_camnet(self, img_id, camNet): # 29 lines
The new core interface completely encapsulate the old CameraNetwork
and PoseDB
classes. It is the unique entry point to interact with the pose estimation logic.
Having decoupled the core interface from the GUI, it is now possible to:
- test the core functionalities without a GUI interaction framework ;
- implement a command-line interface to allow batch processing in the terminal ;
- import and use the core in a Jupyter notebook or a python REPL (Read-Eval-Print-Loop).
The current core interface is a direct translation of how the core is used by the GUI. More work is required to refactor its code and improve its API. Side-effects such as I/O should be isolated and the core logic made fully functional. But there's only so much I could do with the given time.
The Mode
enumeration class was entirely removed. Instead of having an if-elif
chain in each method, I use callbacks that are changed when the mode changes. For instance, here is the code of onclick_image_mode
:
def onclick_image_mode(self):
self.uncheck_mode_buttons()
self.button_image_mode.setChecked(True)
self.combo_joint_id.setEnabled(False)
self.enable_correction_controls(False)
self.belief_propagation_enabled = lambda: False
self.display_method = lambda c,i,j: self.core.get_image(c, i)
self.update_frame()
We can see that self.display_method
is a callback whose behavior is changed when the mode changes.
Also, this new approach to switching mode opened to possibility to enable or disable some GUI controls. For instance, the combo-box combo_joint_id
and the corrections controls are disabled when in image mode.
To be clear about the process that I used, here is an example in pseudo code. In the first code snipped below, we have three methods, two of which use an if-elif
chain:
class Example1:
def set_mode(self, mode):
self.mode = mode
def do_X(self):
if self.mode == Mode.A:
self.do_XA()
elif self.mode == Mode.B:
self.do_XB()
def do_Y(self):
if self.mode == Mode.A:
self.do_YA()
elif self.mode == Mode.B:
self.do_YB()
obj = Example1()
obj.set_mode(Mode.A)
obj.do_X()
Using the callback approach, this snippet would be refactored as follows:
class Example2:
def set_mode_A(self):
self.do_X = self.do_XA
self.do_Y = self.do_YA
def set_mode_B(self):
self.do_X = self.do_XB
self.do_Y = self.do_YB
obj = Example2()
obj.set_mode_A()
obj.do_X()
Using the first snippet, the method do_XA()
is called. In the second snippet, do_XA()
is also called. The resulting functionality is the same but the if-elif
chain is completely removed and the resulting code is shorter.
While working on the GUI, I have also improved the overall user experience:
-
Better error handling by displaying an error alert in the GUI (instead of killing the application or failing silently).
For instance, in the method
set_img_id_tb
, errors were logged to the standard output usingprint
, but users of the GUI wouldn't always have access to this stream. I have used an alert-box instead. -
More robust parsing of user's input.
For instance, by using regular expressions (
import re
) instead ofast.literal_eval
in the methodcalibrate_calc
(code that I have extracted into the methodprompt_for_calibration
). -
Minor bug fixes in the GUI's behavior.
For instance, manual corrections are saved only if the difference between the original point and the correction is above a given threshold. I have changed the GUI's feedback during manual correction to show that below this threshold, no correction is saved. This create a very perceivable clipping effect.
-
Slight redesign of the GUI to avoid modal behavior.
The buttons to go to the next image and previous image were modal and changed behavior based on the state of a checkbox and the GUI's mode. I have created two new buttons "next error" and "previous error" instead.
-
Better feedback of which controls are enabled.
I have grammatically disabled buttons related with manual corrections when the GUI is not in correction mode. This makes it clear what their impact is.
Having decoupled the core functionalities from the GUI's code, it is now easy to add new interfaces to the pose estimation.
As the lab required to be able to run the pose estimation on several acquisition folders at once, without manual intervention, I was asked to implement a new command-line interface (CLI).
I have designed the CLI code as follows:
- The python standard module
argparse
is used to parse the command-line arguments - The CLI code identifies the required functionality from the parsed arguments and calls the
Core
's methods accordingly.
The CLI module is made of the following functions:
def main():
def setup_logger(args): # handles logging levels (debug, info, etc.)
def parse_cli_args(): # argparse
def run_from_file(args): # calls run(args)
def run_recursive(args): # calls run(args)
def run_in_folders(args, folders): # calls run(args)
def run(args): # instanciates and uses the Core
Among them, only run(args)
uses the Core
instance, which ensure a clear decoupling between (1) the parsing of the command-line arguments with argparse
and (2) the use of the Core's methods.
The command-line interface has the following features:
-
Pose estimation can be run on a folder using
df3d-cli FOLDER
-
The flags
-v
and-vv
allows to choose the verbosity level: no flag prints only the warnings and errors,-v
prints the info output and-vv
prints all debug output. -
The results are written in the subfolder
df3d
by default, but this can be changed using--output-folder
flag. For instance:df3d-cli ./data/images --output-folder results
will write the results in the folder
./data/images/results
. -
When a folder contains several sub-folders named
images
on which to run the pose estimation, the recursive flag can be used. For instance, if thedata
folder is organized as follows:data ├── fly1 │ └── images └── fly2 └── images
Then the following command will run the pose estimation on
data/fly1/images
anddata/fly2/images
:df3d-cli --recursive ./data
-
The flag
--from-file
takes a text file as input and runs the pose estimation on each of the folders listed in it. -
The
--overwrite
flag allows overwriting the existing pose results. -
The flag
-n
specifies a maximal number of images to process. This is useful for testing the output before running a time-consuming computation. For instance, the following command processes only the100
first images in thedata/test
folder:df3d-cli ./data/test -n 100
-
The
--video-2d
and--video-3d
flags indicate to generate the 2D or 3D videos to visualize the pose results. They can be combined with the--skip-estimation
flag to avoid re-running the pose estimation. For instance to run the pose estimation on the wholedata/test
folder and generate the video with 3D pose results on only the100
first images, one can do:df3d-cli ./data/test df3d-cli ./data/test -n 100 --video-3d --skip-estimation
The command-line interface's manual can be printed using the --help
flag. Its help text is reproduced below:
usage: df3d-cli [-h] [-v] [-vv] [-d] [--output-folder OUTPUT_FOLDER] [-r] [-f]
[-o] [-n NUM_IMAGES_MAX] [-i [CAMERA_IDS [CAMERA_IDS ...]]]
[-2d] [-3d] [-skip]
INPUT
DeepFly3D pose estimation
positional arguments:
INPUT Without additional arguments, a folder containing
unlabeled images.
optional arguments:
-h, --help show this help message and exit
-v, --verbose Enable info output (such as progress bars)
-vv, --verbose2 Enable debug output
-d, --debug Displays the argument list for debugging purposes
--output-folder OUTPUT_FOLDER
The name of subfolder where to write results
-r, --recursive INPUT is a folder. Successively use its subfolders
named 'images/'
-f, --from-file INPUT is a text-file, where each line names a folder.
Successively use the listed folders.
-o, --overwrite Rerun pose estimation and overwrite existing pose
results
-n NUM_IMAGES_MAX, --num-images-max NUM_IMAGES_MAX
Maximal number of images to process.
-i [CAMERA_IDS [CAMERA_IDS ...]], --camera-ids [CAMERA_IDS [CAMERA_IDS ...]]
Ordering of the cameras provided as an ordered list of
ids. Example: 0 1 4 3 2 5 6.
-2d, --video-2d Generate pose2d videos
-3d, --video-3d Generate pose3d videos
-skip, --skip-estimation
Skip 2D and 3D pose estimation
Most of my work was done by enforcing stronger boundaries between modules, and removing unnecessary abstractions (i.e. packages and classes). The resulting code is simpler to understand for humans, and the newly implemented command-line interface is both (1) a useful feature for EPFL Ramdya's Lab and (2) a direct proof that decoupling makes new features easier to implement.
More work is needed, however, before I would consider Deepfly an architectural success. In particular, more refactoring is needed in the utility modules to remove duplication and simplify their interface. Also, by modern architectural standards, the dependencies between the Core and I/O should be reversed and the Core should be made fully functional, leaving side-effects to an external layer.
I have been a long term proponent of Alistair Cockburn's Hexagonal Architecture, whose interesting ideas I find fully captured by Robert C. Martin's Clean Architecture. Some key ideas of the clean architecture are summarized for python in Brandon Rhodes' brilliant talk at PyOhio 2014: The Clean Architecture in Python and in various online posts by Robert Martin himself.
The primary purpose of architecture is to support the life cycle of the system. Good architecture makes the system easy to understand, easy to develop, easy to maintain, and easy to deploy. The ultimate goal is to minimize the lifetime cost of the system and to maximize programmer productivity.
— Robert C. Martin in his book The Clean Architecture
Software architecture has many pitfalls, the most dangerous of which is creating wrong abstractions. The rule of thumb is clear on this point: no abstraction is infinitely better than having the wrong abstractions.
No Abstraction > Wrong Abstraction
— Sebastian Markbage in Minimal API Surface Area, talk at JSConf EU 2014
Therefore, my advice to unexperienced designers would be to avoid creating modules and classes, and stick to simple functions with clear inputs, output and no side-effect. To repeat Brandon Rhodes: avoid premature object-orientation.
Premature object orientation: attaching a verb to a noun when you don't need to, yet. [...] If you pass an object to a function that calls only one method on it... pass the method instead.
— Brandon Rhodes, The Clean Architecture (PyOhio 2014)
Referenced in the conclusion:
[1] Alistair Cockburn, The Hexagonal Architecture. Original post available on the web archive.
[2] Robert C. Martin, Clean Architecture: A Craftsman's Guide to Software Structure and Design, Prentice Hall, 2017
[3] Brandon Rhode, The Clean Architecture in Python, PyOhio 2014. Available on Youtube.
[4] Sebastian Markbage, Minimal API Surface Area, JSConf EU 2014. Available on Youtube.