
Automating your development workflow: An illustration using Python as an example

Software projects follow development practices like testing, coverage measurement, style standards, and code reviews to yield a codebase that's bug-free and easy to maintain. However, these practices are often manual, repetitive, time-consuming, and not so strictly enforced, which may discourage stakeholders from following them and deprive them of many of the inherent benefits. Moreover, the redundancy of these activities hurts productivity.

The intent of this post is to go through different automated processes that we can inject into our software production pipeline to save time, keep the codebase maintainable, and make deployment hassle-free.

To demonstrate some of these automated processes in practice, let's develop a very simple REST API using Python and Flask. We'll follow the "Feature Branch workflow" for the project, which involves the following steps:

  1. Scaffolding the project: Initializing a repository or cloning the hosted project repository from remote, installing the dependencies, etc.
  2. Creating the feature branch
  3. Writing the code
  4. Submitting a pull request
  5. Code review and re-iteration of the Pull Request until everyone is happy
  6. Merge into the master branch
  7. Optimization (if any) and deployment to production

To ensure code quality, maintainability, and no production breakages, here are a few activities that we would like to perform during the development workflow:

Ensure our codebase is compliant with a style standard

This involves answering questions like:

  • Are we using single quotes or double quotes consistently for representing strings?
  • Is every line within the 80-character limit?
  • Do all the files end with a newline character?
  • Are we consistent with naming conventions (snake_case or camelCase) for filenames and symbols?

Ensure that the tests are passing

  • Are we sure that the new feature will not break any existing functionalities?

Ensure that the tests cover the new changes

  • Have the tests been written for the new feature?
  • Do the newly written tests cover all the scenarios, or at least the major ones? In other words, what's the coverage of the new code?

Other possible checks

  • Are the commit messages descriptive enough? Does the commit message follow the guidelines?
  • Are there any special non-ASCII characters used in the code or file names?
  • Is the project building successfully and ready to deploy?
  • Is the project functioning on all the platforms it is supposed to support?
  • And so on...

So, the checklist of tasks to be performed (by developers and code reviewers) for every Pull Request iteration of our feature is quite long. To avoid skipping any such checks (which can be critical), most open source organizations maintain contribution guidelines that serve as a checklist of things to verify.

If we observe closely, most of these tasks are deterministic and repetitive, which means they can be automated. In the rest of the post, we'll see where and how to inject the automated versions of these tasks. This will help in several ways by,

  • Reducing the mental overhead for developers and code reviewers
  • Delegating the work to software (since software doesn't mind doing boring, repetitive tasks)
  • Most importantly, saving time, and hence improving productivity

So let's get started with it!

Let the development begin!

Scaffolding

We'll initialize an empty git repository and implement a minimal working version of our server. For projects with intricate tech stacks, one might consider using open-source scaffolding tools like cookiecutter and PyScaffold, but for us, installing the dependencies and writing a simple Flask hello-world program should suffice for now.

Here's our app.py file,

# File: app.py
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'The server is up and running!'

And here's our requirements.txt wherein we store our dependencies.

# File: requirements.txt
Flask==1.0.2
gunicorn==19.9.0

The core idea behind requirements.txt is to keep track of dependencies and their exact versions, so that the environment can be replicated exactly on collaborators' machines and the production server, avoiding impromptu breakages due to weird dependency issues.

Working on the feature

The project is scaffolded; it's time for devs to work on a feature. For simplicity, let's say our feature is to "Create an endpoint for the division of two numbers".

# File: app.py
from __future__ import division
from flask import Flask, request

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello World!'

@app.route('/division')
def get_division():
    x = float(request.args.get('x'))
    y = float(request.args.get('y'))
    return str(compute_division(x,y))

def compute_division(x, y):
    if y == 0:
        return float('inf')
    return x/y

Let's do a sanity check to see if everything works:

➜  autodevelop git:(master) ✗ export FLASK_ENV=development
➜  autodevelop git:(master) ✗ export FLASK_APP=app.py
➜  autodevelop git:(master) ✗ flask run

 * Serving Flask app "app.py" (lazy loading)
 * Environment: development
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 130-555-941
127.0.0.1 - - [11/Feb/2019 01:20:50] "GET / HTTP/1.1" 200 -

The server is up.

[Screenshots: the / and /division endpoints responding in the browser]

The code looks neat. Let's write tests for it. One might follow TDD and write the tests upfront before implementing the feature; either approach is fine for our small feature.

# File: test_app.py
from autodevelop.app import compute_division


def test_compute_division_non_zero():
    assert(compute_division(4, 2) == 2)


def test_compute_division_fraction():
    assert(compute_division(2, 4) == 0.5)


def test_compute_division_zero_denominator():
    assert(compute_division(2, 0) == float('inf'))


def test_compute_division_zero_numerator():
    assert(compute_division(0, 2) == 0)

Here's our project's directory structure for reference

.
├── autodevelop
│   ├── __init__.py
│   └── app.py
├── requirements.txt
└── tests
    ├── __init__.py
    └── test_app.py

Nothing fancy, just basic tests for our compute_division function covering all the branches.
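As an aside, pytest can also express the same cases more compactly with parametrization. Here's a minimal sketch of a hypothetical rewrite (not part of our project, just an illustration of pytest.mark.parametrize):

# Hypothetical: the same four cases expressed with pytest.mark.parametrize
import pytest

from autodevelop.app import compute_division


@pytest.mark.parametrize("x, y, expected", [
    (4, 2, 2),             # non-zero division
    (2, 4, 0.5),           # fractional result
    (2, 0, float('inf')),  # zero denominator
    (0, 2, 0),             # zero numerator
])
def test_compute_division(x, y, expected):
    assert compute_division(x, y) == expected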

Let's run the tests and see if they pass.

(venv) ➜  project git:(master) ✗ pytest
======================================================== test session starts ========================================================
platform darwin -- Python 3.6.5, pytest-4.1.1, py-1.7.0, pluggy-0.8.1
rootdir: /Users/300041709/Desktop/cm/python_deployment_pipeline/project, inifile:
plugins: cov-2.6.1
collected 4 items                                                                                                                   

tests/test_app.py ....                                                                                                        [100%]

===================================================== 4 passed in 0.14 seconds ====================================================== 

We have used pytest to run the tests, as it supports nice features like test autodiscovery, detailed reports of test runs, and a few handy plugins for specific scenarios. Also, we used simple assert statements in the tests, but for intricate test cases, it is worth using the unittest module.
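For illustration, here's a minimal sketch of one of our tests written with unittest (a hypothetical alternative, not code we add to the project):

# Hypothetical unittest version of one of the division tests
import unittest

from autodevelop.app import compute_division


class TestComputeDivision(unittest.TestCase):
    def test_zero_denominator(self):
        # assertEqual produces a descriptive message on failure
        self.assertEqual(compute_division(2, 0), float('inf'))


if __name__ == '__main__':
    unittest.main()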

Let's specify the dependencies needed to run the tests in a separate file, dev-requirements.txt, to differentiate between the application's dependencies and the dependencies required for development and testing.

# File: dev-requirements.txt
pytest==4.1.1

Once the code and the unit tests are written, it's time for the dev to create a Pull Request for others to review. But before creating a PR, there are a few checklist tasks that the dev has to perform,

  • Make sure that the newly written tests have enough coverage.
  • Make sure all the unit tests pass
  • Check for discrepancies against style guidelines
  • Fix issues related to the above checks (if any), and repeat.

All this is necessary to ease the life of the code reviewer and prevent multiple iterations of the Pull Request due to silly mistakes.

Let's see how we can automate such tasks,

Git hooks to the rescue

Git provides hooks, which are nothing but scripts that are executed when certain events occur, such as committing, pushing, or receiving.

There may be various logical ways to classify hooks in git, but a couple of significant ones are,

  • Pre- and post-hooks: As the names suggest, post-hooks execute immediately after an event completes, and pre-hooks execute before it starts. If a pre-hook exits with a non-zero status, the event is aborted.
  • Client-side and server-side hooks: Client-side hooks are associated with developer-side events like committing, merging, and pushing. Server-side hooks are associated with operations like receiving pushed commits.

We can use hooks to run checks, enforce rules (or constraints), and perform repetitive tasks at different stages of our development workflow.

Setting up the hooks

Setting up hooks in git is as straightforward as creating an executable script with the correct hook name in the .git/hooks directory of a git repository. Sample hooks (with the correct names) are already provided in that directory, and documentation about the various hooks and their names can be found in the official git documentation.

Since the hooks reside inside the .git directory, they are not checked in, so they never reach a remote repository hosting service like GitHub. To be able to share them, we need to change one git configuration; here's how:

$ git config core.hooksPath hooks

Now we can define the scripts in the hooks directory instead. This will allow us to share the hooks with all the collaborators.

Client-side git hooks

Let's see how we can use some of the client side hooks for our use case,

Using the pre-commit hook

We can use a pre-commit hook for

  • Pointing out and auto-fixing (where possible) any style mistakes
  • Enforcing commit guidelines. For example, making sure the commit message references an issue in the issue tracker (see the sketch after this list).
  • Eliminating certain strings (like pdb, print statements, special characters, etc.)
  • Avoiding non-ASCII filenames, missing end-of-file newlines, etc.
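As an illustration of the second point, a commit-msg hook (a close cousin of pre-commit, which git calls with the path to the commit message file as its argument) could enforce an issue reference. A minimal sketch, with the "#123" convention as our own assumption:

#!/usr/bin/env python
# Hypothetical commit-msg hook: require an issue reference like "#123"
import re
import sys

# git passes the path to the commit-message file as the first argument
with open(sys.argv[1]) as f:
    message = f.read()

if not re.search(r'#\d+', message):
    sys.stderr.write("Commit message must reference an issue, e.g. '#42'\n")
    sys.exit(1)  # a non-zero exit status aborts the commit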

Static code analysis tools like pylint, autopep8, and coala provide capabilities to define style guidelines (or follow standard ones like PEP8), detect violations, and, where possible, suggest patches or even auto-fix them.

Here's our pre-commit hook, which takes all the Python files about to be committed and runs autopep8 on them, auto-fixing any PEP8 violations.

#!/bin/bash
# File: pre-commit
echo "Running pre-commit hook"
touched_python_files=`git diff --cached --name-only | egrep '\.py$' || true`
if [ -n "$touched_python_files" ]; then
    autopep8 -riv $touched_python_files
    # Re-stage the files so that the auto-fixed versions are what get committed
    git add $touched_python_files
fi

Before we test this out, we need to install autopep8 using pip, and also add it to dev-requirements.txt to keep track of all the dev dependencies.

# File: dev-requirements.txt
pytest==4.1.1
autopep8==1.4.3

Now every time we commit any Python files, our git hook will take care of running autopep8 over them. Let's try to commit our app.py and see what happens.

(venv) ➜  project git:(development) ✗ git add app.py                   
(venv) ➜  project git:(development) ✗ git commit -m "Test autopep8"
Running pre-commit hook
[file:autodeploy/app.py]
--->  Applying global fix for E265
--->  4 issue(s) to fix {'E302': {16, 10, 6}, 'E231': {14}}
--->  0 issue(s) to fix {}
[development 5d9a8a1] Test autopep8
 2 files changed, 2 insertions(+), 2 deletions(-)

So running autopep8 transformed our code, fixing the violations automatically. Neat, isn't it?
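For reference, here's roughly what app.py looks like after those fixes (reconstructed from the autopep8 report above: E302 adds the missing blank lines between top-level definitions, E231 adds the missing space after the comma):

# File: app.py (after autopep8)
from __future__ import division
from flask import Flask, request

app = Flask(__name__)


@app.route('/')
def hello_world():
    return 'Hello World!'


@app.route('/division')
def get_division():
    x = float(request.args.get('x'))
    y = float(request.args.get('y'))
    return str(compute_division(x, y))


def compute_division(x, y):
    if y == 0:
        return float('inf')
    return x/y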

Using the pre-push hook

We can use the pre-push hook to,

  • Run all the tests and make sure that they pass.
  • Have strict code coverage checks to make sure the newly written code is tested properly.

Here's a first version of our pre-push hook:

#!/bin/bash
# File: pre-push
pytest --cov=autodevelop --cov-report term-missing tests/ || exit 1

If the tests fail, the script will return a non-zero exit status, and the code will not be pushed to the remote. Neat.

(venv) ➜  project git:(master) ✗ pytest --cov=autodevelop --cov-report term-missing tests/
======================================================== test session starts ========================================================
platform darwin -- Python 3.6.5, pytest-4.1.1, py-1.7.0, pluggy-0.8.1
rootdir: /Users/300041709/Desktop/cm/python_deployment_pipeline/project, inifile:
plugins: cov-2.6.1
collected 4 items                                                                                                                   

tests/test_app.py ....                                                                                                        [100%]

---------- coverage: platform darwin, python 3.6.5-final-0 -----------
Name                      Stmts   Miss  Cover   Missing
-------------------------------------------------------
autodevelop/__init__.py       0      0   100%
autodevelop/app.py           13      4    69%   10, 15-17
-------------------------------------------------------
TOTAL                        13      4    69%


===================================================== 4 passed in 0.24 seconds ======================================================
(venv) ➜  project git:(master) ✗ 

The current coverage is 69%. The only uncovered portion is the more-or-less boilerplate code for the endpoints, but let's enforce a strict coverage requirement of 100% anyway.

#!/bin/bash
# File: pre-push
MIN_COVERAGE=100

pytest --cov=autodevelop --cov-report term-missing tests/ || exit 1
coverage report --fail-under=$MIN_COVERAGE || exit 1

The script emits a non-zero exit status if either the tests or the coverage check fails. If we try to push now, the push should fail.

tests/test_app.py ....                                                                                                        [100%]

---------- coverage: platform darwin, python 3.6.5-final-0 -----------
Name                      Stmts   Miss  Cover   Missing
-------------------------------------------------------
autodevelop/__init__.py       0      0   100%
autodevelop/app.py           13      4    69%   10, 15-17
-------------------------------------------------------
TOTAL                        13      4    69%


===================================================== 4 passed in 0.23 seconds ======================================================
Name                      Stmts   Miss  Cover
---------------------------------------------
autodevelop/__init__.py       0      0   100%
autodevelop/app.py           13      4    69%
---------------------------------------------
TOTAL                        13      4    69%
error: failed to push some refs to 'git@github.com:satwikkansal/python_automated_pipeline.git'

Now let's add tests to cover the endpoints,

# File: tests/test_app.py (additions)
from autodevelop.app import app

test_client = app.test_client()
test_client.testing = True


def test_root():
    response = test_client.get('/')
    assert(response.status_code == 200)
    assert(response.data.decode() == "Hello World!")


def test_get_division():
    response = test_client.get('/division?x=5&y=2')
    assert(response.status_code == 200)
    assert(response.data.decode() == "2.5")

Since we've covered every line, the coverage should now be 100%. Let's try to push again. It works! By the way, if we aren't able to test some specific parts of the code for some reason, we can add pragma comments to exclude those lines from the coverage calculation.
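For example, coverage.py recognizes a # pragma: no cover comment. A minimal sketch, using a hypothetical platform-specific helper:

import sys


def config_path():
    # Hypothetical helper: the Windows branch never runs on our Linux CI,
    # so we exclude it from the coverage calculation
    if sys.platform == 'win32':  # pragma: no cover
        return 'C:\\app\\config'
    return '/etc/app/config'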

Pull-request and code review

So the style guidelines have been taken care of, the code is thoroughly tested, the tests are clean, and now it's time to raise a Pull Request for others to review.

Ideally, it is also the code reviewer's responsibility to verify the previous checks, so it makes sense to add similar checks on the platform where the Pull Request is raised as well. Moreover, the integration aspects of the code should be tested here (running regression and integration tests) to be assured that nothing breaks in production once this code is merged and goes live.

Continuous Integration

Since it is very hard for developers to test the project in various environments (different operating systems, Python versions, etc.) using virtual machines, we can use Continuous Integration services like Travis CI, CircleCI, and AppVeyor to do this for us, specifying multiple environments in which to run the project's tests.

Beyond that, we can perform all our earlier checks during the build: that all the tests pass, that style guidelines are followed, and so on. The idea is to fail the build if any discrepancy is found, and to merge the code only when everything is green.

Adding Travis to our pipeline

Adding Travis to our project is pretty straightforward: we add a configuration file called .travis.yml to the repository and authorize Travis to access it. The process is described on Travis's official website.

# File: .travis.yml
language: python

os:
    - linux

python:
    - 2.7
    - 3.4
    - 3.6

install:
  - pip install -r requirements.txt
  - pip install -r dev-requirements.txt

script:
  - pytest

The above configuration tells Travis to build our Python project on the Linux operating system, with Python versions 2.7, 3.4, and 3.6. The commands in the install phase are executed before the commands in the script phase, which is where the project's build/test commands go.

Ideally, testing the build against the Python version and operating system of the production server is sufficient, but it doesn't hurt to test other platform configurations, since developers might run different versions locally and we'd like to catch version-specific issues anyway.

Integrating Codecov

Services like Codecov provide tools to measure coverage, generate reports, and take actions based on them. To our existing .travis.yml, we add the command to install codecov in the install phase, and the codecov command in the after_success phase.

language: python

os:
    - linux

python:
    - 2.7
    - 3.4
    - 3.6

install:
  - pip install -r requirements.txt
  - pip install -r dev-requirements.txt
  - pip install codecov

script:
  - pytest --cov=autodevelop --cov-report term-missing

after_success:
  - codecov

Additionally, we can define a codecov.yml file to configure Codecov's features.
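For instance, a minimal codecov.yml could pin the target coverage for the project status check. A sketch (the value here is illustrative):

# File: codecov.yml (illustrative sketch)
coverage:
  status:
    project:
      default:
        target: 100%  # fail the status check if coverage drops below this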

GitHub integrations in action

Here's a Pull Request with both Codecov and Travis integrated. You can see the nice coverage report posted as a comment, and the green checks indicating that everything succeeded. There's also functionality to configure notifications so that you're notified when your build fails or succeeds.

[Screenshots: the Codecov coverage-report comment and the green Travis/Codecov status checks on the Pull Request]

Deploying the code

Now that the code has been merged and everything looks fine, it's time to deploy our code and make the REST API live. The manual process for this would be logging into the server, pulling the latest code, and running some commands to get the production code serving. Let's see if we can do better.

Deploying using Server-side git-hooks

We can use the post-receive hook to deploy the code in the master branch to production. To do that, we first need to initialize a bare git repository on the server. Why a bare repository? Because we don't intend to work inside this remote repository, so we don't need a working tree.

$ git init --bare ~/project_git
$ cd ~/project_git/hooks

Here's what our post-receive hook looks like:

#!/bin/bash
# File: post-receive
PROJECT_DIR="/home/project"
GIT_DIR="/home/git"
BRANCH="master"

while read oldrev newrev ref
do
    if [ "$ref" = "refs/heads/$BRANCH" ];
    then
        echo "Ref $ref received. Deploying ${BRANCH} branch to production..."
        # A bare repository has no working tree of its own, so check the
        # branch out into the project directory explicitly
        git --work-tree=$PROJECT_DIR --git-dir=$GIT_DIR checkout -f $BRANCH
        cd $PROJECT_DIR && make setup prod-run && sh healthcheck.sh
    else
        echo "Ref $ref received. Doing nothing: only the ${BRANCH} branch may be deployed on this server."
    fi
done

The script deploys our code whenever the master branch is pushed to the server remote. We've moved a few commands into a couple of make targets to keep things organized; here are the contents of our Makefile.

# File: Makefile
PROJECT_NAME=autodevelop

setup:
    pip3 install --user -r requirements.txt

prod-run:
    # --daemon detaches gunicorn so the post-receive hook can proceed to the healthcheck
    cd $(PROJECT_NAME) && gunicorn --daemon --bind 127.0.0.1:5000 app:app

dev-run:
    # Each recipe line runs in its own shell, so the env vars must be set inline
    cd $(PROJECT_NAME) && FLASK_ENV=development FLASK_APP=app.py flask run

We have also added a simple healthcheck script that makes sure the server is up and listening on the port. Here's its content,

#!/bin/bash
# File: healthcheck.sh
URL=127.0.0.1
PORT=5000
if nc -vz $URL $PORT; then
    echo "Everything is fine!"
else
    echo "Something went wrong :("
    exit 1
fi

Let's deploy our code now, and see if it works.

$ git remote add production server-user@server-dns.com:project_git
$ git push production master

Doing this will deploy our code on the remote machine and start the server in the production environment.

Here's the final code, and our final directory structure

.
├── .travis.yml
├── Makefile
├── autodevelop
│   ├── __init__.py
│   └── app.py
├── dev-requirements.txt
├── healthcheck.sh
├── hooks
│   ├── post-receive
│   ├── pre-commit
│   └── pre-push
├── requirements.txt
└── tests
    ├── __init__.py
    └── test_app.py

Deploying using CI services

Some CI services also provide integrations with cloud platforms like Heroku, AWS, etc. The idea is simple: if the build succeeds, deploy the code to the server. This is particularly useful for staging environments. For example, say you made some front-end changes in your PR; the CI will take care of deploying them to a staging server so that the reviewer can visit it and see the changes live.
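As an illustration, Travis's deploy phase can push a green build to Heroku. A minimal sketch (the app name is a placeholder, and the API key should come from a secure variable in the Travis settings):

# Illustrative addition to .travis.yml
deploy:
  provider: heroku
  api_key: $HEROKU_API_KEY  # stored as a secure variable in Travis settings
  app: my-staging-app       # placeholder Heroku app name
  on:
    branch: master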

Deploying using GitHub webhooks

What if we want our code to be deployed to production as soon as a Pull Request is merged into the master branch? Well, that's possible too! Just like git has hooks, GitHub supports webhooks for various events (including Pull Request merges). The way it works is that we set up a listener endpoint on our server, which receives the events sent by GitHub and takes actions accordingly.
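Since we already have Flask at hand, here's a minimal sketch of such a listener (the endpoint path and deploy script are our own assumptions; a production listener should also verify GitHub's X-Hub-Signature header):

# Hypothetical webhook listener: deploy when a PR is merged into master
import subprocess

from flask import Flask, request

listener = Flask(__name__)


@listener.route('/webhook', methods=['POST'])
def handle_github_event():
    payload = request.get_json(silent=True) or {}
    pr = payload.get('pull_request', {})
    # GitHub sends action == 'closed' with merged == True when a PR is merged
    if (payload.get('action') == 'closed'
            and pr.get('merged')
            and pr.get('base', {}).get('ref') == 'master'):
        # Placeholder deploy step; swap in your own deployment script
        subprocess.Popen(['sh', 'deploy.sh'])
        return 'Deploying', 202
    return 'Ignored', 200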

Conclusion

This post barely scratches the surface of the different ways of automating your development workflow for better productivity. There are also more elaborate workflows, like the Git flow workflow and the Forking workflow, usually adopted by large projects with a release cycle and by open source projects. However, the concepts discussed and the solutions implemented here apply to all of these workflows; you just need to figure out what you need and what works best for you!
