I'm cobbling this together as a very rough basis for coding conventions and best practices across different languages.
As much as possible, I'll attempt to defer the guidance to something that is kept up-to-date and supplied by a reliable third party, e.g. "Use Rails and Rubocop"
Sections
I will attempt to break this down per-language, with a distinct section for general recommendations that span languages.
I may also later add a section with some opinionated recommendations that I've seen used to good effect.
There are a lot of best practices that aren't language specific and are applicable regardless of your stack. Some of these will also be a little more process-oriented as that can directly impact productivity. Some of these recommendations also depend on the size of your product. Smaller, simpler products may find some recommendations to exceed cost vs. benefit (e.g. writing an elaborate OpenAPI spec for a basic RESTful CRUD API).
Sections are as follows:
Use Feature toggles - Feature toggles/flags are described along with some best practices and pitfalls.
Use CI/CD - CI/CD (Continuous Integration, Continuous Delivery) benefits and best practices are laid out in the accompanying gist.
If you ever read the explanation for gitflow https://nvie.com/posts/a-successful-git-branching-model/ you might be left scratching your head. It's rather elaborate, because it's intended to cover all the different types of branches you might need. If you maintain a versioned library (e.g. "Ruby on Rails", "React") this is still a fairly practical approach.
However, for web applications, the app is always live, and the user is never given a choice between 1.0 and 2.0 - they simply get whatever we serve them. If your architecture has feature toggles built in, there's nothing stopping you from deploying all the 2.0 features but keeping the UI on 1.0 - you can turn on the feature whenever you want.
For this reason, there is also little benefit in keeping a ton of branches and contingencies around when we can simplify gitflow to 2 branches:
main
my-feature
Now instead of trying to keep this image firmly memorized in your head, you have a much simpler workflow in which you always branch off main and you always merge back into main. If you have a feature that should not be released yet, you wrap your code in a feature toggle.
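A feature toggle can be as simple as a flag lookup guarding the new code path. A minimal sketch (the `FEATURES` hash and flag name are hypothetical - real apps usually back this with a database table or a flag service so toggles can flip without a deploy):

```ruby
# Minimal in-memory feature toggle. Real systems back this with a
# database table or flag service so toggles flip without a deploy.
FEATURES = { "new_checkout" => false }

def feature_enabled?(name)
  FEATURES.fetch(name, false)
end

def checkout
  if feature_enabled?("new_checkout")
    "2.0 checkout" # merged to main, but dark until the flag flips
  else
    "1.0 checkout"
  end
end

puts checkout # => 1.0 checkout
FEATURES["new_checkout"] = true
puts checkout # => 2.0 checkout
```

Because the 2.0 path ships dark, the branch can merge to main as soon as the code is ready, independent of the release decision.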
The RARE edge case for this is when you are doing a huge framework upgrade (e.g. Ruby on Rails 5.x to 6.x) which may require its own long-lived branch until it goes live. I still strongly recommend keeping the branch lifetime as short as possible. It's harder to do a framework upgrade when people continue writing code that you constantly have to upgrade to the new version. Sometimes it is better to temporarily put features on hold to finish the upgrade.
Other git best practices
The following are my opinions, but I'll explain why I think they're good ideas:
I recommend turning on "branch protection rules" and enabling the following:
Require pull request reviews before merging
Require status checks to pass before merging
Disable merge & rebase merging (in favor of squash merges)
No manual changes in higher environments
There are several reasons not to make manual changes in higher environments:
You might break something.
You will cause drift.
There is no audit trail of your changes.
Even when you are doing something like "emptying an s3 bucket" it can be prudent to put it somewhere in source control (i.e. a /production-ops repo) which can then be linked to a JIRA ticket for reference. That way if you find out a week later that you emptied the wrong bucket, you'll at least know what you did and when you did it.
Everything in version control
Whenever possible, everything should be in source control. There are a few exceptions:
Secret values or encryption keys
Steps that cannot be automated (e.g. initial account creation)
Actual production data (for dev & QA you might seed data or start empty)
To provide a general idea, this includes but is not limited to:
Application code
Database schemas
Database lookup/reference data (e.g. states, countries, user types, etc.)
Application deployment behavior (e.g. Jenkinsfile or buildspec.yml)
Configuration Management (Chef, Puppet, Ansible, custom userdata or shell)
Tests (of all kinds)
Code linting configuration
Database version control and automation
While I already mentioned "everything in version control" in the prior section, people often forget about the database and continue to run unversioned scripts manually against their production database.
Depending on language, different tools or ORMs are applicable:
Java/Agnostic - Flyway or Liquibase
Ruby - Rails ActiveRecord
.NET - Entity Framework
etc.
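With Rails ActiveRecord, for instance, each schema change is a versioned migration file checked into the repo (a sketch; the table and column names are hypothetical):

```ruby
# db/migrate/20240101000000_add_status_to_users.rb
# A versioned, repeatable schema change - no manual SQL against production.
class AddStatusToUsers < ActiveRecord::Migration[7.0]
  def change
    add_column :users, :status, :string, default: "active", null: false
    add_index  :users, :status
  end
end
```

Running `bin/rails db:migrate` applies any pending migrations and records which have run, so every environment converges on the same schema.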
Consistent linting
You should have some sort of baseline linting established regardless of the language used. Each language has its own linters and conventions, which I will mention per-language.
https://editorconfig.org/ is also great for documenting/enforcing consistency, though most languages will have linters that overlap with this behavior. It can be helpful when you have several languages in a single repository, or when you have custom file extensions.
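A minimal .editorconfig covering the common cross-language settings might look like this (the specific values are illustrative; adjust per project):

```ini
# .editorconfig at the repository root
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
indent_style = space
indent_size = 2

# Makefiles require tabs
[Makefile]
indent_style = tab
```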
Microservice vs. Monolith vs. Macroservice
These are some popular buzzwords regarding the size of an application/service:
Microservice: A very small application that does 1 thing
Monolith: A very large application that does everything
Macroservice: A right-sized application that does what makes sense for its domain
As you'll have noticed, microservice and monolith are easier to define. It's either ALL or ONE. However, in practice this often doesn't result in an ideal architecture. Either everything is shoved into one big ball of mud (the monolith) or you break everything into small pieces that still end up tightly coupled (microservices).
A better practice is to think about the domain and consider whether or not there is tangible value in grouping features together or breaking them apart. A few trivial examples:
Logging - Microservice - Every app needs to log information, and whether it is sending logs directly to a service, or having a microservice that manages logs, it's a fairly discrete cross-cutting concern. If you use something like https://prometheus.io/ that would also fit into this general bucket.
Simple web application - Monolith - If you have a basic web application that does CRUD operations, data visualizations, etc. and it only interacts with 3rd party applications, you probably don't need to break it into several "microservices" that are all in fact dependent on one another.
Here are some examples where you might consider breaking things into multiple parts:
Web app with public API
Public API - You might build an API to expose all or some of the features of your system so that other 3rd party apps can consume it.
Web app - Your front-end will likely offer additional features based on the UI
Multiple domains (Employee Management System)
Employee mailing lists - A system for adding an employee to mailing lists
Employee accounting - One system for paying employees, etc.
Employee permissions - A system on top of Active Directory that manages employee permissions across several systems
In the latter situation, we can see how a simple system that might have started with adding employees to mailing lists could have eventually evolved into accounting and full-blown permissions management. These things could technically all fit into one giant monolith, but it's likely there would be a lot of complexity unique to each domain, and there is benefit in splitting them into distinct applications. And while the concept of "employee" spans systems, their actual features are entirely unique.
Architecture differences based on deployment landscape
Where you deploy the app invariably changes how you code the app. Before deciding on a deployment style, you should consider your features and pick whichever is most suitable.
Serverless:
No need for connection pooling (or other stateful considerations)
No need for a full web framework if API Gateway is doing all the lifting
Can introduce a lot of complexity if there are several distinct lambdas with dependencies
No guarantee of "warm" containers if you are doing slow-startup operations
Step Functions:
Can chain together a series of lambda functions in a coherent manner
Good for "documenting" complex behavior that ties together several functions
The definition file can get fairly elaborate
Publish/Subscribe (pub/sub)
Everything can effectively be "fire-and-forget"
Harder to test end-to-end since it inevitably contains several discrete parts
Easy to test parts in isolation
Moving parts can be anything (container, lambda, ec2) - the only common point of interaction is the queue
Traditional containers / EC2
Needs to use connection pooling and other stateful tooling to avoid performance issues
Needs a fully-featured web framework (usually)
Requires various hardening and other OS configuration
Handling concurrent user transactions
It is important to determine early on which of these scenarios applies so you can decide on a solution:
You don't need concurrent user writes
You do need concurrent user writes and last-write-wins (easy)
You do need concurrent user writes and safe changesets (difficult)
And to briefly illustrate the difference between last-write-wins (LWW) and changesets:
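With LWW, each user submits the full record, and the later write simply replaces the earlier one:
Record: { "name": "me", "age": 50 }
User 1: write -> { "name": "me1", "age": 50 }
User 2: write -> { "name": "me", "age": 60 }
Result: { "name": "me", "age": 60 }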
In the above scenario, User 2 overwrites whatever User 1 did. With changesets, there are a couple applicable situations:
Situation 1: Safe concurrency. Two users can modify distinct fields without collision.
Record: { "name": "me", "age": 50 }
User 1: changeset -> { "name": "me1" }
User 2: changeset -> { "age": 60 }
Result: { "name": "me1", "age": 60 }
Situation 2: Unsafe concurrency. Two users cannot modify the same field.
Record: { "name": "me", "age": 50 }
User 1: changeset -> { "name": "me1" }
User 2: changeset -> { "name": "me2" } ERROR! NOT ALLOWED!
Result: { "name": "me1", "age": 50 }
In the latter situation, you generally include a record SHA (or version number) with each change so you can determine whether or not two users are concurrently modifying the same field. If only a single user is modifying the field, they can safely make changes.
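One common way to implement this is optimistic locking: the client submits the digest (or version) of the record it originally read, and the server rejects the write if the record has changed since. A minimal sketch (the `Store` class is hypothetical, and for simplicity it checks the whole record rather than per-field):

```ruby
require "digest"
require "json"

# Optimistic concurrency sketch: each write must present the digest of
# the record state it was based on; a stale digest means someone else
# changed the record first and the writer must re-read and retry.
class Store
  ConflictError = Class.new(StandardError)

  def initialize(record)
    @record = record
  end

  attr_reader :record

  def digest
    Digest::SHA256.hexdigest(JSON.dump(@record))
  end

  def update(changeset, based_on:)
    raise ConflictError, "record changed since read" unless based_on == digest
    @record = @record.merge(changeset)
  end
end

store = Store.new("name" => "me", "age" => 50)
v1 = store.digest

store.update({ "name" => "me1" }, based_on: v1)   # User 1's write succeeds
begin
  store.update({ "name" => "me2" }, based_on: v1) # User 2's digest is stale
rescue Store::ConflictError
  # User 2 must re-read the record and retry
end
```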
Break features into bite-size portions
Every feature should be broken down into 1-2 day efforts that any person on the development team can do. The feature/story should be sufficiently clear that it can be developed immediately.
This practice also applies to OS images! - From both an app development and security standpoint, your OS images should be getting patched at least monthly.
Before diving into the different recommended practices, I think it's important to help define what each testing type implies, and how you can define boundaries between those tests.
I'll briefly describe how I identify each type of test:
Unit: Spans a single component. Usually an atomic piece of work. Touch points are often stubbed. Examples: Get a user from a database or external service; Map some input JSON into a different output JSON.
Integration: Spans a high level action which could span several tiers in an application. Examples: Retrieve a list of users. Update user profile information.
End-to-end: Spans a feature or use case across one or more services. Generally tested from the perspective of an end-user.
Which tests are most important?
My personal belief is that you should have a "Testing Diamond" (or "trophy"):
(footnote: static in the above "trophy" represents static typing & linting)
I believe your priority should be:
Integration
e2e
unit
And I'll explain why:
Integration
Easy to write, and can test almost every aspect of your application. It gives you maximum value with minimal effort. Let's imagine you have a GET /users/1 endpoint. This represents a high level function of your application (though perhaps one level below a legitimate "feature"). Despite its simplicity to the end-user, it may have a lot under the hood:
Controller takes HTTP input and invokes a service
Service invokes ORM model
ORM model makes a DB query
Service makes external REST API call
Service aggregates local and external data
Controller transforms data into desired JSON shape
We can see that there is a lot of underlying complexity in this single API request, and a lot of wiring between the model, service, external API call, and various JSON mapping. If you relied heavily on unit tests, it's possible that all your unit tests are passing but the controller is still breaking or returning a malformed result.
By writing a very basic integration test (pseudocode) we can easily check whether or not the whole stack behaved:
get "/users/1"
assert response.ok?
assert JSON.parse(response.body).contains?('id')
In my opinion, this 1 integration test delivers more value than 10 unit tests which each test their respective component in isolation. It's also much easier to maintain because there are fewer of them, and it is at a much higher level, so you aren't getting hung up on minute implementation details.
Here are some best practices for integration tests:
Only stub things beyond your system boundary (e.g. external APIs or external databases). Everything else can be invoked without stubbing.
Do stub things beyond your system boundary. You generally don't want your test suite to be dependent on another live application.
Do not stub the database if it is tied to your application. Use a local copy to run tests against. (preferably the same engine, as sqlite can vary in features compared to a real RDBMS)
They should be very fast
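To make the earlier pseudocode concrete, here is a self-contained sketch in plain Ruby. The `UsersApp` class is a stand-in for the whole stack (controller, service, data source), and the test drives a request through every layer at once rather than stubbing each piece:

```ruby
require "json"

# A tiny Rack-style app standing in for the full stack: one call
# exercises routing, lookup, and JSON serialization together.
class UsersApp
  USERS = { 1 => { "id" => 1, "name" => "me" } }

  def call(env)
    id = env["PATH_INFO"][%r{/users/(\d+)}, 1].to_i
    if (user = USERS[id])
      [200, { "Content-Type" => "application/json" }, [JSON.dump(user)]]
    else
      [404, {}, ["not found"]]
    end
  end
end

# Integration-style test: assert on the observable result, not internals.
status, _headers, body = UsersApp.new.call("PATH_INFO" => "/users/1")
raise "expected 200" unless status == 200
raise "missing id"   unless JSON.parse(body.join).key?("id")
```

In a real app, the only stubs would sit at the system boundary (external APIs), while the database would be a local copy of the real engine.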
End-to-End (e2e)
Second-most important are end-to-end tests, for 2 reasons:
These represent actual use cases
These can test across system boundaries
Integration tests are excellent, but you don't want them bleeding across system boundaries, and they often can't easily represent a use case from the perspective of a user (e.g. view this list and click this button, etc.)
There are a lot of tools for end-to-end testing:
https://www.cypress.io/ - I highly recommend this tool over competitors. I think it excels in many ways.
Unit
We finally come to unit tests. Historically we've been told that unit tests are the best kind of tests, but in practice their value is often underwhelming. However, I think this can be attributed to the fact that we're unit testing the wrong things and testing them in a way that creates more cost than benefit.
In a worst case scenario:
You write a unit test for your "service" object
The test passes
The service breaks in production
You fix the bug in your service
Your tests are broken and require a rewrite
Your tests finally pass
Another bug happens in production, but was caused by an underlying model
You update your test stubs
etc. etc. for eternity
In the above situation, we're writing a worthless test that passes when it should fail, and fails when it should pass. This type of testing adds no benefit, but has a substantial cost, especially when it's used in a "service" or location where there is a lot of orchestration and you have to write a lot of stubs or do a lot of data setup.
In a best case scenario:
You write a unit test for your "service" object
The test fails
You fix a bug in the service
Everything is perfect in production
The latter sounds like a nice best-case scenario, but I would contend that you would have already caught this bug with an integration test (and therefore testing the service in isolation was a waste of your time).
What about TDD?
In my opinion, the underlying premise of TDD is to define the system behavior before you code it. One of the bigger problems with writing bad unit tests is that people write code, and then write tests that will pass. Instead of thinking about the feature or how the code might fail, they are just writing tests for the sake of writing tests.
If you are a fan of the TDD approach, I suggest writing your test at the highest possible level (e.g. an integration test that asserts a 200) and then slowly fleshing out the details and writing code as you go along.
Do not write too many unit tests
I think this point is worth highlighting. Unit tests are valuable for components that are easily isolated and are typically very input/output-oriented.
Some good unit test candidates:
Mappers - Usually a mapper takes an input object/JSON/XML and transforms it into a desired output. Regardless of where you use it, this will always be true, so it's a good candidate for effective unit tests.
Global functions/modules - Often, if you are exporting a global function (python) or crafting a reusable module (Ruby) these are things that may be used in a polymorphic manner, and could be applied in multiple ways. When a bug occurs in these locations, it may show up erratically in its concrete usage, so testing it in isolation is often valuable.
Some bad unit test candidates:
ORM model / repository - I generally avoid testing these, because you are ultimately just testing the framework's ORM, which we already know works, and is already tested in the framework itself. It is better to test your desired behavior in an integration test (e.g. "get all admin users")
Logic-heavy service or orchestration - In the integration test example above, I explained a service that made multiple API calls, did mapping, and composed a result to give the controller. If we stubbed every touch point, there's actually very little of substance being tested, and we will have to make sure all the stubs stay up-to-date with their real implementation. This is brittle and ineffectual.
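To illustrate the "good candidate" case, a mapper has pure input/output behavior that can be pinned down without any stubs (the field names here are hypothetical):

```ruby
require "json"

# A pure input -> output mapper: ideal unit-test territory, because the
# same input always yields the same output regardless of where it's used.
def map_user(api_json)
  input = JSON.parse(api_json)
  { "id" => input["userId"], "name" => "#{input['first']} #{input['last']}" }
end

# The unit test needs no framework, no stubs, no database.
result = map_user('{"userId": 1, "first": "Ada", "last": "Lovelace"}')
raise "bad mapping" unless result == { "id" => 1, "name" => "Ada Lovelace" }
```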
Do I need complete code coverage?
100% code coverage is a nice-to-have, but there are a lot of "edge cases" that will reduce your coverage in practice.
You should try to hit roughly 80% code coverage, though it isn't a perfect measure.
More importantly, you should be concerned with Scenario coverage. Let's look at this example code:
def divide(a, b)
a / b
end
# 100% code coverage! We're all good, right!?
def test_divide
result = divide(4,2)
assert result == 2
end
In the above example, you can see we have 100% code coverage, but we do not have 100% scenario coverage. (hint: We forgot to define what should happen when we divide by zero)
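The missing scenario can be pinned down explicitly. In Ruby, integer division by zero raises ZeroDivisionError, and a scenario-coverage test makes that expected behavior part of the contract:

```ruby
def divide(a, b)
  a / b
end

# Happy path (this alone already gives "100% code coverage")
raise "wrong result" unless divide(4, 2) == 2

# Scenario coverage: what SHOULD happen when b is zero?
# Here we decide the contract is "raise ZeroDivisionError".
begin
  divide(4, 0)
  raise "expected ZeroDivisionError"
rescue ZeroDivisionError
  # expected - the divide-by-zero scenario is now covered
end
```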
Tests are important
Some people subscribe to the school of thought that "you don't need tests if you have monitoring". I think this concept is flawed for several reasons:
In many ways, tests document the system. This is especially the case when you have code that can take several permutations of inputs. Dates, math, and date math are prime examples.
Merely the exercise of thinking through your unit/integration tests can reveal flaws in your understanding
It's better for code to break in a pipeline than in production
Writing effective tests
Given / When / Then
Regardless of the type of test you are writing, it should always have the following:
Given (Arrange): Pre-requisites for your scenario. e.g. Given I am a user in the system
When (Act): The feature under test. e.g. When I enter my username/password and login
Then (Assert): The verification of expected behavior. e.g. Then I should be successfully logged into the system
I prefer "Given/When/Then" since it transforms more nicely into plain language (and thus can match your user story). But some people prefer to call this "Arrange/Act/Assert". They are effectively the same construct.
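In code, the three phases typically show up as labeled sections of each test. A plain Ruby sketch (the `Cart` class and test name are hypothetical):

```ruby
# Given/When/Then as comments structuring a plain Ruby test.
Cart = Struct.new(:items) do
  def total
    items.sum
  end
end

def test_cart_total
  # Given: a cart with two items in it
  cart = Cart.new([10, 15])

  # When: the total is computed
  total = cart.total

  # Then: it equals the sum of the item prices
  raise "wrong total" unless total == 25
end

test_cart_total
```

Keeping the three phases visually distinct makes it obvious when a test is doing too much (e.g. multiple "When" steps usually means it should be split).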
Use one framework consistently
As much as possible, you should use 1 test framework within a project and make all of your tests look as consistent as possible.
To briefly illustrate, this would be a bad folder structure:
# no coherent pattern to file names
/test
test_service.rb
model_test.rb
verify.rb
This would be a bad file structure:
# In this example we're writing 3 very different styles of tests
def test_add
# blah
end
def given_this_when_something_then_that
# blah
end
def test_scenario_2
describe "given this" do
it "then that" do
end
end
end
Isolate DB changes
Before or after each test run, you should make sure you tear down your database. Ideally, you are also doing cleanup after each test case so that your data isn't bleeding into another scenario.
Run tests in random order
It is important that your tests do not unintentionally end up order-dependent. This can often happen when you have elaborate setup/teardown and eventually you write a test that was accidentally dependent on another test's setup.
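In Ruby, for example, RSpec supports this directly via configuration (this is a sketch based on the snippet RSpec's generated spec_helper.rb suggests):

```ruby
# spec_helper.rb: run specs in random order, printing the seed so an
# order-dependent failure can be reproduced with --seed <n>.
RSpec.configure do |config|
  config.order = :random
  Kernel.srand config.seed
end
```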
Performance Testing
I won't say much on performance testing other than the fact that you should figure out some sort of baseline performance and use a tool like https://www.blazemeter.com/ to validate that performance.
Contract Testing
Broadly speaking, contract testing is a way to ensure that your API remains stable with the way your clients are using it. You will get the most value from it if you have the following situation:
You have a handful of microservices, and each of those could be consumed by multiple other microservices
You control each microservice, or you have a cooperative 3rd party
For most projects I recommend leveraging your e2e tests to help you cover this type of testing, and as it evolves, decide on which of the following tools might help you keep your APIs stable as the product grows.
There are a few different "mechanisms" that I consider under the "contract testing" umbrella:
https://docs.pact.io/ - A fully-featured test framework that allows you to develop tests that will ensure your provider doesn't break your clients, and that your clients don't get broken by a provider.
https://www.openapis.org/ or https://graphql.org/ - With OpenAPI, you generally don't have built-in automation, but it provides you with means of documentation and a human-interactive UI in which an external developer could theoretically validate.
https://json-schema.org/ - With JSON Schema (or XML schema) you can define your expected input/outputs such that you don't end up with serialization blunders on either end. This is particularly valuable with pub/sub architectures.
End-to-end "smoke" tests - In lieu of any of the above mechanisms, you can also use an end-to-end test as your means of "contract testing". You can either do this implicitly (your feature tests pass, therefore the contract must be working) or explicitly (hit their API with a particular payload and validate a particular response)
End-to-end tests give you the most value with the least amount of effort. If you have a feature or use case that is dependent on a 3rd party API, you should have an end-to-end test for it, and it should implicitly determine whether or not the external API is still functioning. It is worth noting that this approach will not prevent the external API from publishing a breaking change, but it will help you catch it sooner, especially if you run end-to-end tests on a daily cadence.
JSON schema is helpful when you have complex data structures that may evolve, or an endpoint that potentially supports multiple different data structures. It is also very useful in pub/sub architectures where the data model may evolve, but you need to support older versions of data.
OpenAPI or GraphQL - With OpenAPI and GraphQL, you get user-readable documentation and generally some built-in API endpoints that can be hit by a developer or an automated tool.
Pact - Pact is a high-effort way to implement genuine contract tests, but can be most valuable in circumstances where you have a high number of clients using your application and you want to safely verify they are working before you deploy your product.
There are a few ways in which you can monitor your application:
Application Performance Monitoring (APM) - APM is generally most useful for keeping an eye on unhandled exceptions in your application. It's also good for getting a bird's-eye view of your application health at a glance.
Distributed tracing - Adding an additional point onto APM, if you have microservices, you will likely need distributed tracing either via your APM, logging, or an additional tool like X-Ray.
Logging - Whether it is HTTP traffic, warnings, or errors, you will likely have a lot of logs that need monitoring.
Infrastructure - Load balancers, container clusters, DNS, EC2 instances, etc. All of these exist outside the scope of your application but could impact it, so you would likely need a separate dashboard like Cloudwatch to observe infrastructure at a glance.
Custom metrics - In addition to the above (or in lieu of), you may want some sort of custom instrumentation like https://prometheus.io/
I STRONGLY recommend having one or more of these tools set up correctly prior to launching your app in production.
New Relic and Datadog both have fairly robust tool suites that let you opt into which kinds of monitoring you need.
There are a lot of ways to organize various python apps, so I'll just lay out some recommendations I've found to be useful
Recommendations
Use python 3 - python 2 is now end of life, so don't use it.
Use virtual environments (or anaconda/miniconda) - pip will install things globally if you don't use a virtual environment, and this can lead to incompatible library mismatches.
Use a requirements.txt with explicit versions - Pin exact versions so developers know which versions of the libraries are supposed to work with your app.
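For example (the package names and versions are purely illustrative):

```
# requirements.txt - pin exact versions so every environment matches
requests==2.31.0
flask==2.3.2
```

Tools like pip-tools or a lock-file-based manager (Pipenv, Poetry) can generate and maintain these pins for you.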