devjack/holiday-releases.md

## holiday-releases.md

      
| title | tags | categories | date |
| --- | --- | --- | --- |
| How to release during the holidays | software engineering | blog | 2017-12-22 |
  
And so we reach the end of another coding year, and while many of us are plotting our holiday hacks and offline AR (actual reality) adventures, there are a few folks who still have to launch features over the break.
Please don't.
OK, so you're charging ahead anyway. Without your whole team on board, it's a period of heightened risk, and it's important that you approach production changes with an appropriate level of caution and care.
Check with suppliers

This is probably the most common cause of panic I've experienced both as an engineer and as one of those suppliers.
If you're launching something while your team is on holiday, chances are your partners, suppliers and third parties are in a similar situation. If you haven't already, reach out to them and find out exactly what their support commitment and availability are. Know the escalation paths available as well, just in case.
Often suppliers (our team included) have alternative support arrangements during holiday periods; not knowing this will slow down your ability to get help should you need it.
Are other teams informed?

Unless you're a solo team (chances are you're not), do the other parts of the business know that changes may be happening? If you're launching a new feature, have you let the social, support and press teams know? If it goes wrong, they're bound to be the first responders.
Do you know how to escalate incidents with other teams? If you depend on an internal infrastructure team, can you raise an incident? If you consume an API, do you know how to contact the team managing it? While your changes might not be to their code, you're still one part of a complex technical ecosystem, so play nice.
If, for example, your new feature calls an internal API twice as often, initial tests might make it look like their service can handle the increased load you're putting on it. Internally, however, their queue might be backing up, logs might be filling up disks, and other unexpected capacity planning issues might come into play. Please don't be that team that causes someone else's pager to go off.
How reliable and reproducible is your build system?

This speaks to two key indicators of a good build pipeline: test coverage and reproducible builds.
In the first instance, knowing your test processes (automated and exploratory) goes a long way to building confidence in holiday release cycles. It's not just a raw metric for test coverage, but also the smoke testing, fuzzing etc. that contribute to having a reliable build that you trust.
A reproducible build, meanwhile, gives you confidence that you have a known and repeatable checkpoint in your code.
In the end, your build system is your safety net for making reliable software regardless of the increased end-of-year risk. If your builds are flaky, my advice is to not risk it.
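One rough way to check whether a build is reproducible is to build the same commit twice and compare the outputs byte for byte. The sketch below assumes your build writes its artifacts to a directory; `digest_tree` and `builds_match` are hypothetical helper names, not part of any build tool.

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> str:
    """Return one SHA-256 digest covering every file under root."""
    sha = hashlib.sha256()
    # Sort by relative path so the digest does not depend on filesystem order.
    files = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file()
    )
    for relative_name, path in files:
        # Hash the name as well as the content, so renames are detected too.
        sha.update(relative_name.encode("utf-8"))
        sha.update(b"\0")
        sha.update(path.read_bytes())
        sha.update(b"\0")
    return sha.hexdigest()


def builds_match(first: Path, second: Path) -> bool:
    """True when two build output directories contain identical files."""
    return digest_tree(first) == digest_tree(second)
```

If `builds_match` returns `False` for two builds of the same commit, something non-deterministic (timestamps, unpinned dependencies, build-machine state) is leaking into your artifacts, and that is worth knowing before the office empties out.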
Are you making additive-only changes to databases?

It might seem small, but databases are complex, and anyone telling you otherwise has clearly never had a data migration cause trouble in production... especially when the experts in your team are on a 14-hour flight over the Pacific.
Additive-only schema changes are a fantastic way to ensure that changes for your new feature cannot impact existing code dependent on a given schema.
By only ever rolling forward on your database you'll be confident that your always-new updates to the schema are incredibly unlikely to cause issues with other features.
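To make the idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `orders` table, the `gift_message` column and the `legacy_*` functions are invented for illustration; the point is only that existing code which names its columns explicitly keeps working after a new nullable column is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)"
)


def legacy_insert(total_cents: int) -> None:
    """Existing code path: knows nothing about columns added later."""
    conn.execute("INSERT INTO orders (total_cents) VALUES (?)", (total_cents,))


def legacy_revenue() -> int:
    """Existing report: names its columns explicitly instead of SELECT *."""
    row = conn.execute("SELECT COALESCE(SUM(total_cents), 0) FROM orders").fetchone()
    return row[0]


legacy_insert(1999)
revenue_before = legacy_revenue()

# Additive change for the new feature: a new, nullable column.
# Nothing is dropped, renamed or retyped.
conn.execute("ALTER TABLE orders ADD COLUMN gift_message TEXT")

revenue_after_migration = legacy_revenue()  # old report still works
legacy_insert(501)                          # old insert still works

# Only the new feature writes to the new column.
conn.execute(
    "INSERT INTO orders (total_cents, gift_message) VALUES (?, ?)",
    (1000, "Happy holidays"),
)
```

Note the assumption doing the heavy lifting here: the new column is nullable. Adding a `NOT NULL` column without a default is technically "additive" but can still break old inserts, so it deserves the same caution as a destructive change.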
Is rollback part of standard practice?

I don't just mean backups and restoration. The maturity of your change management is a crucial factor in knowing the risk and impact of having to roll back.
Should you require a backup restoration, how old is that backup?
Ideally, rollback should be automated (even if it is manually triggered), but making it part of standard practice to know how to roll back and recover from a bad release is crucial for release confidence.
I know very few teams that can release like Homer: build goes forward, build goes back, build goes forward, build goes back.
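As a sketch of what "automated but manually triggered" could look like, the snippet below keeps a history of deployed versions and re-activates the previous one on request. `ReleaseManager` and its `activate` hook are hypothetical; in a real pipeline the hook would be whatever switches live traffic (a symlink swap, a load balancer change, a container rollout).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReleaseManager:
    """Tracks deployed versions so the previous one can be re-activated."""

    activate: Callable[[str], None]  # hypothetical hook that switches live traffic
    history: List[str] = field(default_factory=list)

    def deploy(self, version: str) -> None:
        self.activate(version)
        self.history.append(version)  # recorded only if activation succeeded

    def rollback(self) -> str:
        if len(self.history) < 2:
            raise RuntimeError("no earlier release to roll back to")
        previous = self.history[-2]
        self.activate(previous)  # re-activate first, then forget the bad release
        self.history.pop()
        return previous


# Usage: here the "activation" just records which version went live.
activated: List[str] = []
releases = ReleaseManager(activate=activated.append)
releases.deploy("v41")
releases.deploy("v42")
rolled_back_to = releases.rollback()
```

The useful property is that rolling back is the same operation as deploying, just pointed at an older version, so it gets exercised by the same tooling rather than being a special case nobody has run since last year.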
Can you control customer experience?

Chances are you'll have a success metric riding on this important holiday work so separating the client experience from your zero-downtime-deploy is only logical.
When you do finally deploy code to a production environment, is it immediately visible to customers, or do you have a feature flagging or gating mechanism to control client experience? If you're launching a new UI, can customers opt in or opt out? If a feature is about to start behaving differently, can clients opt back in to the old behaviour if needed?
Having feature release independent of code changes is an excellent idea on many levels, but over the holiday break it's about separation of risk and change coordination.
OK, so this is just a starting point, and much of it is just standard best-practice engineering with a hint of caution. If you are charging ahead with a holiday release in the next two weeks, then I wish you (and your colleagues, and your suppliers, and your users) good luck! Hopefully the above points help you do some proactive planning so you're ready for whatever might happen.
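A feature gate doesn't need to be elaborate to be useful. The following is a minimal in-memory sketch, assuming customers are identified by a string ID; the `FeatureFlag` class and its fields are made up for illustration, and a real system would persist this state somewhere shared.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    """Decides per customer whether deployed code is actually visible."""

    name: str
    enabled_by_default: bool = False
    kill_switch: bool = False  # turns the feature off for everyone, no deploy needed
    opted_in: Set[str] = field(default_factory=set)
    opted_out: Set[str] = field(default_factory=set)

    def opt_in(self, customer_id: str) -> None:
        self.opted_out.discard(customer_id)
        self.opted_in.add(customer_id)

    def opt_out(self, customer_id: str) -> None:
        self.opted_in.discard(customer_id)
        self.opted_out.add(customer_id)

    def is_enabled_for(self, customer_id: str) -> bool:
        if self.kill_switch:
            return False
        if customer_id in self.opted_out:
            return False
        if customer_id in self.opted_in:
            return True
        return self.enabled_by_default


# Usage: the code ships dark, and one customer tries the new UI early.
new_ui = FeatureFlag(name="new-ui")
new_ui.opt_in("customer-123")
```

With something like this in place, the deploy and the launch become two separate events: the code can go out before the break, and the decision to show it to customers (or to hide it again) is a flag change rather than a release.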