A recent issue was caused by a PR promoted to prod before the necessary Heroku Connect changes were made on prod (they were correctly made on test). It's #82 in the RCA log.
Tim flagged that this has happened several times before (though those incidents weren't recorded in the RCA log). Can we prevent it in future?
As per the Joel Test, item 2 ("Can you make a build in one step?"): "If the process takes any more than one step, it is prone to errors."
Manually editing Heroku Connect config and then deploying is more than one step, so it's prone to errors.
We can reduce risk by:
Technical approach: e.g. make HC changes work like DB migrations.
Human approach: e.g. introduce a process that reminds devs not to make this mistake.
Technical options:
a) halt deploy on config mismatch: on deploy, use psql to automatically compare the Heroku Connect schema on prod to test. If there's a mismatch, halt the deploy.
Problems: prone to false positives (spurious mismatches) when test necessarily deviates from prod. Hard to implement well. Probably not worth it, cost-benefit wise.
b) make config changes work like migrations: use the CLI config import to write config changes as part of the pull request, then apply them on deploy, à la DB migrations.
Problems: once the config is imported, you have to wait for rows to sync, which will be hard or impossible to do programmatically. Surfacing errors in the config or during sync is easy through the UI but will be hard or impossible programmatically.
c) any other ideas?
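For a sense of how much work the technical option a) (halting the deploy on a config mismatch) would be, here is a minimal sketch. It assumes the Heroku Connect tables live in a Postgres schema named `salesforce` (Heroku Connect's usual default; adjust if ours differs) and that each environment's column list has already been fetched as comma-separated text, e.g. with `psql -A -t -F ',' -c` and the query below. The `allowed` parameter is a hypothetical allowlist of known, intentional test/prod differences, which is one way to blunt the false-positive problem.

```python
import sys

# Query to run via psql against prod and test (assumed schema name: salesforce).
# information_schema.columns is standard Postgres.
COLUMNS_QUERY = """
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_schema = 'salesforce'
ORDER BY table_name, column_name;
"""


def parse_psql_rows(text):
    """Parse unaligned psql output ("table,column" per line) into a set of pairs."""
    rows = set()
    for line in text.strip().splitlines():
        if line:
            table, column = line.split(",", 1)
            rows.add((table.strip(), column.strip()))
    return rows


def schema_mismatches(prod, test, allowed=frozenset()):
    """Return (missing_on_prod, only_on_prod), ignoring allowlisted pairs."""
    missing_on_prod = sorted((test - prod) - allowed)
    only_on_prod = sorted((prod - test) - allowed)
    return missing_on_prod, only_on_prod


def check_or_halt(prod_text, test_text, allowed=frozenset()):
    """Exit non-zero, which halts a scripted deploy, if the schemas differ."""
    missing, extra = schema_mismatches(
        parse_psql_rows(prod_text), parse_psql_rows(test_text), allowed
    )
    if missing or extra:
        for table, column in missing:
            print(f"missing on prod: {table}.{column}", file=sys.stderr)
        for table, column in extra:
            print(f"only on prod: {table}.{column}", file=sys.stderr)
        sys.exit(1)
```

The comparison itself is the easy part; the cost the thread worries about is maintaining the allowlist so legitimate test-only fields don't block every deploy.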
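The technical option b) (config changes as migrations) could borrow the bookkeeping that DB migration tools use. A minimal sketch, assuming each config change is committed as a numbered JSON file in a hypothetical directory (e.g. `0001_add_region.json`) and that the names already applied to an environment are kept in some ledger (a hypothetical table or file). The actual import is passed in as a callable rather than hard-coded, because waiting for the sync and surfacing errors is exactly the part flagged as hard above.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def pending_changes(available: Iterable[str], applied: Iterable[str]) -> List[str]:
    """Return change names not yet applied, in filename order (0001_..., 0002_...)."""
    done = set(applied)
    return sorted(name for name in available if name not in done)


def run_pending(
    directory: Path,
    applied: List[str],
    apply_change: Callable[[Path], None],
) -> List[str]:
    """Apply each pending change file in order and append it to the ledger.

    `apply_change` is a hypothetical hook, e.g. a wrapper around the CLI
    config import that also waits for sync. If it raises, the loop stops,
    so later changes are never applied on top of a failed one.
    """
    available = [p.name for p in directory.glob("*.json")]
    newly_applied = []
    for name in pending_changes(available, applied):
        apply_change(directory / name)
        applied.append(name)
        newly_applied.append(name)
    return newly_applied
```

The ordering and ledger logic is trivial; whether `apply_change` can reliably tell success from a stuck or failed sync is what decides if this option is viable.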
Human options:
a) add "check Heroku Config" to PR templates:
Problems: probably won't work, since the PR template is checked at merge time, not deploy time. Also, if we allow that level of granularity in PR templates, we'll soon have a checklist of a dozen similar risks that devs just ignore.
b) always record even small Heroku Connect blips in the RCA log: this means we'll be better equipped to spot patterns and address recurring issues further down the track.
c) any other ideas?
Or, we could do nothing and just "try to be more careful"?
We had a bad one on auth last year for the same reason.
I think the only way is to always do backward-compatible migrations, or at least ones compatible with the previous version. That sometimes means extra migrations before and after to clean up.
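The backward-compatible rule can be checked mechanically. A minimal sketch of the general expand/contract pattern, assuming a mapping can be reduced to a set of field names per object (the object and field names in the usage are made up for illustration): a change that only adds fields is safe to ship before the code that uses them, while anything it drops has to wait for a later clean-up step once no deployed version still reads it.

```python
from typing import Dict, List, Set

FieldMap = Dict[str, Set[str]]  # object name -> mapped field names


def breaking_changes(previous: FieldMap, proposed: FieldMap) -> List[str]:
    """List objects/fields that `proposed` drops relative to `previous`.

    An empty result means the change is purely additive ("expand"), so
    code from the previous release keeps working. Anything listed belongs
    in a later clean-up ("contract") step.
    """
    problems = []
    for obj, fields in sorted(previous.items()):
        if obj not in proposed:
            problems.append(f"{obj} (whole object removed)")
            continue
        for field in sorted(fields - proposed[obj]):
            problems.append(f"{obj}.{field}")
    return problems


prev = {"Account": {"Id", "Name"}}
print(breaking_changes(prev, {"Account": {"Id", "Name", "Region__c"}}))  # []
print(breaking_changes(prev, {"Account": {"Id"}}))  # ['Account.Name']
```

A rename shows up as a removal, which is intended: it has to be split into add-new, migrate, then drop-old across releases.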