@technolo-g
Last active September 27, 2022 20:56
Problem Solving Framework
  • What is the problem?

  • What exactly does this problem mean? Spend 5 minutes of quiet time thinking through the implications of what is happening and why it may be happening. Have there been any recent changes? Did someone mention this at standup?

  • Does it happen in prod too? (run the build right now, etc.)

  • Does it happen consistently or only sometimes?

  • If not consistent, what seems to vary between occurrences? (build slave, environment)

  • Does the documentation of the project mention anything like this?

  • What systems could potentially be involved in this issue? List them here. (Artifactory, Docker hosts, SUVs, etc.)

  • What do the logs of everything involved say?

  • What exactly is causing the problem (in natural language)?

  • What exactly is the root error/cause of the problem (the error lines from the build/command/site/etc.)?

  • Why is this particular log entry/line of output/message the cause?

  • Which system(s) are causing the error (client, server, some hosted service)?

  • What does that error mean?

  • What causes that error? (google)

  • What causes that error? (teammate)

  • What causes that error? (external team)

  • What are the potential resolutions?

  • Which solution is simplest that will solve the immediate problem?

  • Which solution is simplest that will solve this problem for now and the future?

  • Take some more quiet time to think through the implications of the solution you have identified. How do you make the change safely? Do you need to notify anyone of the decision(s) made? Will the team's (or teams') process or behavior change after implementing the solution?

  • Based on your assessment of the chosen solution's impact, what is your communication plan to best convey the information to the people that need to know about it?

  • When trying this solution, does it work?

  • If not, switch to 4-wheel low: if at this point there is still no clear solution, the next step is to bisect the problem and eliminate any non-essential information from the problem context. Start pulling things out of the process to narrow down exactly where it is happening. This can be difficult, and it can also be very domain-specific.

  • In domain-specific problems, it is best to use the tools of that domain, e.g. don't println if there is a tool, like a debugger, that can give you the exact information you're looking for. When we get to this point, it is OK and encouraged to work with someone familiar with the domain. Find an expert, request time from that person asynchronously, and pair when the person has time. Avoid walking up and interrupting people unless it is an emergency.
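
The bisecting step can be sketched in code. This is a minimal illustration, not part of the original framework. It assumes the suspects (commits, config changes, pipeline steps) can be put in an order where everything before the culprit passes and everything from the culprit onward fails. The names `first_bad`, `is_bad`, and `changes` are hypothetical; in practice `is_bad` would run the build or test against that candidate.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest candidate for which is_bad() is true.

    Assumption: candidates are ordered so that every candidate before the
    culprit passes, and the culprit and everything after it fail.
    """
    if not candidates:
        raise ValueError("no candidates to bisect")

    lo, hi = 0, len(candidates) - 1
    if not is_bad(candidates[hi]):
        raise ValueError("the last candidate passes; nothing to bisect")

    # Halve the search range on every check until one candidate remains.
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return candidates[lo]


# Hypothetical example: eight ordered changes, failures start at "c6".
changes = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
culprit = first_bad(changes, lambda c: changes.index(c) >= 5)
```

Each check halves the remaining range, so eight suspects need about three more runs after the initial sanity check on the last one. If the failure is intermittent, the ordering assumption does not hold and each check would need to be repeated before it can be trusted.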
