@zelaznik
Last active July 18, 2024 03:01
Just Enough Probability and Stats For Software Engineers

Comparing Statistics To Test Driven Development:

I’m assuming a lot of people in the audience haven’t studied statistics, but because this is Rubyconf, plenty of you know the principles of test-driven development (TDD). If you haven’t studied statistics before, don’t worry: hypothesis testing follows the same principle as TDD.

In TDD, you demonstrate that your code is correct in two steps. First, assume your code is wrong. Second, try to disprove that assumption. The first step is when you write the test so that it fails. The second step is to change your application code so that the test passes.

In statistics we do the same thing. We first assume the opposite of what we want to prove. If we want to show that a drug treats a disease, we first assume that the drug has no effect. That’s what the placebo group is for. The placebo group is the “red” portion of “red-green refactoring.” The group that’s treated with the drug is (hopefully) the “green” portion of “red-green refactoring.”

A statistical test will never PROVE that the drug works, just like a passing test doesn’t PROVE that your code works. Both are tools to give you more confidence.

Overfitting in stats and in TDD:

Let's say I want to write a function that returns the absolute value of a number:

def abs(v)
  if v == 2
    2
  elsif v == 1
    1
  elsif v == 0
    0
  elsif v == -1
    1
  elsif v == -2
    2
  end
end

And then I write my tests in RSpec:

describe "abs" do
  it("returns 2 when given 2") { expect(abs(2)).to eq(2) }
  it("returns 1 when given 1") { expect(abs(1)).to eq(1) }
  it("returns 0 when given 0") { expect(abs(0)).to eq(0) }
  it("returns 1 when given -1") { expect(abs(-1)).to eq(1) }
  it("returns 2 when given -2") { expect(abs(-2)).to eq(2) }
end

So what's wrong with this? You wrote your specs, and you implemented the function to make them pass. This is the equivalent of overfitting in statistics: the function matches the test data perfectly while learning nothing about the underlying rule.
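To see the overfitting concretely, here is a minimal sketch (the hash is just the if/elsif chain above written as a lookup table): every spec passes, but the function has only memorized the five inputs the specs mention.

```ruby
# Equivalent to the overfit if/elsif chain: a lookup table
# disguised as a function.
def abs(v)
  { 2 => 2, 1 => 1, 0 => 0, -1 => 1, -2 => 2 }[v]
end

abs(-2)  # => 2, the spec passes
abs(3)   # => nil -- the function memorized the test data,
         # not the rule behind it
```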

Talk Outline: Probability and Stats for Software Engineers

Introduction

  • Title: Probability and Stats for Software Engineers
  • Objective: Help software engineers understand and apply basic statistical concepts to improve their diagnostics and decision-making.
  • Relevance: Examples of everyday software engineering problems where probability and statistics can help (e.g., flaky tests, error rates).

Preface:

Some jargon:

  • Null hypothesis
  • Confidence interval (95% by arbitrary convention)
  • Rejecting the null
  • Failing to reject the null
  • Type 1 error vs Type 2 error

Examples of said jargon:

  • Try to fix the flaky test
  • Null hypothesis "I haven't fixed my flaky test"
  • Type 1 error: "Thinking I fixed the test when I haven't"
  • Type 2 error: "Actually fixing the test but not confident I did"
  • Confidence interval:
  • Run enough tests so that if I haven't fixed the flaky test, there's a 95% chance that at least one of those builds will be red
  • How do we know how many times to run the test suite? (That's what this talk is for)
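As a preview of that calculation, assuming each run fails independently with the same probability p: we want the smallest n such that 1 - (1 - p)^n >= 0.95, which rearranges to n >= log(0.05) / log(1 - p). A sketch:

```ruby
# Smallest number of runs n such that a still-flaky test (failure
# probability p per run) shows at least one red build with
# probability >= confidence.
def runs_needed(p, confidence = 0.95)
  (Math.log(1 - confidence) / Math.log(1 - p)).ceil
end

runs_needed(0.10)  # => 29: a 10%-flaky test needs 29 green runs
runs_needed(0.01)  # => 299: a 1%-flaky test needs ~300
```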

Part 0: Comparing Stats and Test Driven Development (No, really!)

Part 1: Binary Questions in Software Engineering

Scenario 1: Flaky Tests

  • Problem: Is the test genuinely flaky or not?
  • Concept: Bernoulli distribution
  • Explanation: Simple probability - it either fails (p) or passes (1-p).
  • Calculation: Probability of no failures (or one or more failures) over multiple runs.
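A minimal sketch of that calculation: with per-run failure probability p, the chance of seeing no failures in n independent runs is (1 - p)^n, and the chance of at least one failure is its complement.

```ruby
# Probability of zero failures across n independent runs,
# given per-run failure probability p.
def p_no_failures(p, n)
  (1 - p)**n
end

def p_at_least_one_failure(p, n)
  1 - p_no_failures(p, n)
end

p_no_failures(0.05, 20)           # ~0.36: even a 5%-flaky test
p_at_least_one_failure(0.05, 20)  # ~0.64  goes fully green 1 run in 3
```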

Scenario 2: Error Rate Monitoring

  • Problem: Is the spike in errors a sign of a new bug or just random chance?
  • Concept: Poisson distribution for rare events. Show visual demonstration that Poisson is a good approximation of binomial for rare events.
  • Explanation: Model the expected number of errors over time.
  • Calculation: Probability of zero errors (or one or more errors) given the historical average.
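A sketch of the Poisson calculation, assuming a historical average of lambda errors per time window: P(k) = lambda^k * e^(-lambda) / k!, so summing the probabilities below a spike tells you how surprising the spike is.

```ruby
# Poisson probability of exactly k events, given historical mean lambda.
def poisson_pmf(lambda_, k)
  lambda_**k * Math.exp(-lambda_) / (1..k).reduce(1, :*)
end

# Probability of k or more events -- a spike this big or bigger.
def p_at_least(lambda_, k)
  1 - (0...k).sum { |i| poisson_pmf(lambda_, i) }
end

p_at_least(2.0, 6)  # with a historical mean of 2 errors/hour,
                    # 6+ errors is a ~1.7% event -- worth a look
```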

Part 2: Central Limit Theorem (CLT)

Introduction to CLT

  • Concept: Regardless of the original distribution, the distribution of the sample means will tend to be normal (bell curve) if the sample size is large enough.
  • Importance: Simplifies many real-world problems since the normal distribution is easy to work with (mean and standard deviation).

Interactive Demonstration:

  • Website Tool: Allow users to select different probability distributions and run multiple trials.
  • Visualization: Show the cumulative results forming a bell curve.
  • Explanation: Demonstrate how the sample mean approximates a normal distribution as the number of trials increases.
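The same demonstration can be sketched in plain Ruby, no website needed: draw from a heavily skewed distribution (nothing like a bell curve), average batches of draws, and the batch means still cluster tightly around the true mean.

```ruby
# CLT sketch: individual draws are very non-normal (0 most of the
# time, occasionally 10, true mean 1.0), but means of batches of 100
# concentrate around 1.0.
srand(42)  # fixed seed so the simulation is repeatable

def skewed_draw
  rand < 0.9 ? 0 : 10
end

sample_means = Array.new(1_000) do
  Array.new(100) { skewed_draw }.sum / 100.0
end

overall = sample_means.sum / sample_means.size
# overall lands close to the true mean of 1.0; the spread of
# sample_means shrinks as the batch size grows
```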

Application of CLT:

Example: Error rates before and after a fix.

  • Calculation: Using the mean and standard deviation to estimate the likelihood of reduced error rates. Use Poisson distribution from previous section to give the mean and standard deviation.
  • Interpretation: Quick sanity checks and ballpark estimates using normal distribution properties.
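A sketch of that sanity check (the counts below are hypothetical): for a Poisson distribution the variance equals the mean, so an observed count x against a historical mean lambda gives a z-score of (x - lambda) / sqrt(lambda), which you can read off a normal table.

```ruby
# Normal-approximation sanity check for an error-count change.
# Poisson(lambda) has standard deviation sqrt(lambda), so the z-score
# says how many "sigmas" the observed count is from the old normal.
def error_z_score(observed, historical_mean)
  (observed - historical_mean) / Math.sqrt(historical_mean)
end

error_z_score(4, 25)   # z = -4.2: a real reduction after the fix
error_z_score(21, 25)  # z = -0.8: could easily be noise
```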

Example: Load Capacity Of Websites:

  • Use average use time to get a Poisson mean and standard deviation
  • Use CLT to make a bell curve. Then calculate how much capacity is required for 99% uptime, 99.9% uptime, etc.
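A sketch of the capacity calculation, assuming concurrent load is roughly Poisson with mean lambda: the normal approximation gives capacity of about lambda + z * sqrt(lambda), where z is the standard normal quantile (2.326 for 99%, 3.090 for 99.9%).

```ruby
# Capacity needed so demand exceeds supply at most (1 - quantile) of
# the time, under Poisson(lambda) load and the normal approximation.
Z = { 0.99 => 2.326, 0.999 => 3.090 }  # standard normal quantiles

def capacity_for(lambda_, quantile)
  (lambda_ + Z.fetch(quantile) * Math.sqrt(lambda_)).ceil
end

capacity_for(400, 0.99)   # => 447 slots for mean 400 concurrent users
capacity_for(400, 0.999)  # => 462: three nines costs only ~15 more
```

Note how cheap the extra nine is: the buffer grows with sqrt(lambda), not with lambda.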

Example (if time permits): queueing theory

  • Useful if the server is keeping a websocket open for each connection
  • There's also a normal approximation for this, details to be given later

Example: A/B Testing:

  • Click-through rates follow a Bernoulli distribution.
  • From this we can get a normal approximation for group A, and a normal approximation for group B
  • The hypothesis test is whether A - B > 0, and (A - B) has a normal distribution with mean mu_A - mu_B and standard deviation sqrt(sigma_A**2 + sigma_B**2)
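A sketch of that test with hypothetical click data: each group's click-through rate p has a normal approximation with variance p(1 - p)/n, and the difference of two independent normals is normal with the variances added.

```ruby
# Two-proportion z-test sketch for an A/B test of click-through rates.
# The click counts below are made up for illustration.
def ab_z_score(clicks_a, n_a, clicks_b, n_b)
  p_a = clicks_a.fdiv(n_a)
  p_b = clicks_b.fdiv(n_b)
  # standard deviation of (A - B): variances add, never subtract
  sigma = Math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
  (p_a - p_b) / sigma
end

# 120 clicks out of 1000 vs 90 out of 1000:
ab_z_score(120, 1000, 90, 1000)  # z ~= 2.19, past the usual 1.96 cutoff
```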

Part 3: Practical Applications and Caveats

Quick Calculations:

  • Mean and Standard Deviation: How to calculate and use them for back-of-the-envelope estimates.
  • Binary Questions: Apply the bell curve to make decisions (e.g., Is the error rate significantly reduced?).

Caveats:

  • Limitations: These methods are useful for ballpark estimates and sanity checks but not for rigorous statistical analysis.

  • Real-world Use: Emphasize that software engineers are not statisticians, and these tools are meant for practical, everyday use.

  • Knowing when events are and aren't independent:

    • Users clicking the send button multiple times when they should click once (independent)
    • A power outage at your data center takes down your database for lots of users (NOT independent)
    • Two separate runs of a flaky test (independent)

Conclusion

  • Summary: Recap the key concepts (Bernoulli distribution for binary questions, Poisson distribution for rare events, and CLT for simplifying complex problems).
  • Q&A: Open the floor for questions, prepared to delve deeper into any of the topics discussed.
  • Resources: Provide links to further reading and tools (e.g., interactive website, basic stats textbooks, online courses).
@taylorkearns

This is great! I think it stands a very good chance of getting accepted. The trick will be making it digestible to non-statisticians (which I am sure you can do) and making it short enough, IMO.
