Data modeling resources

ROI Analysis

  • A/B testing
    • Ron Kohavi's Trustworthy Online Controlled Experiments:

      Features are built because teams believe they are useful, yet in many domains most ideas fail to improve key metrics. Only one third of the ideas tested at Microsoft improved the metric(s) they were designed to improve (Kohavi, Crook and Longbotham 2009). Success is even harder to find in well-optimized domains like Bing and Google, whereby some measures’ success rate is about 10–20% (Manzi 2012).

      Fareed Mosavat, Slack’s Director of Product and Lifecycle tweeted that with all of Slack’s experience, only about 30% of monetization experiments show positive results; “if you are on an experiment-driven team, get used to, at best, 70% of your work being thrown away. Build your processes accordingly” (Mosavat 2019).

      Avinash Kaushik wrote in his Experimentation and Testing primer (Kaushik 2006) that “80% of the time you/we are wrong about what a customer wants.” Mike Moran (Moran 2007, 240) wrote that Netflix considers 90% of what they try to be wrong. Regis Hadiaris from Quicken Loans wrote that “in the five years I’ve been running tests, I’m only about as correct in guessing the results as a major league baseball player is in hitting the ball. That’s right – I’ve been doing this for 5 years, and I can only ‘guess’ the outcome of a test about 33% of the time!” (Moran 2008). Dan McKinley at Etsy (McKinley 2013) wrote “nearly everything fails” and for features, he wrote “it’s been humbling to realize how rare it is for them to succeed on the first attempt. I strongly suspect that this experience is universal, but it is not universally recognized or acknowledged.” Finally, Colin McFarland wrote in the book Experiment! (McFarland 2012, 20) “No matter how much you think it’s a no-brainer, how much research you’ve done, or how many competitors are doing it, sometimes, more often than you might think, experiment ideas simply fail.”

      Not every domain has such poor statistics, but most who have run controlled experiments in customer-facing websites and applications have experienced this humbling reality: we are poor at assessing the value of ideas.

    • Georgi Georgiev's What Can Be Learned From 1,001 A/B Tests?
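
A quick back-of-envelope sketch of what the hit rates quoted above imply. This assumes (my simplification, not the book's) that each experiment is an independent coin flip with a fixed per-experiment success rate:

```python
# Back-of-envelope look at the experiment success rates quoted above.
# Assumes each experiment is an independent Bernoulli trial with a fixed
# success rate -- a simplification for illustration, not a claim from the book.

def expected_winners(n_experiments: int, success_rate: float) -> float:
    """Expected number of experiments that improve their target metric."""
    return n_experiments * success_rate

def prob_at_least_one_winner(n_experiments: int, success_rate: float) -> float:
    """Probability that at least one of n independent experiments succeeds."""
    return 1 - (1 - success_rate) ** n_experiments

if __name__ == "__main__":
    # 10% ~ the well-optimized-domain figure, 33% ~ the Microsoft figure quoted above
    for rate in (0.10, 0.33):
        print(f"success rate {rate:.0%}:")
        print(f"  expected winners out of 20 experiments: {expected_winners(20, rate):.1f}")
        print(f"  chance of at least one winner in 10 tries: {prob_at_least_one_winner(10, rate):.0%}")
```

Even at a 33% hit rate, most experiments, and most of the work behind them, should be expected to fail, which is Mosavat's point about building your processes accordingly.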

Kimball Modeling

Non-Kimball Modeling

  • Narrative modeling
  • Anchor modeling
    • Presentation
    • Paper
    • Personal impressions: Looks very impressive, but is quite confusing. I don't fully understand it, and it appears it would be very difficult to train data analysts to use it.
  • Activity schema
    • Spec
    • Personal impressions: I really wanted to like Activity Schema, and spent a lot of time trying to figure it out. I initially thought the columns in the schema were examples, and only after a while realized, "no, the author literally wants me to name my columns 'customer', 'feature_1', 'feature_2', and 'feature_3'." It's completely inflexible and confusing. My takeaway is that Activity Schema is not meant to be written or read by humans, but only passed through automated tools like Narrator. (A rough sketch of the layout follows this list.)
  • Data mesh
    • Personal impressions: Nothing actionable here unless you're the CIO/CDO. Data architects are lucky if they can influence the technology stack, much less how teams are organized, structured, and funded. Also, the idea of forcing teams to produce/consume data exclusively through contracts sounds like it would add a ton of overhead and further exacerbate the very divides that data warehousing was trying to solve.
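
A rough, minimal sketch of what a single activity-stream table might look like, as I understand Activity Schema. Only the 'customer' and 'feature_1' through 'feature_3' column names come from my notes above; 'activity_id', 'ts', and 'activity' are my assumptions about the rest of the layout, so treat this as illustrative and check the spec itself:

```python
# Minimal sketch (not the official spec) of a single activity-stream table.
# Only 'customer' and 'feature_1'..'feature_3' are taken from my notes above;
# 'activity_id', 'ts', and 'activity' are assumptions about the rest of the layout.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE activity_stream (
        activity_id TEXT PRIMARY KEY,  -- assumed surrogate key
        ts          TEXT NOT NULL,     -- when the activity happened (assumed)
        customer    TEXT NOT NULL,     -- literally named 'customer'
        activity    TEXT NOT NULL,     -- e.g. 'opened_ticket', 'paid_invoice' (assumed)
        feature_1   TEXT,              -- generic, activity-specific attributes:
        feature_2   TEXT,              -- the fixed names are what makes the
        feature_3   TEXT               -- schema feel inflexible to me
    )
    """
)
# Every kind of event for every entity lands in this one table; meaning is
# carried by the 'activity' value plus whatever was stuffed into feature_1..3.
conn.execute(
    "INSERT INTO activity_stream VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("a1", "2022-10-20T16:31:00", "cust_42", "opened_ticket", "billing", "high", None),
)
print(conn.execute("SELECT activity, feature_1 FROM activity_stream").fetchall())
```

All meaning lives in the activity value and the generic feature columns, which is exactly why it reads better to a tool like Narrator than to a human.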