- A/B testing
- Ron Kohavi's Trustworthy Online Controlled Experiments:
Features are built because teams believe they are useful, yet in many domains most ideas fail to improve key metrics. Only one third of the ideas tested at Microsoft improved the metric(s) they were designed to improve (Kohavi, Crook and Longbotham 2009). Success is even harder to find in well-optimized domains like Bing and Google, whereby some measures’ success rate is about 10–20% (Manzi 2012).
Fareed Mosavat, Slack’s Director of Product and Lifecycle tweeted that with all of Slack’s experience, only about 30% of monetization experiments show positive results; “if you are on an experiment-driven team, get used to, at best, 70% of your work being thrown away. Build your processes accordingly” (Mosavat 2019).
Avinash Kaushik wrote in his Experimentation and Testing primer (Kaushik 2006) that “80% of the time you/we are wrong about what a customer wants.” Mike Moran (Moran 2007, 240) wrote that Netflix considers 90% of what they try to be wrong. Regis Hadiaris from Quicken Loans wrote that “in the five years I’ve been running tests, I’m only about as correct in guessing the results as a major league baseball player is in hitting the ball. That’s right – I’ve been doing this for 5 years, and I can only ‘guess’ the outcome of a test about 33% of the time!” (Moran 2008). Dan McKinley at Etsy (McKinley 2013) wrote “nearly everything fails” and for features, he wrote “it’s been humbling to realize how rare it is for them to succeed on the first attempt. I strongly suspect that this experience is universal, but it is not universally recognized or acknowledged.” Finally, Colin McFarland wrote in the book Experiment! (McFarland 2012, 20) “No matter how much you think it’s a no-brainer, how much research you’ve done, or how many competitors are doing it, sometimes, more often than you might think, experiment ideas simply fail.”
Not every domain has such poor statistics, but most who have run controlled experiments in customer-facing websites and applications have experienced this humbling reality: we are poor at assessing the value of ideas.
- Georgi Georgiev's What Can Be Learned From 1,001 A/B Tests?
- Book recommended at Coalesce 2022: Star Schema: The Complete Reference
- Video series recommended in the comments: ETL Architecture in Depth - Intermediate Dimensional Modeling
- Arguments for dimensional modeling:
- Tony Dahlager's and John Barcheski's Coalesce 2022 presentation "Back to the Future":
- "Dimensional modeling aims to model the business rather than modeling relationships among data elements"
- "Dimensional modeling is a way to conceptualize data requests in business terms"
- "A dimensional model can serve as 'building blocks' for consumption."
- Ralph Kimball's Dimensional Modeling Manifesto
- Narrative modeling
- Anchor modeling
- Presentation
- Paper
- Personal impressions: Looks very impressive, but is quite confusing. I don't fully understand it, but it appears that it'd be very difficult to train data analysts to use it.
- Activity schema
- Spec
- Personal impressions: I really wanted to like Activity Schema, and spent a lot of time trying to figure it out. I initially thought the columns in the schema were examples, and only after a while realized "no, the author literally wants me to name my columns 'customer', 'feature_1', 'feature_2', and 'feature_3'." It's completely inflexible and confusing. My takeaway is that Activity Schema is not meant to be written or read by humans, but only passed through automated tools like Narrator.
- Data mesh
- Personal impressions: Nothing actionable here unless you're the CIO/CDO. Data architects are lucky if they can influence the technology stack, much less how teams are organized, structured, and funded. Also, the idea of forcing teams to produce/consume data exclusively through contracts sounds like it'd have a ton of overhead and further exacerbate the very divides that data warehousing was trying to bridge.
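
To make the Activity Schema impression above more concrete, here is a rough Python sketch of why literal column names like 'feature_1' are hard for a human to read. Only 'customer', 'feature_1', 'feature_2', and 'feature_3' come from the notes above; the `activity` column, the activity names, and the `FEATURE_MEANINGS` lookup are assumptions made up for illustration and are not taken from the spec.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ActivityRow:
    """One row of a single, generic activity table (illustrative only)."""
    customer: str
    activity: str  # assumed column: says which kind of event this row is
    feature_1: Optional[str] = None
    feature_2: Optional[str] = None
    feature_3: Optional[str] = None


# Hypothetical lookup: what each generic slot means for each activity.
# A reader needs something like this because the column names carry no meaning.
FEATURE_MEANINGS: Dict[str, Dict[str, str]] = {
    "viewed_page": {"feature_1": "page_url", "feature_2": "referrer"},
    "completed_order": {
        "feature_1": "order_id",
        "feature_2": "payment_method",
        "feature_3": "coupon_code",
    },
}


def describe(row: ActivityRow) -> Dict[str, Optional[str]]:
    """Translate the generic feature slots into named fields for one row."""
    meanings = FEATURE_MEANINGS.get(row.activity, {})
    return {name: getattr(row, slot) for slot, name in meanings.items()}


rows = [
    ActivityRow("c1", "viewed_page", "/pricing", "google"),
    ActivityRow("c1", "completed_order", "o-42", "card", "WELCOME10"),
]

for row in rows:
    print(row.activity, describe(row))
```

The same 'feature_1' column holds a page URL in one row and an order ID in the next, so the table cannot be read without the lookup. That is consistent with the impression that the format is aimed at tooling rather than at people.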