Data modeling resources

ROI Analysis

  • A/B testing
    • Ron Kohavi's Trustworthy Online Controlled Experiments:

      Features are built because teams believe they are useful, yet in many domains most ideas fail to improve key metrics. Only one third of the ideas tested at Microsoft improved the metric(s) they were designed to improve (Kohavi, Crook and Longbotham 2009). Success is even harder to find in well-optimized domains like Bing and Google, whereby some measures’ success rate is about 10–20% (Manzi 2012).

      Fareed Mosavat, Slack’s Director of Product and Lifecycle tweeted that with all of Slack’s experience, only about 30% of monetization experiments show positive results; “if you are on an experiment-driven team, get used to, at best, 70% of your work being thrown away. Build your processes accordingly” (Mosavat 2019).

      Avinash Kaushik wrote in his Experimentation and Testing primer (Kaushik 2006) that “80% of the time you/we are wrong about what a customer wants.” Mike Moran (Moran 2007, 240) wrote that Netflix considers 90% of what they try to be wrong. Regis Hadiaris from Quicken Loans wrote that “in the five years I’ve been running tests, I’m only about as correct in guessing the results as a major league baseball player is in hitting the ball. That’s right – I’ve been doing this for 5 years, and I can only ‘guess’ the outcome of a test about 33% of the time!” (Moran 2008). Dan McKinley at Etsy (McKinley 2013) wrote “nearly everything fails” and for features, he wrote “it’s been humbling to realize how rare it is for them to succeed on the first attempt. I strongly suspect that this experience is universal, but it is not universally recognized or acknowledged.” Finally, Colin McFarland wrote in the book Experiment! (McFarland 2012, 20) “No matter how much you think it’s a no-brainer, how much research you’ve done, or how many competitors are doing it, sometimes, more often than you might think, experiment ideas simply fail.”

      Not every domain has such poor statistics, but most who have run controlled experiments in customer-facing websites and applications have experienced this humbling reality: we are poor at assessing the value of ideas.

    • Georgi Georgiev's What Can Be Learned From 1,001 A/B Tests?
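
A quick back-of-envelope sketch of what the hit rates quoted above imply. This assumes (my simplification, not the book's) that each experiment is an independent coin flip with a fixed per-experiment success rate:

```python
# Back-of-envelope look at the experiment success rates quoted above.
# Assumes each experiment is an independent Bernoulli trial with a fixed
# success rate -- a simplification for illustration, not a claim from the book.

def expected_winners(n_experiments: int, success_rate: float) -> float:
    """Expected number of experiments that improve their target metric."""
    return n_experiments * success_rate

def prob_at_least_one_winner(n_experiments: int, success_rate: float) -> float:
    """Probability that at least one of n independent experiments succeeds."""
    return 1 - (1 - success_rate) ** n_experiments

if __name__ == "__main__":
    # 10% ~ the well-optimized-domain figure, 33% ~ the Microsoft figure quoted above
    for rate in (0.10, 0.33):
        print(f"success rate {rate:.0%}:")
        print(f"  expected winners out of 20 experiments: {expected_winners(20, rate):.1f}")
        print(f"  chance of at least one winner in 10 tries: {prob_at_least_one_winner(10, rate):.0%}")
```

Even at a 33% hit rate, most experiments, and most of the work behind them, should be expected to fail, which is Mosavat's point about building your processes accordingly.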

Kimball Modeling

Non-Kimball Modeling

  • Narrative modeling
  • Anchor modeling
    • Presentation
    • Paper
    • Personal impressions: Looks very impressive, but is quite confusing. I don't fully understand it, and it appears it would be very difficult to train data analysts to use it.
  • Activity schema
    • Spec
    • Personal impressions: I really wanted to like Activity Schema, and spent a lot of time trying to figure it out. I initially thought the columns in the schema were examples, and only after a while realized, "no, the author literally wants me to name my columns 'customer', 'feature_1', 'feature_2', and 'feature_3'." It's completely inflexible and confusing. My takeaway is that Activity Schema is not meant to be written or read by humans, but only passed through automated tools like Narrator. (A rough sketch of the layout follows this list.)
  • Data mesh
    • Personal impressions: Nothing actionable here unless you're the CIO/CDO. Data architects are lucky if they can influence the technology stack, much less how teams are organized, structured, and funded. Also, the idea of forcing teams to produce/consume data exclusively through contracts sounds like it would add a ton of overhead and further exacerbate the very divides that data warehousing was trying to solve.
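
A rough, minimal sketch of what a single activity-stream table might look like, as I understand Activity Schema. Only the 'customer' and 'feature_1' through 'feature_3' column names come from my notes above; 'activity_id', 'ts', and 'activity' are my assumptions about the rest of the layout, so treat this as illustrative and check the spec itself:

```python
# Minimal sketch (not the official spec) of a single activity-stream table.
# Only 'customer' and 'feature_1'..'feature_3' are taken from my notes above;
# 'activity_id', 'ts', and 'activity' are assumptions about the rest of the layout.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE activity_stream (
        activity_id TEXT PRIMARY KEY,  -- assumed surrogate key
        ts          TEXT NOT NULL,     -- when the activity happened (assumed)
        customer    TEXT NOT NULL,     -- literally named 'customer'
        activity    TEXT NOT NULL,     -- e.g. 'opened_ticket', 'paid_invoice' (assumed)
        feature_1   TEXT,              -- generic, activity-specific attributes:
        feature_2   TEXT,              -- the fixed names are what makes the
        feature_3   TEXT               -- schema feel inflexible to me
    )
    """
)
# Every kind of event for every entity lands in this one table; meaning is
# carried by the 'activity' value plus whatever was stuffed into feature_1..3.
conn.execute(
    "INSERT INTO activity_stream VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("a1", "2022-10-20T16:31:00", "cust_42", "opened_ticket", "billing", "high", None),
)
print(conn.execute("SELECT activity, feature_1 FROM activity_stream").fetchall())
```

All meaning lives in the activity value and the generic feature columns, which is exactly why it reads better to a tool like Narrator than to a human.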