@mlopatka
Forked from saptarshiguha/foo.Rmd
Created August 6, 2018 07:38
```{pydbx}
w=spark.sql("""
select
submission_date_s3,
client_id as cid,
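-- turi: total URIs per client per day (missing counts treated as 0)
sum(coalesce(scalar_parent_browser_engagement_total_uri_count,0)) as turi,
-- adau: 1 if the client loaded at least 5 URIs that day
case when sum(coalesce(scalar_parent_browser_engagement_total_uri_count,0)) >=5 then 1 else 0 end as adau,
-- turihr: URIs per active hour (active_ticks*5.0/3600 converts 5-second ticks to hours)
cast(sum(coalesce(scalar_parent_browser_engagement_total_uri_count,0))/(sum(active_ticks*5.0/3600)) as float) as turihr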
from main_summary
where submission_date_s3>='20180701' and submission_date_s3<='20180707'
and sample_id ='42'
and app_name='Firefox'
group by 1,2
""")
w.createOrReplaceTempView("w")
spark.sql("""
with a as( select
submission_date_s3,
count(distinct(cid)) as np,
count(distinct(case when adau=1 then cid else null end)) as adau
from w
group by 1)
select submission_date_s3, adau/np as pact
from a
""").toPandas() ## stored in ll
```
```{r}
# Sample size per group to detect a 1% relative lift in the daily active proportion
power.prop.test(p1=mean(ll$pact), p2=mean(ll$pact)*1.01, sig.level=0.01, power=0.95)
```
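As a cross-check on the `power.prop.test` call, here is a minimal Python sketch of the normal-approximation formula for a two-sided, two-sample test of proportions. The function name `n_per_group` and the baseline proportion `0.80` are hypothetical stand-ins, since the actual value of `mean(ll$pact)` is not shown here. R solves for `n` numerically, so expect close agreement rather than an identical digit string.

```python
from math import sqrt
from statistics import NormalDist


def n_per_group(p1, p2, sig_level=0.01, power=0.95):
    """Approximate per-group sample size to distinguish proportions p1 and p2
    with a two-sided test (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - sig_level / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)               # quantile for the target power
    p_bar = (p1 + p2) / 2
    pooled = sqrt(2 * p_bar * (1 - p_bar))             # SD term under the null
    unpooled = sqrt(p1 * (1 - p1) + p2 * (1 - p2))     # SD term under the alternative
    return ((z_alpha * pooled + z_beta * unpooled) / (p1 - p2)) ** 2


# Hypothetical baseline: 80% of clients active; detect a 1% relative lift.
baseline = 0.80
print(round(n_per_group(baseline, baseline * 1.01)))
```

Because the required `n` scales with `1/(p1 - p2)^2`, halving the detectable lift roughly quadruples the sample size.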
Remove outliers for URIs/hour (trim at the 99.9th percentile)
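```{pydbx}
spark.sql("""
select
percentile_approx(turihr,0.999) as turimax
from w
""").toPandas()
spark.sql("""
select
mean(turihr) as m1,
stddev(turihr) as s1
from w
where turihr<6944.425293 -- turimax (99.9th percentile) from the previous query
""").toPandas()
```
```{r}
library(pwr)
# Effect size d: a 5% change in the mean (m1 = 176.750784) in units of the SD (s1 = 226.529248)
pwr.t.test(d=176.750784*0.05/226.529248, power=0.95, sig.level=0.01)
# Sample size to estimate the mean within 1% relative error at 95% confidence: (s1/m1)^2 * z^2 / e^2
(226.529248/176.750784)^2*1.96^2/0.01^2
```
Remove outliers for total URIs
```{pydbx}
spark.sql("""
select
percentile_approx(turi,0.999) as turimax
from w
""").toPandas()
spark.sql("""
select
mean(turi) as m1,
stddev(turi) as s1
from w
where turi<3256 -- turimax (99.9th percentile) from the previous query
""").toPandas()
```
```{r}
# Same relative-error sample size, using the trimmed SD and mean of total URIs
(213.55/114.128)^2*1.96^2/0.01^2
```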
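The two kinds of sample-size arithmetic used above can be sketched in Python with only the standard library. The function names `n_two_sample` and `n_for_relative_error` are hypothetical. `n_two_sample` uses the large-sample normal approximation `2*(z_alpha + z_beta)^2 / d^2`, whereas `pwr.t.test` works with the noncentral t distribution, so the two should agree closely at these sample sizes but not exactly. The reading of each ratio as SD over mean is inferred from how the numbers are used in the `pwr.t.test` effect size.

```python
from statistics import NormalDist


def n_two_sample(d, sig_level=0.01, power=0.95):
    """Approximate per-group n for a two-sided, two-sample comparison of means
    with standardized effect size d (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - sig_level / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2


def n_for_relative_error(mean, sd, rel_error=0.01, z=1.96):
    """n needed so that the z-level confidence interval for the mean
    has half-width rel_error * mean: (sd/mean)^2 * z^2 / rel_error^2."""
    return (sd / mean) ** 2 * z ** 2 / rel_error ** 2


# Trimmed mean and SD of URIs/hour, as used in the R expressions.
m1, s1 = 176.750784, 226.529248
print(n_two_sample(d=m1 * 0.05 / s1))        # 5% shift in the mean
print(n_for_relative_error(m1, s1))          # URIs/hour
print(n_for_relative_error(114.128, 213.55)) # total URIs
```

Both formulas depend on the data only through the coefficient of variation `sd/mean`, which is why the heavier-tailed total URI count calls for a larger sample than URIs/hour.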