Analytics in Volebná kalkulačka 2016

https://volebnakalkulacka.sk

AB testing

Clicking on detailed comparison

Example Page: https://volebnakalkulacka.sk/match/?q1=1

Explanation: A user may click on the name of a party to see a detailed comparison between their answers and the party's answers. Our aim is to engage people to view these detailed comparisons.

Sample: ~10300 users

Data: https://docs.google.com/spreadsheets/d/1DuPvWxE7YEktVbkCIop5sG3QYg4ReMQr68QFKgZ9dbg/edit?usp=sharing

Script: clicked.py

The wording above the table with parties was:

  • A: "Results" ("Výsledky") (results_a.jpg)
  • B: "Results: you can see the detailed comparison by clicking on party's name" ("Výsledky: kliknutím na meno strany zobrazíte detailné porovnanie") (results_b.jpg)

CTR

Table with parties

Clicking at least once:

  • A: 24%
  • B: 32%

Clicking two or more times:

  • A: 12%
  • B: 17%

Statistically significant improvements
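
As a sanity check, a standard two-proportion z-test reproduces this conclusion from the reported rates alone. The sketch below (Python, like the scripts at the end of this gist) assumes the ~10300 users were split roughly evenly between the two wordings; the real arm sizes are in the linked spreadsheet, so the numbers here are only illustrative.

# Minimal sketch: two-proportion z-test for "clicking at least once" on the
# table with parties (A: 24%, B: 32%). Arm sizes are assumed, not the real ones.
from math import sqrt, erfc

def two_proportion_ztest(k_a, n_a, k_b, n_b):
    """Return (z, two-sided p-value) for H0: p_a == p_b."""
    p_a, p_b = k_a / n_a, k_b / n_b
    p_pool = (k_a + k_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return z, erfc(abs(z) / sqrt(2))   # 2 * P(Z > |z|) for a standard normal

n_a = n_b = 5150                       # assumption: ~10300 users split evenly
z, p = two_proportion_ztest(round(0.24 * n_a), n_a, round(0.32 * n_b), n_b)
print(z, p)                            # p is far below 0.05 with these inputs
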

"Winners" table

Top 1 or top 3 parties (depending on the device)

  • A: 7.9%
  • B: 8.5%

Statistically not significant

Facebook sharing

  • A: 1.6%
  • B: 1.8%

Statistically not significant

Facebook sharing using unique address

Example Page: https://volebnakalkulacka.sk/match/?q1=1

Explanation: When a user shares the results (by clicking a link that opens the Facebook Feed dialogue), the shared link may carry either a parameter common to all users (parameter "abfb") or a unique one (parameter session_id).

Sample: 349 users; the "abfb" variant was shown with probability 2/3; censoring was treated using data from the "Clicking on detailed comparison" experiment

Data: https://docs.google.com/spreadsheets/d/1y4_aDcPmrJaif2bnPmlcZIzuldLIpR3aCVIClR9XxfU/edit?usp=sharing

Script: fb3.py

Parameters of the link:

  • A: common abfb parameter
  • B: unique session_id parameter

Results:

Users arriving via the shared results of a previous user, adjusted for censoring (3/4 × the number for B) and for the unequal showing probabilities:

  • A: 82 (82 raw)
  • B: 125 (83 adjusted for censoring only)

The B variant (unique parameter) attracted roughly 1.5 times more users than A.

Statistically significant difference
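
As a rough check, the two adjusted counts can be treated as Poisson counts with equal exposure; conditional on their total, the A count then follows a Binomial(82 + 125, 0.5) distribution under the null hypothesis of equal rates. This is a sketch under that assumption, not necessarily the test that was actually run.

# Minimal sketch: conditional test for two Poisson counts (A: 82 vs B: 125).
from scipy.stats import binomtest

a, b = 82, 125                       # adjusted counts reported above
res = binomtest(a, a + b, p=0.5)     # two-sided by default
print(res.pvalue)                    # a few thousandths, well below 0.05
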

Geocoding

Explanation: HTML5 geocoding was added to the results page with no explanation for the users (and no obvious reason from their point of view)

Sample: 4588 users, geocoding shown with probability 1/2; 1609 users answered the browser prompt asking whether to share their location (some probably ignored the prompt)

Scripts: fb4.py, clicked3.py

  • A: No geocoding (2296 users with results)
  • B: geocoding (2266)
    • B1: allowed geocoding (159)
    • B2: refused geocoding (1450)
    • B3: no data (probably ignored geocoding) (657)

Results:

No significant differences were found, so geocoding can be used without losing users' attention or reducing the number of users.

"Winners" table

Clicking at least once:

  • A: 9.1%
  • B: 9.5%

Table with parties

Clicking at least once:

  • A: 34.5%
  • B: 34.9%

Facebook sharing

  • A: 1.6%
  • B: 1.5%

Originating Facebook reaction

At least one referred user within the day of the experiment (censored observations!).

  • A: 0.74%
  • B: 0.75%
  • B1: 1.9%
  • B2+B3: 0.67%

Interestingly, the share of users coming from shared links does differ in a statistically significant way (see the sketch after this list):

  • A: 4.3% (of previous users)
  • B: 7.9% (of previous users)
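
A minimal sketch of such a test, reconstructing absolute counts from the reported percentages and the group sizes above (2296 and 2266 users with results); the actual analysis may have used the raw data instead.

# Minimal sketch: chi-square test for the share of users arriving from shared
# links (A: 4.3% of 2296, B: 7.9% of 2266). Counts are reconstructed from
# rounded percentages, so the figures are approximate.
from scipy.stats import chi2_contingency

n_a, n_b = 2296, 2266
ref_a, ref_b = round(0.043 * n_a), round(0.079 * n_b)
table = [[ref_a, n_a - ref_a],
         [ref_b, n_b - ref_b]]
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)                        # p well below 0.05 with these reconstructed counts
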

Facebook reactions

Distribution of time

The distribution of times from a source user (who shared their results on Facebook through the app; their last result-display time is used) to their responding users (who followed the link on Facebook; the time of their first results is used).

Sample: 15490 doubles (source -> responding pairs)

Data: https://docs.google.com/spreadsheets/d/1R433MT8OPPNw9_Y68ZinQXMoN-6GQ3kGDYlY1xsRJtk/

Script: fb2.py

Results

Percentiles:

  • 10%: 26 minutes
  • 25%: 78 minutes
  • 50%: 310 minutes (5:10 hours)
  • 75%: 1000 minutes (16:40 hours)
  • 90%: 1840 minutes (30:40 hours)

(distribution_fb_reaction.png)
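
For reference, the percentiles above can be recomputed from doubles2.csv, the file written by the fb2.py script further below (columns: from, to, date, from-date0, from-date1; no header). The sketch assumes from-date0, the source user's last result time, is the timestamp the published numbers are based on, as described above.

# Sketch: recompute the reaction-time percentiles from doubles2.csv.
import csv
from datetime import datetime
import numpy as np

FMT = "%Y-%m-%d %H:%M:%S"
minutes = []
with open("doubles2.csv") as fin:
    for frm, to, date, from_date0, from_date1 in csv.reader(fin):
        # time from the source user's last result (from-date0) to the responding user's result
        delta = datetime.strptime(date, FMT) - datetime.strptime(from_date0, FMT)
        if delta.total_seconds() >= 0:       # ignore pairs where the order is reversed
            minutes.append(delta.total_seconds() / 60)

print(np.percentile(minutes, [10, 25, 50, 75, 90]))
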

Research

A questionnaire was shown to randomly selected users who accessed the front page directly (not through the media partners). Not all questions were asked at all times. For the questionnaire see https://volebnakalkulacka.sk/volby-2016/research/ .

Breakdowns

Skipped

  • skipped: 51%
  • not skipped: 49% (used for the following breakdowns)

Going to vote?

filled by 118% (?)

  • yes (for sure): 79%
  • rather yes: 12%
  • rather no: 2%
  • no (for sure): 1%
  • do not know: 2%
  • no right to vote: 3%

Preferred party

filled by 96%

Party: percentage among users (percentage in the election / percentage in the Polis exit poll):

  • smer-sd: 5.9% (28.2% / 27.3%)
  • sas: 27.9% (12.1% / 13.3%)
  • olano-nova: 14.2% (11.0% / 11.2%)
  • sns: 4.7% (8.6% / 8.0%)
  • ls-ns: 12.0% (8.0% / 6.8%)
  • sme-rodina: 7.3% (6.6% / 5.9%)
  • most-hid: 7.2% (6.5% / 7.3%)
  • siet: 8.8% (5.6% / 6.7%)
  • kdh: 3.3% (4.9% / 5.0%)
  • smk-mkp: 0.7% (4.0% / 3.6%)
  • sok: 1.6% (0.8% / 1.0%)
  • szs: 1.5% (0.7% / 0.6%)
  • tip: 0.4% (0.7% / 0.5%)
  • kss: 0.6% (0.6% / 0.6%)
  • sanca: 0.6% (0.3% / 0.4%)
  • sdku-ds: 0.4% (0.3% / 0.8%)
  • sms: 0.2% (0.2% / 0.2%)
  • ds: 0.7% (0.1% / 0.1%)
  • vzdor: 0.6% (0.1% / 0.1%)
  • pd: 0.4% (0.1% / 0.1%)
  • odvaha: 0.4% (0.1% / 0.2%)
  • mkda-mkdsz: 0.2% (0.1% / 0.2%)
  • spolocne: 0.2% (0.1% / 0.1%)

Some more details from the exit poll: https://dennikn.sk/395557/volil-kotlebu-preco-nie-nie-su-utecencitito-ludia-volili-kotlebu-tychto-dovodov/

Would never vote for

filled by 85%

  • smer-sd: 63.4%
  • ls-ns: 10.4%
  • kdh: 3.4%
  • olano-nova: 3.3%
  • mkda-mkdsz: 3.1%
  • kss: 3.0%
  • most-hid: 2.4%
  • smk-mkp: 2.1%
  • siet: 1.7%
  • ds: 1.1%
  • sme-rodina: 1.1%
  • sns: 1.0%
  • vzdor: 1.0%
  • sas: 0.9%
  • sdku-ds: 0.8%
  • odvaha: 0.5%
  • sanca: 0.3%
  • tip: 0.2%
  • sok: 0.1%
  • szs: 0.1%
  • pd: 0.1%
  • sms: 0.0%

Used an election calculator previously

filled by 94%

  • no: 87%
  • yes: 13%

Did it influence you (previously)?

  • no: 70%
  • yes: 30%

Selected values:

  • not at all (-100): 33%
  • quite a lot (>=50): 10%
  • totally (+100): 4%

Will the calculator influence the elections?

filled by 25% (cannot distinguish 0 and not filled)

Out of those who filled it in (in parentheses: out of all respondents)

  • will influence: 42% (11%)
  • will not influence: 58% (15%)

How often do you look for information about politics?

filled by 29% (cannot distinguish 0 and not filled)

Out of those who filled it in (in parentheses: out of all respondents)

  • rather yes (>0): 69% (20%)
  • rather not (<0): 31% (9%)

Gender

filled by 91%

  • women: 41%
  • men: 59%

Occupation

filled by 91%

  • employee: 49%
  • student: 28%
  • entrepreneur: 14%
  • not employed: 3%
  • other: 7%

Education

filled by 91%

  • primary: 6%
  • secondary: 42%
  • tertiary: 52%

Age

filled by 91%

  • <18: 3%
  • 18-29: 47%
  • 30-39: 24%
  • 40-49: 12%
  • 50-59: 9%
  • 60-69: 5%
  • >=70: 1%

Region

filled by 82%

Users by region (in parentheses: relative to the region's share of voters):

  • Bratislava: 24% (207%)
  • Košice: 10% (94%)
  • Žilina: 9% (82%)
  • Prešov: 9% (83%)
  • Trenčín: 8% (85%)
  • Banská Bystrica: 8% (86%)
  • Nitra: 7% (66%)
  • Trnava: 7% (79%)

Municipality size

filled by 89%

  • <3000: 20%
  • 3000-10000: 12%
  • 10000-50000: 23%
  • 50000-100000: 14%
  • >100000: 30%

Email

filled by 14%

Geolocation

  • no data (ignored): 43%
  • permission denied: 46%
  • ok: 10%
  • position unavailable: 0.6%
  • not supported: 0.3%

# create table based on A-B testing -> clicking on comparison
import csv
import json
import re

# results: one record per session_id (first result after the experiment start)
results = {}
with open("result.txt") as fin:
    csvr = csv.reader(fin, delimiter="\t")
    for row in csvr:
        # row: [session_id, app, timestamp, json data]
        if row[1] == "volby-2016" and row[2] > "2016-02-24 04:00:00":
            if row[0] not in results:
                results[row[0]] = {
                    "date": row[2],
                    "session_id": row[0],
                    "data": json.loads(row[3])
                }
print("result read")

# clicks: all clicks per session_id
clicks = {}
with open("click.txt") as fin:
    csvr = csv.reader(fin, delimiter="\t")
    for row in csvr:
        if row[2] > "2016-02-24 04:00:00":
            if row[0] not in clicks:
                clicks[row[0]] = []
            # clicks on the result ("match") page vs. other pages
            page = "match" if re.search(r"match", row[1]) else "page"
            item_li = row[3].split("-")
            clicks[row[0]].append({
                "session_id": row[0],
                "page": page,
                "date": row[2],
                "item": row[3],
                "type": item_li[0]   # "winner", "table" or "fb"
            })
print("click read")

# count clicks per session and attach the A/B variant ("ab-results")
out = {}
for k in results:
    if k not in clicks:
        continue
    out[k] = {"winner": 0, "table": 0, "fb": 0}
    out[k]["ab"] = results[k]["data"].get("ab-results", "none")
    for r in clicks[k]:
        if r["type"] == "winner":
            out[k]["winner"] += 1
        if r["type"] == "table":
            out[k]["table"] += 1
        if r["type"] == "fb":
            out[k]["fb"] += 1

with open("clicked_comparison_2.csv", "w") as fout:
    csvw = csv.writer(fout)
    for key in out:
        x = out[key]
        csvw.writerow([x["ab"], x["winner"], x["table"], x["fb"]])

# Get Fb doubles (source -> responding)
import csv
import json
# fb clicks
#fbs = {}
#with open("click.txt") as fin:
# csvr = csv.reader(fin, delimiter="\t")
# for row in csvr:
# match = re.match(r"fb-",row[3])
# if match:
# try:
# fbs[row[0]]
# except:
# fbs[row[0]] = []
# fbs[row[0]].append(row[2])
# source users' result timestamps per session_id
sources = {}
with open("result.txt") as fin:
    csvr = csv.reader(fin, delimiter="\t")
    for row in csvr:
        if row[1] == "volby-2016":
            if row[0] not in sources:
                sources[row[0]] = []
            sources[row[0]].append(row[2])

# doubles: responding user paired with the source user whose link referred them
doubles = []
with open("result.txt") as fin:
    csvr = csv.reader(fin, delimiter="\t")
    for row in csvr:
        if row[1] == "volby-2016":  # and row[2] >= "2016-02-25 15:08:26":
            data = json.loads(row[3])
            try:
                if not data['ref'] == "":
                    doubles.append({
                        'from': data['ref'],
                        'to': row[0],
                        "date": row[2],
                        "from-date0": sources[data['ref']][-1],  # source user's last result
                        "from-date1": sources[data['ref']][0]    # source user's first result
                    })
            except KeyError:
                pass  # no 'ref' field or unknown source session
#doubles = []
#with open("result.txt") as fin:
# csvr = csv.reader(fin, delimiter="\t")
# for row in csvr:
# if row[1] == "volby-2016" and row[2] > "2016-02-24 04:00:00":
# try:
# data = json.loads(row[3])
# try:
# data['ab-fb']
# if not data['ref'] == "":
# doubles.append({
# 'from': data['ref'],
# 'to': row[0],
# "date": row[2],
# "ab-fb": data['ab-fb']
# })
# except:
# nothing = 0
# except Exception as e:
## print(e)
# nothing = 0
print("result read")
#with open("research.txt") as fin:
# csvr = csv.reader(fin, delimiter="\t")
# for row in csvr:
# try:
# data = json.loads(row[3])
# if not data['ref'] == "":
# doubles.append({
# 'from': data['ref'],
# 'to': row[0],
# "date": row[2]
# })
# except Exception as e:
## print(e)
# nothing = 0
#print("research read")
with open("doubles2.csv",'w') as fout:
csvw = csv.writer(fout)
for x in doubles:
csvw.writerow([x['from'],x['to'],x['date'],x['from-date0'],x['from-date1']])
#raise(Exception)

# doubles for AB comparison
import csv
import json

doubles = []
with open("result.txt") as fin:
    csvr = csv.reader(fin, delimiter="\t")
    for row in csvr:
        if row[1] == "volby-2016" and row[2] >= "2016-02-25 16:00:00":
            try:
                data = json.loads(row[3])
                if not data['ref'] == "":
                    doubles.append({
                        'from': data['ref'],
                        'to': row[0],
                        "date": row[2]
                    })
            except Exception as e:
                # print(e)
                pass
print("result read")

with open("doubles3.csv", 'w') as fout:
    csvw = csv.writer(fout)
    for x in doubles:
        csvw.writerow([x['from'], x['to'], x['date']])
# raise(Exception)  # debugging stop, left commented out so the script runs to the end