Analytics in Volebná kalkulačka 2016

https://volebnakalkulacka.sk

AB testing

Clicking on detailed comparison

Example Page: https://volebnakalkulacka.sk/match/?q1=1

Explanation: A user may click on the name of a party to see a detailed comparison between their answers and the party's answers. Our aim is to engage people to view these detailed comparisons.

Sample: ~10300 users

Data: https://docs.google.com/spreadsheets/d/1DuPvWxE7YEktVbkCIop5sG3QYg4ReMQr68QFKgZ9dbg/edit?usp=sharing

Script: clicked.py

The wording above the table with parties was:

  • A: "Results" ("Výsledky") (results_a.jpg)
  • B: "Results: you can see the detailed comparison by clicking on party's name" ("Výsledky: kliknutím na meno strany zobrazíte detailné porovnanie") (results_b.jpg)

CTR

Table with parties

Clicking at least once:

  • A: 24%
  • B: 32%

Clicking two or more times:

  • A: 12%
  • B: 17%

Statistically significant improvements
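
As a sanity check, a standard two-proportion z-test reproduces this conclusion from the reported rates alone. The sketch below (Python, like the scripts at the end of this gist) assumes the ~10300 users were split roughly evenly between the two wordings; the real arm sizes are in the linked spreadsheet, so the numbers here are only illustrative.

# Minimal sketch: two-proportion z-test for "clicking at least once" on the
# table with parties (A: 24%, B: 32%). Arm sizes are assumed, not the real ones.
from math import sqrt, erfc

def two_proportion_ztest(k_a, n_a, k_b, n_b):
    """Return (z, two-sided p-value) for H0: p_a == p_b."""
    p_a, p_b = k_a / n_a, k_b / n_b
    p_pool = (k_a + k_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return z, erfc(abs(z) / sqrt(2))   # 2 * P(Z > |z|) for a standard normal

n_a = n_b = 5150                       # assumption: ~10300 users split evenly
z, p = two_proportion_ztest(round(0.24 * n_a), n_a, round(0.32 * n_b), n_b)
print(z, p)                            # p is far below 0.05 with these inputs
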

"Winners" table

Top 1 or top 3 parties (depending on the device)

  • A: 7.9%
  • B: 8.5%

Statistically not significant

Facebook sharing

  • A: 1.6%
  • B: 1.8%

Statistically not significant

Facebook sharing using unique address

Example Page: https://volebnakalkulacka.sk/match/?q1=1

Explanation: When a user shares the results (by clicking a link that opens the Facebook Feed dialogue), the shared link may carry either a parameter common to all users (parameter "abfb") or a unique one (parameter session_id).

Sample: 349 users; the "abfb" variant was shown with probability 2/3; censoring was treated using data from the "Clicking on detailed comparison" experiment

Data: https://docs.google.com/spreadsheets/d/1y4_aDcPmrJaif2bnPmlcZIzuldLIpR3aCVIClR9XxfU/edit?usp=sharing

Script: fb3.py

Parameters of the link:

  • A: common abfb parameter
  • B: unique session_id parameter

Results:

Users arriving via the shared results of a previous user, adjusted for censoring (3/4 × the number for B) and for the unequal showing probabilities:

  • A: 82 (82 raw)
  • B: 125 (83 adjusted for censoring only)

The B variant (unique parameter) attracted roughly 1.5 times more users than A.

Statistically significant difference
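
As a rough check, the two adjusted counts can be treated as Poisson counts with equal exposure; conditional on their total, the A count then follows a Binomial(82 + 125, 0.5) distribution under the null hypothesis of equal rates. This is a sketch under that assumption, not necessarily the test that was actually run.

# Minimal sketch: conditional test for two Poisson counts (A: 82 vs B: 125).
from scipy.stats import binomtest

a, b = 82, 125                       # adjusted counts reported above
res = binomtest(a, a + b, p=0.5)     # two-sided by default
print(res.pvalue)                    # a few thousandths, well below 0.05
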

Geocoding

Explanation: HTML5 geocoding was added to the results page with no explanation for the users (and no obvious reason from their point of view)

Sample: 4588 users, geocoding shown with probability 1/2; 1609 users answered the browser prompt asking whether to share their location (some probably ignored the prompt)

Scripts: fb4.py, clicked3.py

  • A: No geocoding (2296 users with results)
  • B: geocoding (2266)
    • B1: allowed geocoding (159)
    • B2: refused geocoding (1450)
    • B3: no data (probably ignored geocoding) (657)

Results:

No significant differences were found, so geocoding can be used without losing users' attention or reducing the number of users.

"Winners" table

Clicking at least once:

  • A: 9.1%
  • B: 9.5%

Table with parties

Clicking at least once:

  • A: 34.5%
  • B: 34.9%

Facebook sharing

  • A: 1.6%
  • B: 1.5%

Originating Facebook reaction

At least one referred user within the day of the experiment (censored observations!).

  • A: 0.74%
  • B: 0.75%
  • B1: 1.9%
  • B2+B3: 0.67%

Interestingly, the share of users coming from shared links does differ in a statistically significant way (see the sketch after this list):

  • A: 4.3% (of previous users)
  • B: 7.9% (of previous users)
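
A minimal sketch of such a test, reconstructing absolute counts from the reported percentages and the group sizes above (2296 and 2266 users with results); the actual analysis may have used the raw data instead.

# Minimal sketch: chi-square test for the share of users arriving from shared
# links (A: 4.3% of 2296, B: 7.9% of 2266). Counts are reconstructed from
# rounded percentages, so the figures are approximate.
from scipy.stats import chi2_contingency

n_a, n_b = 2296, 2266
ref_a, ref_b = round(0.043 * n_a), round(0.079 * n_b)
table = [[ref_a, n_a - ref_a],
         [ref_b, n_b - ref_b]]
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)                        # p well below 0.05 with these reconstructed counts
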

Facebook reactions

Distribution of time

The distribution of times from a source user (who shared their results on Facebook through the app; their last result-display time is used) to their responding users (who followed the link on Facebook; the time of their first results is used).

Sample: 15490 doubles (source -> responding pairs)

Data: https://docs.google.com/spreadsheets/d/1R433MT8OPPNw9_Y68ZinQXMoN-6GQ3kGDYlY1xsRJtk/

Script: fb2.py

Results

Percentiles:

  • 10%: 26 minutes
  • 25%: 78 minutes
  • 50%: 310 minutes (5:10 hours)
  • 75%: 1000 minutes (16:40 hours)
  • 90%: 1840 minutes (30:40 hours)

(distribution_fb_reaction.png)
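
For reference, the percentiles above can be recomputed from doubles2.csv, the file written by the fb2.py script further below (columns: from, to, date, from-date0, from-date1; no header). The sketch assumes from-date0, the source user's last result time, is the timestamp the published numbers are based on, as described above.

# Sketch: recompute the reaction-time percentiles from doubles2.csv.
import csv
from datetime import datetime
import numpy as np

FMT = "%Y-%m-%d %H:%M:%S"
minutes = []
with open("doubles2.csv") as fin:
    for frm, to, date, from_date0, from_date1 in csv.reader(fin):
        # time from the source user's last result (from-date0) to the responding user's result
        delta = datetime.strptime(date, FMT) - datetime.strptime(from_date0, FMT)
        if delta.total_seconds() >= 0:       # ignore pairs where the order is reversed
            minutes.append(delta.total_seconds() / 60)

print(np.percentile(minutes, [10, 25, 50, 75, 90]))
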

Research

A questionnaire was shown to randomly selected users who accessed the front page directly (not through the media partners). Not all questions were asked at all times. For the questionnaire see https://volebnakalkulacka.sk/volby-2016/research/ .

Breakdowns

Skipped

  • skipped: 51%
  • not skipped: 49% (used for the following breakdowns)

Going to vote?

filled by 118% (?)

  • yes (for sure): 79%
  • rather yes: 12%
  • rather no: 2%
  • no (for sure): 1%
  • do not know: 2%
  • no right to vote: 3%

Preferred party

filled by 96%

Party: percentage among users (percentage in the election / percentage in the Polis exit poll):

  • smer-sd: 5.9% (28.2% / 27.3%)
  • sas: 27.9% (12.1% / 13.3%)
  • olano-nova: 14.2% (11.0% / 11.2%)
  • sns: 4.7% (8.6% / 8.0%)
  • ls-ns: 12.0% (8.0% / 6.8%)
  • sme-rodina: 7.3% (6.6% / 5.9%)
  • most-hid: 7.2% (6.5% / 7.3%)
  • siet: 8.8% (5.6% / 6.7%)
  • kdh: 3.3% (4.9% / 5.0%)
  • smk-mkp: 0.7% (4.0% / 3.6%)
  • sok: 1.6% (0.8% / 1.0%)
  • szs: 1.5% (0.7% / 0.6%)
  • tip: 0.4% (0.7% / 0.5%)
  • kss: 0.6% (0.6% / 0.6%)
  • sanca: 0.6% (0.3% / 0.4%)
  • sdku-ds: 0.4% (0.3% / 0.8%)
  • sms: 0.2% (0.2% / 0.2%)
  • ds: 0.7% (0.1% / 0.1%)
  • vzdor: 0.6% (0.1% / 0.1%)
  • pd: 0.4% (0.1% / 0.1%)
  • odvaha: 0.4% (0.1% / 0.2%)
  • mkda-mkdsz: 0.2% (0.1% / 0.2%)
  • spolocne: 0.2% (0.1% / 0.1%)

Some more details from the exit poll: https://dennikn.sk/395557/volil-kotlebu-preco-nie-nie-su-utecencitito-ludia-volili-kotlebu-tychto-dovodov/

Would never vote for

filled by 85%

  • smer-sd: 63.4%
  • ls-ns: 10.4%
  • kdh: 3.4%
  • olano-nova: 3.3%
  • mkda-mkdsz: 3.1%
  • kss: 3.0%
  • most-hid: 2.4%
  • smk-mkp: 2.1%
  • siet: 1.7%
  • ds: 1.1%
  • sme-rodina: 1.1%
  • sns: 1.0%
  • vzdor: 1.0%
  • sas: 0.9%
  • sdku-ds: 0.8%
  • odvaha: 0.5%
  • sanca: 0.3%
  • tip: 0.2%
  • sok: 0.1%
  • szs: 0.1%
  • pd: 0.1%
  • sms: 0.0%

Used an election calculator previously

filled by 94%

  • no: 87%
  • yes: 13%

Did it influence you (previously)?

  • no: 70%
  • yes: 30%

Selected values:

  • not at all (-100): 33%
  • quite a lot (>=50): 10%
  • totally (+100): 4%

Will the calculator influence the elections?

filled by 25% (cannot distinguish 0 and not filled)

Out of those who filled it in (in parentheses: out of all respondents)

  • will influence: 42% (11%)
  • will not influence: 58% (15%)

How often do you look for information about politics?

filled by 29% (cannot distinguish 0 and not filled)

Out of those who filled it in (in parentheses: out of all respondents)

  • rather yes (>0): 69% (20%)
  • rather not (<0): 31% (9%)

Gender

filled by 91%

  • women: 41%
  • men: 59%

Occupation

filled by 91%

  • employee: 49%
  • student: 28%
  • entrepreneur: 14%
  • not employed: 3%
  • other: 7%

Education

filled by 91%

  • primary: 6%
  • secondary: 42%
  • tertiary: 52%

Age

filled by 91%

  • <18: 3%
  • 18-29: 47%
  • 30-39: 24%
  • 40-49: 12%
  • 50-59: 9%
  • 60-69: 5%
  • >=70: 1%

Region

filled by 82%

Users by region (in parentheses: relative to the region's share of voters):

  • Bratislava: 24% (207%)
  • Košice: 10% (94%)
  • Žilina: 9% (82%)
  • Prešov: 9% (83%)
  • Trenčín: 8% (85%)
  • Banská Bystrica: 8% (86%)
  • Nitra: 7% (66%)
  • Trnava: 7% (79%)

Municipality size

filled by 89%

  • <3000: 20%
  • 3000-10000: 12%
  • 10000-50000: 23%
  • 50000-100000: 14%
  • >100000: 30%

Email

filled by 14%

Geolocation

  • no data (ignored): 43%
  • permission denied: 46%
  • ok: 10%
  • position unavailable: 0.6%
  • not supported: 0.3%

# create table based on A-B testing -> clicking on comparison
import csv
import json
import re

# results: one record per session_id (first result after the experiment start)
results = {}
with open("result.txt") as fin:
    csvr = csv.reader(fin, delimiter="\t")
    for row in csvr:
        # row: [session_id, app, timestamp, json data]
        if row[1] == "volby-2016" and row[2] > "2016-02-24 04:00:00":
            if row[0] not in results:
                results[row[0]] = {
                    "date": row[2],
                    "session_id": row[0],
                    "data": json.loads(row[3])
                }
print("result read")

# clicks: all clicks per session_id
clicks = {}
with open("click.txt") as fin:
    csvr = csv.reader(fin, delimiter="\t")
    for row in csvr:
        if row[2] > "2016-02-24 04:00:00":
            if row[0] not in clicks:
                clicks[row[0]] = []
            # clicks on the result ("match") page vs. other pages
            page = "match" if re.search(r"match", row[1]) else "page"
            item_li = row[3].split("-")
            clicks[row[0]].append({
                "session_id": row[0],
                "page": page,
                "date": row[2],
                "item": row[3],
                "type": item_li[0]   # "winner", "table" or "fb"
            })
print("click read")

# count clicks per session and attach the A/B variant ("ab-results")
out = {}
for k in results:
    if k not in clicks:
        continue
    out[k] = {"winner": 0, "table": 0, "fb": 0}
    out[k]["ab"] = results[k]["data"].get("ab-results", "none")
    for r in clicks[k]:
        if r["type"] == "winner":
            out[k]["winner"] += 1
        if r["type"] == "table":
            out[k]["table"] += 1
        if r["type"] == "fb":
            out[k]["fb"] += 1

with open("clicked_comparison_2.csv", "w") as fout:
    csvw = csv.writer(fout)
    for key in out:
        x = out[key]
        csvw.writerow([x["ab"], x["winner"], x["table"], x["fb"]])

# Get Fb doubles (source -> responding)
import csv
import json
# fb clicks
#fbs = {}
#with open("click.txt") as fin:
# csvr = csv.reader(fin, delimiter="\t")
# for row in csvr:
# match = re.match(r"fb-",row[3])
# if match:
# try:
# fbs[row[0]]
# except:
# fbs[row[0]] = []
# fbs[row[0]].append(row[2])
# source users' result timestamps per session_id
sources = {}
with open("result.txt") as fin:
    csvr = csv.reader(fin, delimiter="\t")
    for row in csvr:
        if row[1] == "volby-2016":
            if row[0] not in sources:
                sources[row[0]] = []
            sources[row[0]].append(row[2])

# doubles: responding user paired with the source user whose link referred them
doubles = []
with open("result.txt") as fin:
    csvr = csv.reader(fin, delimiter="\t")
    for row in csvr:
        if row[1] == "volby-2016":  # and row[2] >= "2016-02-25 15:08:26":
            data = json.loads(row[3])
            try:
                if not data['ref'] == "":
                    doubles.append({
                        'from': data['ref'],
                        'to': row[0],
                        "date": row[2],
                        "from-date0": sources[data['ref']][-1],  # source user's last result
                        "from-date1": sources[data['ref']][0]    # source user's first result
                    })
            except KeyError:
                pass  # no 'ref' field or unknown source session
#doubles = []
#with open("result.txt") as fin:
# csvr = csv.reader(fin, delimiter="\t")
# for row in csvr:
# if row[1] == "volby-2016" and row[2] > "2016-02-24 04:00:00":
# try:
# data = json.loads(row[3])
# try:
# data['ab-fb']
# if not data['ref'] == "":
# doubles.append({
# 'from': data['ref'],
# 'to': row[0],
# "date": row[2],
# "ab-fb": data['ab-fb']
# })
# except:
# nothing = 0
# except Exception as e:
## print(e)
# nothing = 0
print("result read")
#with open("research.txt") as fin:
# csvr = csv.reader(fin, delimiter="\t")
# for row in csvr:
# try:
# data = json.loads(row[3])
# if not data['ref'] == "":
# doubles.append({
# 'from': data['ref'],
# 'to': row[0],
# "date": row[2]
# })
# except Exception as e:
## print(e)
# nothing = 0
#print("research read")
with open("doubles2.csv",'w') as fout:
csvw = csv.writer(fout)
for x in doubles:
csvw.writerow([x['from'],x['to'],x['date'],x['from-date0'],x['from-date1']])
#raise(Exception)

# doubles for AB comparison
import csv
import json

doubles = []
with open("result.txt") as fin:
    csvr = csv.reader(fin, delimiter="\t")
    for row in csvr:
        if row[1] == "volby-2016" and row[2] >= "2016-02-25 16:00:00":
            try:
                data = json.loads(row[3])
                if not data['ref'] == "":
                    doubles.append({
                        'from': data['ref'],
                        'to': row[0],
                        "date": row[2]
                    })
            except Exception as e:
                # print(e)
                pass
print("result read")

with open("doubles3.csv", 'w') as fout:
    csvw = csv.writer(fout)
    for x in doubles:
        csvw.writerow([x['from'], x['to'], x['date']])
# raise(Exception)  # debugging stop, left commented out so the script runs to the end