@MarkDana
Created November 3, 2022 03:53
Get MBTI personality results from questionnaire answers (by web scraping or a fitted linear sum)
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import json, random, requests, math
from bs4 import BeautifulSoup
import numpy as np
header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    "Content-Type": "application/json; charset=utf-8",
}
questions_list = [
"You regularly make new friends.",
"You spend a lot of your free time exploring various random topics that pique your interest.",
"Seeing other people cry can easily make you feel like you want to cry too.",
"You often make a backup plan for a backup plan.",
"You usually stay calm, even under a lot of pressure.",
"At social events, you rarely try to introduce yourself to new people and mostly talk to the ones you already know.",
"You prefer to completely finish one project before starting another.",
"You are very sentimental.",
"You like to use organizing tools like schedules and lists.",
"Even a small mistake can cause you to doubt your overall abilities and knowledge.",
"You feel comfortable just walking up to someone you find interesting and striking up a conversation.",
"You are not too interested in discussing various interpretations and analyses of creative works.",
"You are more inclined to follow your head than your heart.",
"You usually prefer just doing what you feel like at any given moment instead of planning a particular daily routine.",
"You rarely worry about whether you make a good impression on people you meet.",
"You enjoy participating in group activities.",
"You like books and movies that make you come up with your own interpretation of the ending.",
"Your happiness comes more from helping others accomplish things than your own accomplishments.",
"You are interested in so many things that you find it difficult to choose what to try next.",
"You are prone to worrying that things will take a turn for the worse.",
"You avoid leadership roles in group settings.",
"You are definitely not an artistic type of person.",
"You think the world would be a better place if people relied more on rationality and less on their feelings.",
"You prefer to do your chores before allowing yourself to relax.",
"You enjoy watching people argue.",
"You tend to avoid drawing attention to yourself.",
"Your mood can change very quickly.",
"You lose patience with people who are not as efficient as you.",
"You often end up doing things at the last possible moment.",
"You have always been fascinated by the question of what, if anything, happens after death.",
"You usually prefer to be around others rather than on your own.",
"You become bored or lose interest when the discussion gets highly theoretical.",
"You find it easy to empathize with a person whose experiences are very different from yours.",
"You usually postpone finalizing decisions for as long as possible.",
"You rarely second-guess the choices that you have made.",
"After a long and exhausting week, a lively social event is just what you need.",
"You enjoy going to art museums.",
"You often have a hard time understanding other people’s feelings.",
"You like to have a to-do list for each day.",
"You rarely feel insecure.",
"You avoid making phone calls.",
"You often spend a lot of time trying to understand views that are very different from your own.",
"In your social circle, you are often the one who contacts your friends and initiates activities.",
"If your plans are interrupted, your top priority is to get back on track as soon as possible.",
"You are still bothered by mistakes that you made a long time ago.",
"You rarely contemplate the reasons for human existence or the meaning of life.",
"Your emotions control you more than you control them.",
"You take great care not to make people look bad, even when it is completely their fault.",
"Your personal work style is closer to spontaneous bursts of energy than organized and consistent efforts.",
"When someone thinks highly of you, you wonder how long it will take them to feel disappointed in you.",
"You would love a job that requires you to work alone most of the time.",
"You believe that pondering abstract philosophical questions is a waste of time.",
"You feel more drawn to places with busy, bustling atmospheres than quiet, intimate places.",
"You know at first glance how someone is feeling.",
"You often feel overwhelmed.",
"You complete things methodically without skipping over any steps.",
"You are very intrigued by things labeled as controversial.",
"You would pass along a good opportunity if you thought someone else needed it more.",
"You struggle with deadlines.",
"You feel confident that things will work out for you."
]
code_table = {
    'Extraverted': 'E',
    'Introverted': 'I',
    'Intuitive': 'N',
    'Observant': 'S',
    'Thinking': 'T',
    'Feeling': 'F',
    'Judging': 'J',
    'Prospecting': 'P',
    'Assertive': 'A',
    'Turbulent': 'T',
}
def get_scores_from_web_scrap(answers_list):
    '''
    Args:
        answers_list: a list of 60 integers, each in the range -3 to 3 (so 7 choices)
            ##### !!!! [BE AWARE] HERE -3 MEANS "STRONGLY AGREE" AND 3 MEANS "STRONGLY DISAGREE" !!!! #####
    Returns:
        code: a string of 4 chars, a hyphen, and 1 char, e.g., ISTP-A; the chars are picked from
            (E,I), (N,S), (T,F), (J,P), and (A,T) respectively
        scores: a list of 5 ints, e.g., [-30, -12, 13, -25, 25], one score per dimension;
            the last one is the overall identity score (how confident we are in our abilities and decisions).
            Positive means the first item (e.g., Extraverted), and negative means the second item (e.g., Introverted).
        percentages: a dict with 10 keys (see `code_table` for their names) and int values in [0, 100].
            Visit https://www.16personalities.com/ for more detailed explanations.
            Percentages are calculated from scores: the first item gets math.ceil(50 + score/2) if score > 0,
            else math.floor(50 + score/2); the second item gets 100 minus that value.
    '''
    assert len(answers_list) == len(questions_list), "The number of answers should be the same as the number of questions."
    assert all(isinstance(item, int) and -3 <= item <= 3 for item in answers_list), "The answers should be integers between -3 and 3."
    payload = {
        "questions": [{"text": q, "answer": a} for q, a in zip(questions_list, answers_list)],
        "gender": "", "inviteCode": "", "teamInviteKey": "", "extraData": []
    }
    encoded_data = json.dumps(payload).encode('utf-8')
    # one session carries the cookies across the three requests; `with` closes it even if a request fails
    with requests.Session() as s:
        s.get("https://www.16personalities.com/free-personality-test")
        s.post(url="https://www.16personalities.com/test-results", headers=header, data=encoded_data)
        r = s.get("https://www.16personalities.com/profile")
    soup = BeautifulSoup(r.text, "lxml")
    # each <app-pct-bar> element describes one dimension: score, the two titles,
    # and a reversed flag (true when the second title is the result)
    details = soup.find_all("app-pct-bar")
    reversed_flags = [bar.get(":reversed") == 'true' for bar in details]
    scores = [int(bar.get(":score")) for bar in details]
    # NOTE: eval() executes text taken from the scraped page; ast.literal_eval is a safer
    # choice if the attribute holds a plain list literal
    titles = [eval(bar.get(":titles")) for bar in details]
    dimension_ids = [1 if rev else 0 for rev in reversed_flags]
    char_codes = [code_table[title[idx]] for title, idx in zip(titles, dimension_ids)]
    code = "".join(char_codes[:4]) + '-' + char_codes[4]
    # double check against the <auth-registration> element, which carries the same result
    double_check_sec = soup.find("auth-registration")
    assert double_check_sec.get("personality") == code[:4]
    assert all(a == b for a, b in zip(scores, eval(double_check_sec.get(":scores"))))
    # sign the scores: negative when the second title is the result
    scores = [-scr if rev else scr for scr, rev in zip(scores, reversed_flags)]
    percentages = {}
    for title, score in zip(titles, scores):
        t0_score_raw = (100 + score) / 2
        t0_score = math.ceil(t0_score_raw) if score > 0 else math.floor(t0_score_raw)
        t1_score = 100 - t0_score
        percentages[title[0]], percentages[title[1]] = t0_score, t1_score
        # this rounding follows the JavaScript on the website
    return code, scores, percentages
def get_scores_from_linear_sum(answers):
    '''
    After analyzing results scraped from the website, we found that the 5 dimension scores are obtained from a very
    naive linear sum of answers. Each question/answer belongs to only one of five categories: E/I, N/S, T/F, J/P, A/T.
    So each score is calculated from its respective 60/5=12 questions (taken as independent of other questions).
    Among those 12 questions, some contribute 'positively' (i.e., toward the first item of the dimension),
        e.g., "You regularly make new friends." for Extraverted;
    and some contribute 'negatively' (i.e., toward the second item of the dimension),
        e.g., "At social events, you rarely try to introduce yourself to new people and mostly talk to the ones you already know." for Introverted.
    The linear weights all have the same absolute value (i.e., all questions are equally important). We estimate it as 2.75.
    Note that since -3 means "Strongly Agree", positive coefficients mean negative contributions.
    Besides the linear coefficients, there are also constant intercepts (offsets).
    They may indicate a prior on people's personality (e.g., -5 on E/I means the population is more likely to be introverted).
    The coefficients and intercepts are obtained from linear regression on the results scraped from the website.
    The calculated scores here may not be exactly the same as the ones from the website, but they are very close.
    Close means e.g., the difference between the two scores on each dimension is within ±1 (viewed as rounding errors).
    Args:
        answers: the same as `get_scores_from_web_scrap`
    Returns:
        scores: a list of 5 ints, with the same sign convention as in `get_scores_from_web_scrap`
    '''
    # category id (0: E/I, 1: N/S, 2: T/F, 3: J/P, 4: A/T) of each of the 60 questions, in order
    question_categories = [0, 1, 2, 3, 4, 0, 3, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 1, 4, 0, 1,
                           2, 3, 2, 0, 4, 2, 3, 1, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 0, 3,
                           4, 1, 4, 2, 3, 4, 0, 1, 0, 2, 4, 3, 1, 2, 3, 4]
    INTERCEPTS = [-5, -15, 5, 0, 0]
    CONST_LINEAR_WEIGHT = -2.75
    # positions (0..11, within each category's 12 questions) of the 'negative' questions,
    # i.e., the ones with positive coefficients
    NEGATIVE_QUESTIONS = [
        [1, 4, 5, 8, 10],
        [1, 4, 6, 9, 10],
        [0, 1, 3, 7, 9, 10, 11],
        [3, 5, 6, 9, 11],
        [1, 3, 4, 7, 8, 9, 10]
    ]
    scores = []
    answers = np.array(answers)
    for category_id in range(5):
        c_indices = np.where(np.array(question_categories) == category_id)[0]
        c_answers = answers[c_indices]
        c_coefs = np.ones(len(c_indices)) * CONST_LINEAR_WEIGHT
        c_coefs[NEGATIVE_QUESTIONS[category_id]] *= -1
        raw_score = c_answers @ c_coefs + INTERCEPTS[category_id]
        # round toward zero, then keep the score within [-100, 100]
        score = np.floor(raw_score) if raw_score > 0 else np.ceil(raw_score)
        score = np.clip(score, -100, 100)
        # a zero score is reported as -1, i.e., on the second item's side
        if score == 0:
            score = -1
        scores.append(int(score))
    return scores  # you may then calculate percentages the same as `get_scores_from_web_scrap`
if __name__ == '__main__':
    answers = [random.randint(-3, 3) for _ in range(len(questions_list))]
    code, scores, percentages = get_scores_from_web_scrap(answers)
    print('===== from web scraping =====')
    print('code:', code)
    print('scores:', scores)
    print('percentages:', percentages)
    scores = get_scores_from_linear_sum(answers)
    print('===== from fitted linear sum =====')
    print('scores:', scores)
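
`get_scores_from_linear_sum` returns only the five scores. The sketch below shows one way to turn such scores into percentages and a type code without touching the website, reusing the rounding from `get_scores_from_web_scrap`. The names `DIMENSION_TITLES`, `scores_to_percentages`, and `scores_to_code` are hypothetical and not part of the script. The title pairs are an assumption: they follow the (first, second) order given in the docstrings rather than values read from the page.

```python
import math

# Assumed title pairs, in the (first item, second item) order the docstrings describe.
DIMENSION_TITLES = [
    ("Extraverted", "Introverted"),
    ("Intuitive", "Observant"),
    ("Thinking", "Feeling"),
    ("Judging", "Prospecting"),
    ("Assertive", "Turbulent"),
]

# Same mapping as `code_table` in the script, repeated so this sketch runs on its own.
LETTERS = {
    "Extraverted": "E", "Introverted": "I",
    "Intuitive": "N", "Observant": "S",
    "Thinking": "T", "Feeling": "F",
    "Judging": "J", "Prospecting": "P",
    "Assertive": "A", "Turbulent": "T",
}


def scores_to_percentages(scores):
    """Map 5 signed scores to a dict of 10 percentages (each pair sums to 100)."""
    percentages = {}
    for (first, second), score in zip(DIMENSION_TITLES, scores):
        raw = (100 + score) / 2
        # round away from 50, as in get_scores_from_web_scrap
        first_pct = math.ceil(raw) if score > 0 else math.floor(raw)
        percentages[first] = first_pct
        percentages[second] = 100 - first_pct
    return percentages


def scores_to_code(scores):
    """Build a code such as ISTP-A: positive score -> first item, otherwise second."""
    letters = [
        LETTERS[first if score > 0 else second]
        for (first, second), score in zip(DIMENSION_TITLES, scores)
    ]
    return "".join(letters[:4]) + "-" + letters[4]


example_scores = [-30, -12, 13, -25, 25]  # the example from the docstring
print(scores_to_code(example_scores))
print(scores_to_percentages(example_scores))
```

A score of exactly 0 falls on the second item here, which matches how `get_scores_from_linear_sum` reports a zero score as -1.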