## DATA.md

      
| column_name | datatype | description |
| --- | --- | --- |
| user_id | INTEGER | unique identifier for each user in our "impeachment 2020" dataset |
| created_on | DATE | date the user was created |
| screen_name_count | INTEGER | number of screen names used |
| screen_names | STRING | all screen names used |
| is_bot | BOOLEAN | whether or not we classified this user as a "bot" / automated account |
| bot_rt_network | INTEGER | for bots, the retweet network they belong to (0: anti-Trump, 1: pro-Trump) |
| is_q | BOOLEAN | whether or not this user tweeted Q-anon language / hashtags |
| q_status_count | INTEGER | number of tweets with Q-anon language / hashtags |
| status_count | INTEGER | total number of tweets authored by this user (in our "impeachment 2020" dataset only) |
| rt_count | INTEGER | total number of retweets authored by this user (in our "impeachment 2020" dataset only) |
| avg_score_lr | FLOAT | average opinion score from our Logistic Regression model (0: anti-Trump, 1: pro-Trump) |
| avg_score_nb | FLOAT | average opinion score from our Naive Bayes model (0: anti-Trump, 1: pro-Trump) |
| avg_score_bert | FLOAT | average opinion score from our BERT Transformer model (0: anti-Trump, 1: pro-Trump) |
| opinion_community | INTEGER | binary classification of average opinion (0: anti-Trump, 1: pro-Trump) |
| follower_count | INTEGER | number of followers (in our "impeachment 2020" dataset only) |
| follower_count_b | INTEGER | number of those followers who are bots |
| follower_count_h | INTEGER | number of those followers who are humans |
| friend_count | INTEGER | number of friends (in our "impeachment 2020" dataset only) |
| friend_count_b | INTEGER | number of those friends who are bots |
| friend_count_h | INTEGER | number of those friends who are humans |
| avg_toxicity | FLOAT | average "toxicity" score from the Detoxify model |
| avg_severe_toxicity | FLOAT | average "severe toxicity" score from the Detoxify model |
| avg_insult | FLOAT | average "insult" score from the Detoxify model |
| avg_obscene | FLOAT | average "obscene" score from the Detoxify model |
| avg_threat | FLOAT | average "threat" score from the Detoxify model |
| avg_identity_hate | FLOAT | average "identity hate" score from the Detoxify model |
| urls_shared_count (TODO) | INTEGER | number of tweets with URLs in them (TODO) |
| fact_scored_count | INTEGER | number of tweets with URL domains that we have rankings for |
| avg_fact_score | FLOAT | average fact score of links shared (1: fake news, 5: mainstream media) |
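
The column types above can be applied when reading an export of this table. The following is a minimal sketch, not the project's own loader: it assumes a CSV export with a header row, ISO-formatted dates, booleans written as `true`/`false`, and empty cells for missing values. None of these are stated in this document, and the sample row is made up for illustration. `SCHEMA`, `parse_bool`, and `parse_row` are hypothetical names.

```python
import csv
import io
from datetime import date

# Column name -> datatype, copied from the data dictionary above.
SCHEMA = {
    "user_id": "INTEGER",
    "created_on": "DATE",
    "screen_name_count": "INTEGER",
    "screen_names": "STRING",
    "is_bot": "BOOLEAN",
    "bot_rt_network": "INTEGER",
    "is_q": "BOOLEAN",
    "q_status_count": "INTEGER",
    "status_count": "INTEGER",
    "rt_count": "INTEGER",
    "avg_score_lr": "FLOAT",
    "avg_score_nb": "FLOAT",
    "avg_score_bert": "FLOAT",
    "opinion_community": "INTEGER",
    "follower_count": "INTEGER",
    "follower_count_b": "INTEGER",
    "follower_count_h": "INTEGER",
    "friend_count": "INTEGER",
    "friend_count_b": "INTEGER",
    "friend_count_h": "INTEGER",
    "avg_toxicity": "FLOAT",
    "avg_severe_toxicity": "FLOAT",
    "avg_insult": "FLOAT",
    "avg_obscene": "FLOAT",
    "avg_threat": "FLOAT",
    "avg_identity_hate": "FLOAT",
    "urls_shared_count": "INTEGER",  # marked TODO above; may be absent
    "fact_scored_count": "INTEGER",
    "avg_fact_score": "FLOAT",
}


def parse_bool(text):
    """Parse a boolean cell. The accepted spellings are an assumption."""
    value = text.strip().lower()
    if value in {"true", "t", "1"}:
        return True
    if value in {"false", "f", "0"}:
        return False
    raise ValueError(f"not a boolean: {text!r}")


CONVERTERS = {
    "INTEGER": int,
    "FLOAT": float,
    "DATE": date.fromisoformat,  # assumes YYYY-MM-DD
    "BOOLEAN": parse_bool,
    "STRING": str,
}


def parse_row(row):
    """Convert one csv.DictReader row to typed values; empty cells become None."""
    parsed = {}
    for name, text in row.items():
        kind = SCHEMA.get(name, "STRING")  # unknown columns stay as strings
        parsed[name] = None if text == "" else CONVERTERS[kind](text)
    return parsed


# Made-up sample: a non-bot user, so bot_rt_network is left empty.
SAMPLE_CSV = (
    "user_id,created_on,is_bot,bot_rt_network,avg_score_lr\n"
    "1,2019-12-20,false,,0.25\n"
)

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(rows[0])
```

Mapping empty cells to `None` rather than `0` keeps "not a bot" distinct from "bot in network 0" for `bot_rt_network`, which the table defines only for bots.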