@ryanburge
Created November 25, 2023 15:40
Coding Assignment #2 - Fall 2023
## Packages ##
library(socsci)
library(car)
install.packages("jtools")
library(jtools)
install.packages("interactions")
library(interactions)
REGRESSION TUTORIAL: https://rpubs.com/ryanburge/reg_interact
## Maxwell Abilla
cces <- read_csv("https://raw.githubusercontent.com/ryanburge/cces/master/CCES%20for%20Methods/small_cces.csv")
And the codebook is at this link:
https://github.com/ryanburge/cces/raw/master/Codebooks/CCES_2016_Small_Codebook.pdf
Q1 - How old is the oldest female Asian in the dataset? How many African American men are below the age of 20 in the data?
Q2 - What does the relationship between age and identifying as a Republican look like for men and women? Visualize a line graph and describe the results.
Q3 - What is the relationship between level of education and church attendance in the data? Visualize and describe.
Q4 - Run a regression with support for gay marriage as the DV (logit). Choose three IVs. Run the regression and interpret the results.
Q5 - Run a two-way interaction with favoring an assault weapons ban as the DV (logit). Use church attendance and pid3 as the two interactive variables. Visualize and interpret.
For a logit, your basic setup looks like this:
glm(DV ~ IV + IV + IV, data = DF, family = "binomial")
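As a rough illustration of the logit setup, here is a minimal sketch on simulated data. In base R, logistic regression is fit with glm() and family = "binomial". The data frame and every variable name below (df, support, age, female, attend) are hypothetical and do not come from the assignment datasets.

```r
# Simulated data; all names here are made up for illustration only.
set.seed(42)
n <- 500
df <- data.frame(
  age    = sample(18:90, n, replace = TRUE),
  female = rbinom(n, 1, 0.5),
  attend = sample(1:6, n, replace = TRUE)
)

# Build a 0/1 outcome whose log-odds depend on the predictors
log_odds <- -1 + 0.02 * df$age + 0.5 * df$female - 0.3 * df$attend
df$support <- rbinom(n, 1, plogis(log_odds))

# Logit model: glm() with a binomial family
fit <- glm(support ~ age + female + attend, data = df, family = "binomial")
summary(fit)

# Coefficients are on the log-odds scale; exponentiate to read them as odds ratios
exp(coef(fit))

# Two-way interaction: attend * female expands to attend + female + attend:female
fit_int <- glm(support ~ attend * female + age, data = df, family = "binomial")
summary(fit_int)
```

The interaction coefficient (attend:female) describes how the slope for attend differs between the two groups, which is usually easier to read from a plot of predicted probabilities than from the table alone.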
## Prince Brenya
gss <- read_csv("https://raw.githubusercontent.com/ryanburge/ct/master/gss_small.csv", guess_max = 25000)
Q1 - How many black women identified as a Strong Democrat in 2000? How many white men identified as a strong Republican in 2018?
Q2 - How has the church attendance of white men vs white women changed over time? Visualize the results and interpret them in words.
Q3 - What's the relationship between mother's level of education and father's level of education? Visualize the relationship and interpret it in words.
Q4 - Run a regression model with church attendance as the DV. You can choose three different IVs that could impact the DV. Run the model - interpret the results.
Q5 - Run a two-way interaction model with church attendance as the DV. Your two IVs are age and gender. Visualize the model. Interpret the results.
## Jasmin Chavolla
## Load Data ###
bball <- read_csv("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/bball.csv", guess_max = 25000) %>%
  rename(games = g, wins = w, losses = l, runs = r, atbats = ab, hits = h, homeruns = hr, walks = bb, strikeout = so, stolenbase = sb, earnedruns = era)
Q1 - What team scored the most runs in the National League in 2001? What team scored the most runs in the American League?
Q2 - I want you to visualize the number of homeruns hit in the National League and the American League over time. Which year saw the highest number of home runs hit?
Q3 - I want you to visualize the relationship between the number of triples hit and the number of strikeouts. Describe that relationship to me.
Now run the following lines of code:
options(scipen = 999)
house <- read_csv("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/house.csv")
Q4 - The data set above is home sales data in Seattle, Washington. I want you to put together a simple regression where the DV is home price. I want you to choose three different IVs. Run the regression and tell me how each of those independent variables impacts home price.
Q5 - I want you to run a two way interaction regression where the predictor variables are square feet of living space and whether the home is on the waterfront. The DV will be home price. You must visualize the output of this.
## Millicent Danso
gss <- read_csv("https://raw.githubusercontent.com/ryanburge/ct/master/gss_small.csv", guess_max = 25000)
Q1 - How many people over the age of 65 had at least a four year college degree in 2016?
Q2 - How has the average age of men and women changed over the course of the GSS? You must visualize this in a graph.
Q3 - Visualize the relationship between education and age. Describe that relationship with words as well.
Q4 - Create a regression model with church attendance as the DV. Choose three independent variables that might have an impact on the DV. Run a regression and interpret the output.
Q5 - Run a two way interaction model with education as the DV. Age and Gender will be your two independent interaction variables. Visualize the output of this.
## Hannah Djabaku
cces <- read_csv("https://raw.githubusercontent.com/ryanburge/cces/master/CCES%20for%20Methods/small_cces.csv")
Q1 - What is the age of the oldest female Latter-day Saint in the data?
Q2 - What is the relationship between educational attainment and religious service attendance? Visualize a line graph, describe the results.
Q3 - What is the relationship between political ideology and income in the data? Visualize and describe.
Q4 - Run a regression with support for deporting all illegal immigrants as the dependent variable (logit). Choose three IVs. Run the regression and interpret the results.
Q5 - Run a two-way interaction with favoring an assault weapons ban as the DV (logit). Use church attendance and white/non-white as the two interactive variables. Visualize and interpret.
## Daniel Hooker
## Load Data ###
bball <- read_csv("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/bball.csv", guess_max = 25000) %>%
  rename(games = g, wins = w, losses = l, runs = r, atbats = ab, hits = h, homeruns = hr, walks = bb, strikeout = so, stolenbase = sb, earnedruns = era)
Q1 - What team drew the most walks in the National League in 2005? What team struck out the most in the American League in 2003?
Q2 - How has the number of strikeouts changed in baseball over the time period in the data? How about the number of home runs? Plot them on one graph. Interpret the results.
Q3 - What is the relationship between team salary and number of hits? Does that look different based on league? Visualize and interpret.
Q4 - Run a regression model with number of losses as the DV. Choose three IVs that could impact the DV. Run the model. Interpret the results.
Q5 - Run a two way interaction with number of walks as the DV. The two interactive variables are number of hit by pitches (hbp) and league id. Visualize and interpret the results.
## Morgan Rigdon
## Load Data ###
bball <- read_csv("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/bball.csv", guess_max = 25000) %>%
  rename(games = g, wins = w, losses = l, runs = r, atbats = ab, hits = h, homeruns = hr, walks = bb, strikeout = so, stolenbase = sb, earnedruns = era)
Q1 - Which team drew the most walks in the National League in 2001? What team stole the most bases in the American League in 1998?
Q2 - Visualize how many games each league has won over the entire time frame in the data. Describe what you see in words.
Q3 - Visualize the relationship between hit by pitches (hbp) and walks. Describe what you see in words.
Now run the following lines of code:
options(scipen = 999)
house <- read_csv("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/house.csv")
Q4 - Run a regression model with sq feet of living space as the DV. Choose three variables as IVs. Run the regression. Interpret the output.
Q5 - Run a two-way interaction with home sales price as the DV. The two IVs are yr_built and number of bedrooms (create a variable with two categories: 2 bedrooms or fewer and 3 bedrooms or more). Visualize and interpret your results.
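One possible way to build a two-category variable like the one Q5 describes is sketched below on a toy data frame. The data are simulated, and the column names price and bedrooms are assumptions about the house file (only yr_built is named in the question); bed2 and toy are hypothetical names.

```r
# Toy stand-in for the house data; price and bedrooms are assumed column names.
set.seed(1)
toy <- data.frame(
  yr_built = sample(1900:2015, 200, replace = TRUE),
  bedrooms = sample(1:6, 200, replace = TRUE)
)
toy$price <- 200000 + 1500 * (toy$yr_built - 1900) +
  40000 * toy$bedrooms + rnorm(200, sd = 50000)

# Collapse bedrooms into two categories
toy$bed2 <- ifelse(toy$bedrooms <= 2, "2 or fewer", "3 or more")

# Cross-tab to confirm every bedroom count landed in the intended category
table(toy$bedrooms, toy$bed2)

# Two-way interaction between a continuous and a two-category predictor
fit <- lm(price ~ yr_built * bed2, data = toy)
summary(fit)
```

Checking the recode with table() before modeling is a cheap way to catch a misplaced cutoff.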
## Lois Tetteh
options(scipen = 999)
house <- read_csv("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/house.csv")
Q1 - What is the most expensive house in the dataset that was built prior to 1950?
Q2 - What is the relationship between year built and the total sq ft of living space in the dataset? Visualize and describe.
Q3 - What is the relationship between number of bedrooms in a house and number of bathrooms? Visualize and describe.
Q4 - Run a regression with home price as the dependent variable. Choose three independent variables. Visualize and interpret your results.
Q5 - Run an interaction with home price as the dependent variable and your interaction variables are year built and number of floors. Also include two more independent variables. Visualize and interpret your results.
## Thomas Wallace
gss <- read_csv("https://raw.githubusercontent.com/ryanburge/ct/master/gss_small.csv", guess_max = 25000)
Q1 - What is the age of the most highly educated black female in the data in 1972?
Q2 - What is the relationship between an individual's level of education and their spouse's level of education in the data in 2012? Visualize and interpret.
Q3 - What is the relationship between a mother's education and a father's education in the 1980s? Did that change in the 1990s? Visualize and interpret.
Q4 - Run a regression with religious service attendance as the dependent variable. Choose three independent variables. Visualize and interpret.
Q5 - Run a two way interaction with education as the dependent variable. The two interaction variables are age and white/non-white. Also include two other independent variables. Visualize and interpret your results.
## Mckenna Wojcicki
bball <- read_csv("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/bball.csv", guess_max = 25000) %>%
  rename(games = g, wins = w, losses = l, runs = r, atbats = ab, hits = h, homeruns = hr, walks = bb, strikeout = so, stolenbase = sb, earnedruns = era)
Q1 - What team in the National League won the fewest games between 2000 and 2005?
Q2 - What is the relationship between salary and ballpark attendance in the American League in 2002? Visualize and interpret.
Q3 - What is the relationship between wins and strikeouts in the entire dataset? Visualize and interpret.
Q4 - Run a regression with wins as the dependent variable. Include three independent variables. Visualize and interpret your results.
Q5 - Run a two-way interaction with ballpark attendance as your DV. The interactive terms are league_id and wins. Also include two other independent variables. Visualize and interpret your results.