# Load these libraries
library(lsr)    # provides ciMean()
library(dplyr)  # provides filter()
# Limit the dataset to the 2014 survey data and store it as a new dataset named GSS2014
GSS2014 <- dplyr::filter(GSS, year == 2014)
# Get confidence intervals at different levels for the mean of the same variable
ciMean(GSS2014$tvhours, na.rm = TRUE, conf = 0.90)
ciMean(GSS2014$tvhours, na.rm = TRUE, conf = 0.95)
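The intervals above widen as the confidence level rises. As a rough sketch of the arithmetic, assuming ciMean() builds a t-based interval (mean plus or minus a t critical value times sd/sqrt(n); check the lsr documentation to confirm), the function below computes that interval by hand. The vector `hours` and the helper `ci_by_hand()` are hypothetical, made up for illustration, not GSS data or part of lsr.

```r
# Hypothetical TV-hours values for illustration only (not GSS data)
hours <- c(2, 3, 1, 4, 2, 5, 3, NA, 2, 6)

# Hypothetical helper: t-based confidence interval for a mean
ci_by_hand <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]                          # same idea as na.rm = TRUE
  n <- length(x)
  se <- sd(x) / sqrt(n)                      # standard error of the mean
  t_crit <- qt((1 + conf) / 2, df = n - 1)   # two-sided critical value
  c(lower = mean(x) - t_crit * se, upper = mean(x) + t_crit * se)
}

ci90 <- ci_by_hand(hours, conf = 0.90)
ci95 <- ci_by_hand(hours, conf = 0.95)
ci90
ci95

# A higher confidence level gives a wider interval
unname(diff(ci95) > diff(ci90))   # TRUE
```

If lsr is installed, ciMean(hours, conf = 0.95, na.rm = TRUE) should return the same bounds as ci95, provided the t-based assumption holds.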
# Values above 24 hours per day are impossible, so set them to missing (NA)
GSS2014$tvhours[GSS2014$tvhours > 24] <- NA
# Look at the mean of an interval variable
mean(GSS2014$tvhours, na.rm = TRUE)
# A generalized linear model of an interval variable with just an intercept and no independent variable
results <- glm(tvhours ~ 1, data = GSS2014)
summary(results)
# Create a dichotomous variable coded TRUE/FALSE (treated by R as 1/0): TRUE for respondents with no children
GSS2014$nochild <- as.numeric(GSS2014$childs) <= 0
# Look at the proportion of 1 (TRUE) values of the dichotomous variable coded 0, 1
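Two facts connect the lines above. An intercept-only model such as glm(tvhours ~ 1) estimates a single number, and with glm's default gaussian family that estimate is the sample mean. Likewise, because R treats TRUE as 1 and FALSE as 0, the mean of a TRUE/FALSE variable is the proportion of TRUE values. The sketch below checks both on a small hypothetical data frame `toy` (made-up values, not GSS data).

```r
# Hypothetical data for illustration only (not GSS data)
toy <- data.frame(
  tvhours = c(1, 2, 2, 3, 5, 4, 0, 2),
  childs  = c(0, 2, 1, 0, 3, 0, 2, 1)
)

# Intercept-only model: the estimated intercept equals the mean of tvhours
fit <- glm(tvhours ~ 1, data = toy)
coef(fit)["(Intercept)"]
mean(toy$tvhours)                      # 2.375

# Dichotomous variable: TRUE for respondents with no children
toy$nochild <- toy$childs == 0

# The mean of a TRUE/FALSE variable is the proportion of TRUE values
mean(toy$nochild)                      # 0.375, since 3 of 8 have no children
prop.table(table(toy$nochild))["TRUE"]
```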
library(ggplot2)
# Create a temporary dataset without the missing (NA) values of polviews so that they don't appear in the graph
polviewsnew <- dplyr::filter(GSS2014, !is.na(polviews))
# Create a bar graph of percentages using the temporary dataset
ggplot(polviewsnew, aes(x = polviews)) +
  geom_bar(stat = "count", color = "red", fill = "white", aes(y = after_stat(count / sum(count)))) +
  ggtitle("Political Views") +
  labs(y = "Percent", x = "Political Views") +
  scale_y_continuous(labels = scales::percent)
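Each bar height in the graph above is that category's count divided by the total number of non-missing responses. A base-R sketch of the same arithmetic is below; it can help when checking the percentages shown on a graph. The factor `views` is hypothetical, with made-up responses (not GSS data).

```r
# Hypothetical responses for illustration only (not GSS data)
views <- factor(
  c("Liberal", "Moderate", "Conservative", "Moderate", NA, "Liberal", "Moderate"),
  levels = c("Liberal", "Moderate", "Conservative")
)

# Drop missing values, as the filter step above does
views_nomiss <- views[!is.na(views)]

# Count per category, then divide by the total: the values the bars show
counts <- table(views_nomiss)
shares <- counts / sum(counts)   # equivalent to prop.table(counts)
round(100 * shares, 1)           # as percentages: 33.3, 50.0, 16.7
```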
**CHUNK 1 STARTS BELOW THIS LINE.**
```{r}
#YOU WILL ALWAYS NEED THIS FIRST CHUNK. WE WILL ADD TO IT DURING THE SEMESTER.
#THIS CHUNK LOADS THE LIBRARIES AND DATA THAT YOU NEED FOR YOUR WORK.
library(aws.s3)
library('lehmansociology')
s3load('gss.Rda', bucket = 'lehmansociologydata')
```
TO MAKE A GRAPH IN RSTUDIO, YOU FIRST NEED TO ADD A NEW LIBRARY USING THE LINE OF CODE BELOW.
library('ggplot2')
(PASTE THIS LINE OF CODE INTO CHUNK 1 ON ITS OWN LINE, NEAR THE OTHER LINES OF CODE THAT START WITH library.)
(DON'T FORGET TO RUN CHUNK 1 AFTER PASTING SO THAT IT LOADS THE NEW LIBRARY FOR YOU.)
AFTER YOU'VE ADDED THE LIBRARY ggplot2, YOU CAN BEGIN TO USE THE COMMAND ggplot TO CREATE GRAPHS.
ALWAYS CONSIDER WHICH GRAPH IS APPROPRIATE FOR YOUR VARIABLE BASED ON ITS LEVEL OF MEASUREMENT.
ALSO CONSIDER WHICH GRAPH WILL DISPLAY THE INFORMATION CLEARLY GIVEN THE VARIABLE'S VALUES.
ALWAYS GIVE YOUR GRAPH A TITLE AND AXIS LABELS THAT ARE CLEAR, ACCURATE, AND DESCRIBE THE GRAPH.
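A minimal sketch of this advice, assuming ggplot2 is installed and using a hypothetical data frame `toy` with made-up values (not GSS data): a bar graph for a categorical (nominal or ordinal) variable and a histogram for an interval-ratio variable, each with a title and axis labels set through labs().

```r
library(ggplot2)

# Hypothetical data for illustration only (not GSS data)
toy <- data.frame(
  degree  = factor(c("High school", "Bachelor", "High school", "Graduate", "Bachelor"),
                   levels = c("High school", "Bachelor", "Graduate")),
  tvhours = c(2, 1, 4, 0, 3)
)

# Categorical (nominal or ordinal) variable: bar graph of counts
p_bar <- ggplot(toy, aes(x = degree)) +
  geom_bar() +
  labs(title = "Highest Degree", x = "Highest degree", y = "Number of respondents")

# Interval-ratio variable: histogram with one-hour bins
p_hist <- ggplot(toy, aes(x = tvhours)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Hours of TV Watched per Day", x = "Hours per day", y = "Number of respondents")

p_bar
p_hist
```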
```{r}
summary(GSS$childs)
IQR(GSS$childs, na.rm = TRUE)
var(GSS$childs, na.rm = TRUE)
sd(GSS$childs, na.rm = TRUE)
summary(GSS$chldidel)
IQR(GSS$chldidel, na.rm = TRUE)
var(GSS$chldidel, na.rm = TRUE)
```
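The dispersion measures above are related. The interquartile range (IQR) is the third quartile minus the first quartile, and the standard deviation is the square root of the variance. The sketch below checks both on a hypothetical vector `kids` (made-up values, not GSS data) that includes a missing value, which is why every call uses na.rm = TRUE.

```r
# Hypothetical numbers of children, for illustration only (not GSS data)
kids <- c(0, 1, 2, 2, 3, 0, 4, NA, 2, 1)

# IQR is Q3 minus Q1
q <- quantile(kids, probs = c(0.25, 0.75), na.rm = TRUE)
iqr_by_hand <- unname(q[2] - q[1])
iqr_by_hand
IQR(kids, na.rm = TRUE)        # same value as iqr_by_hand

# The standard deviation is the square root of the variance
sqrt(var(kids, na.rm = TRUE))
sd(kids, na.rm = TRUE)         # same value as the line above
```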
Recall from the swirl lesson "Working with Variables" that when a variable has missing values, R returns NA if you ask for
statistics such as the mean and standard deviation (sd). That is why the first two lines of code in the chunk below add
na.rm=TRUE, which tells R to remove the missing values before computing the mean and sd.
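A small illustration of this point, using a hypothetical vector `tv` with one missing value (made-up numbers, not GSS data): without na.rm = TRUE the result is NA; with it, R drops the missing value first.

```r
# Hypothetical TV-hours values with one missing response (not GSS data)
tv <- c(2, 4, NA, 3)

mean(tv)                 # NA, because one value is missing
mean(tv, na.rm = TRUE)   # 3, the mean of 2, 4, and 3
sd(tv, na.rm = TRUE)     # 1, the sd of 2, 4, and 3
```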
```{r}
mean(GSS$tvhours, na.rm = TRUE)
sd(GSS$tvhours, na.rm = TRUE)
# Note the added code section for geom_vline in the histogram below. Add a title and labels.
ggplot_tvhours <- ggplot(GSS, aes(tvhours))
ggplot_tvhours + geom_histogram(binwidth = 1, aes(y = after_stat(count / sum(count) * 100))) +
  geom_vline(xintercept = mean(GSS$tvhours, na.rm = TRUE))  # vertical line at the mean
```
For this lab, you need to add new libraries. I recommend you put them in Chunk 1 with your other library code.
library(lsr)
library(dplyr)
# Get different confidence intervals for the same variable
mean(GSS$tvhours, na.rm = TRUE)
ciMean(GSS$tvhours, na.rm = TRUE, conf = 0.90)
ciMean(GSS$tvhours, na.rm = TRUE, conf = 0.95)
ciMean(GSS$tvhours, na.rm = TRUE, conf = 0.99)
```{r}
# Look at the mean of an interval variable
mean(GSS$childs, na.rm = TRUE)
# A generalized linear model of an interval variable with just an intercept and no independent variable
results <- glm(childs ~ 1, data = GSS)
summary(results)
# A generalized linear model of an interval variable with an intercept and an independent variable
results <- glm(childs ~ age, data = GSS)
summary(results)
# The line below gives us a scatterplot with a "best fitting line" through it. SEE IF YOU CAN ADD A TITLE AND LABELS.
# Look at the mean of an interval variable
mean(GSS$childs, na.rm = TRUE)
# A generalized linear model of an interval variable with just an intercept and no independent variable
regchilds <- glm(childs ~ 1, data = GSS)
summary(regchilds)
# A generalized linear model of an interval variable with an intercept and one interval-ratio independent variable
regchilds2 <- glm(childs ~ age, data = GSS)
summary(regchilds2)
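In a model with one interval-ratio independent variable, such as glm(childs ~ age), the age coefficient is the slope of the best-fitting line: the expected change in number of children for each additional year of age. With glm's default gaussian family the estimates match ordinary least squares, so the slope can also be computed as cov(x, y) / var(x) and the intercept as mean(y) - slope * mean(x). The sketch below checks this on a hypothetical data frame `toy` (made-up values, not GSS data).

```r
# Hypothetical data for illustration only (not GSS data)
toy <- data.frame(
  age    = c(22, 30, 35, 41, 50, 58, 63),
  childs = c(0, 1, 2, 2, 3, 2, 4)
)

fit <- glm(childs ~ age, data = toy)
coef(fit)

# Least-squares formulas for a line with one predictor
slope     <- cov(toy$age, toy$childs) / var(toy$age)
intercept <- mean(toy$childs) - slope * mean(toy$age)
c(intercept, slope)   # should match coef(fit)
```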