Last active
December 27, 2015 12:49
Program to sample randomly from White House visits
# Below is a tool Julie and I (Bob) created to randomly select
# a group of visitors of some specified size from some
# subset of visitors
randomvisit = function(data,lo=5,hi=10){
  # lo represents the low value; hi represents the high value.
  # So, by default we are looking at groups ranging from 5-10 visitors.
  u = table(data$UIN)
  # u is a table counting how many visitors share each UIN in the
  # subset of visitors that is passed in through "data"
  su = u[u <= hi & u >= lo]
  nu = names(su)
  # su is the subset of table u that contains
  # groups of a certain size, where the size of
  # a group is less than or equal to "hi" and
  # greater than or equal to "lo" (limits that are defined above).
  # nu holds the UINs contained in su
  v = data[data$UIN==sample(nu,1),]
  return(v)
}
# in the last two lines of the function we're looking at the UINs in nu.
# the sample function randomly selects a single (aka ",1") UIN, and v
# keeps every row of "data" with that UIN, i.e. one whole group of visitors.
# return(v) hands that selection back to the caller (it only prints
# if we don't assign the result to anything).
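# a tiny made-up example walking through the same steps as the function
# above. nothing here is real visitor data: toy, tu, tsu, tnu and tv are
# names invented for illustration, and the size limits 2 and 3 are arbitrary.
toy = data.frame(UIN = c("U1","U1","U2","U2","U2","U3"),
                 fullname = c("a","b","c","d","e","f"),
                 stringsAsFactors = FALSE)
tu = table(toy$UIN)                    # group sizes: U1 = 2, U2 = 3, U3 = 1
tsu = tu[tu <= 3 & tu >= 2]            # keep groups of 2 to 3 visitors
tnu = names(tsu)                       # the UINs that survive: "U1" "U2"
tv = toy[toy$UIN == sample(tnu, 1), ]  # all rows of one randomly chosen group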
x = randomvisit(csvisits)
x[,c(1:4,12,20:22,27,30)]
# x is a random visit selected from a dataset
# we created: "csvisits" (built further down), which holds the
# visits where house and senate members make up a good share
# of the visitors to potus.
# the second line picks the columns we want to see
# from a randomly selected visit. So we just kept interesting
# information like who came, when, to see whom... (we don't
# care who picked them up).
visits$cs = visits$fullname %in% c(snms,cnms)
ucs = table(visits$UIN,visits$cs)
p = ucs[,2]/(ucs[,2]+ucs[,1])
p = p[p>.25]
csvisits = visits[visits$UIN %in% names(p),]
# the first line of this block creates a new column in visits, "cs",
# that is TRUE for visitors whose full name is in snms or cnms
# (i.e. who are either in the house or the senate) and FALSE otherwise
# ucs is a tabulation, for each UIN, of how many visitors
# are not (column 1) and are (column 2) house or senate members
# p is the proportion of house and senate visitors
# in each group of visitors.
# we then cut p down to the visits where
# more than a quarter of the visitors are congressional
# finally, csvisits keeps the rows of visits that belong
# to such visits, those where more than a quarter
# of the visitors are congressional.
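# the same steps on made-up data, to show what the .25 cutoff does.
# toy2, members, tab and share are hypothetical names for illustration;
# members stands in for c(snms,cnms), and the visitor names are invented.
toy2 = data.frame(UIN = c("U1","U1","U1","U1","U2","U2"),
                  fullname = c("sen a","x","y","z","rep b","w"),
                  stringsAsFactors = FALSE)
members = c("sen a","rep b")
toy2$cs = toy2$fullname %in% members       # TRUE for the two made-up members
tab = table(toy2$UIN, toy2$cs)             # columns are "FALSE" and "TRUE"
share = tab[,2]/(tab[,2]+tab[,1])          # U1 = 1/4, U2 = 1/2
share = share[share > .25]                 # U1 drops out: .25 is not > .25
toy2cs = toy2[toy2$UIN %in% names(share),] # only the U2 rows remain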
# as this is my first code annotation, apologies
# for anything that may be unclear....
# feel free to ask me qs. i'll probably cc mark.