@marionhalftermeyer
Created January 21, 2014 14:52
Data Visualization II Assignment 2: Julie Klein, Robert Hackett & Marion Halftermeyer
1) Focus:
Our group decided to focus on the extinction of words in Congressional records. Our project had three key steps. First, we defined the term extinction. Second, we developed a methodology for identifying words that have gone extinct. Finally, we investigated the extinction of one word in particular: cloning (along with its variations clone, cloned and clones). The extinction of many words is easy to explain, as with proper nouns that refer to specific past events. We were surprised, however, by the extinction of the word cloning, given that it remains a topic of debate in public discourse today. This apparent disconnect between the use of the word in Congressional records and its use in everyday language is what led us to focus on cloning.
2) Code:
# To start, we looked at the most commonly used word in each state
# For example, here is how we found the most commonly used words for Alabama
topAL = sll_cw_phrases(entity_type = "state", entity_value="AL",key="61f1f72ef4c34d9187fcaea875b6050c")
topAL = c(topAL$ngram)
# We then cross referenced our findings with the results of the function created with Professor Hansen
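extinct = function(data,lo=0,hi=0){
  # Build a complete monthly grid so that months with no mentions count as 0
  start = as.Date("1997-01-01")
  end = as.Date("2014-01-01")
  months = seq(start,end,by="month")
  data = data[data$month >= start,]
  fulldata = rep(0,length(months))
  fulldata[match(data$month,months)] = data$percentage
  # Indices of months above the hi threshold and at or below the lo threshold
  his = which(fulldata > hi)
  los = which(fulldata <= lo)
  # Longest run of months strictly between two consecutive low months
  durationhi = diff(los)-1
  maxdhi = max(durationhi)
  starthi = months[los[which(durationhi == maxdhi)[1]]+1]
  valhi = fulldata[los[which(durationhi == maxdhi)[1]]+1]
  # Longest run of months strictly between two consecutive high months
  # Note: a quiet run that lasts until the end of the window has no later high month, so it is not counted
  durationlo = diff(his)-1
  maxdlo = max(durationlo)
  startlo = months[his[which(durationlo == maxdlo)[1]]+1]
  vallo = fulldata[his[which(durationlo == maxdlo)[1]]+1]
  # TRUE if the longest high run starts before the longest low run
  if(starthi < startlo) hifirst = TRUE
  else hifirst = FALSE
  return(data.frame(starthi=starthi,maxdhi=maxdhi,valhi=valhi,startlo=startlo,maxdlo=maxdlo,vallo=vallo,hifirst=hifirst))
}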
extinctwords = function(words,lo=0,hi=0,key){
  durations = data.frame()
  for(w in words){
    newcounts = sll_cw_timeseries(w,granularity="month",key=key,percentages=TRUE)
    newdurations = extinct(newcounts,lo=lo,hi=hi)
    newdurations$word = w
    print(newdurations)
    durations = rbind(durations,newdurations)
  }
  return(durations)
}
# We went through these lists of top words to identify those that may be of interest to us
# We then used the time series function to see the use of these words over time
# For example, we created a plot of clone (and variations of the word clone) over time to visualize its extinction
# We first plotted the word clone
ext = sll_cw_timeseries("clone",key="61f1f72ef4c34d9187fcaea875b6050c")
plot(ext$day,ext$count,type="l",xlab="Day",ylab="Count",main="Extinction of cloning: clone (black), cloned (red), cloning (blue), clones (green)", cex.main = 0.75)
# And then we added additional lines for the other variations of the word
ext2 = sll_cw_timeseries("cloned",key="61f1f72ef4c34d9187fcaea875b6050c")
lines(ext2$day,ext2$count,type="l",col="red")
ext3 = sll_cw_timeseries("cloning",key="61f1f72ef4c34d9187fcaea875b6050c")
lines(ext3$day,ext3$count,type="l",col="blue")
ext4 = sll_cw_timeseries("clones",key="61f1f72ef4c34d9187fcaea875b6050c")
lines(ext4$day,ext4$count,type="l",col="green")
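# Next, we researched which Congressmen were using the word clone
# (clone is assumed here to be a data frame of search results for the word, e.g. as returned by sll_cw_text; it is not created above)
# We repeated this exercise for the variations of the word as well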
columns = c(1,2,9,14,15,16,17,19)
clonepeople = clone[,columns]
columns2 = c(2,19)
clonepeoplenames = clone[,columns2]
# We further investigated the use of the word by specific individuals, such as Brownback
clone_brownback = clone[clone$speaker_last == "Brownback",columns]
clone_brownback
# We also researched which bills referenced cloning
#nyt_cg_billscosponsor
# sll_cg_getcommitteesallleg (gets a list of committees that a member serves on)
# sll_cg_getlegislator (gets information about legislator)
#2003
sll_cg_getlegislator(lastname="Sensenbrenner", key="f54f453f73a84c92b344735b8be45ac6")
sll_cg_getcommitteesallleg(bioguide_id="S000244", key="f54f453f73a84c92b344735b8be45ac6")
# F. "Jim" Sensenbrenner Jr. said "cloned"
# Party: R
# Role: Representative in House
# State: WI
# Committees: "House Committee on the Judiciary", "Crime, Terrorism, and Homeland Security Subcommittee",
# "Intellectual Property, Competition, and the Internet Subcommittee", "House Committee on Science, Space, and Technology",
# "Space and Aeronautics Subcommittee", "Investigations and Oversight Subcommittee"
# Bioguide ID: S000244
# In office: TRUE
sensenbrennercloned= sll_cw_text(phrase="cloned", bioguide_id= "S000244", key="f54f453f73a84c92b344735b8be45ac6")
# Bills:
# Human Cloning Prohibition Act of 2001 (H.R. 2505) 2001-07-31
# Human Cloning Prohibition Act of 2003 (H.R. 534) 2003-02-27
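The extinct() function above measures quiet periods as gaps between two months of use, so a word that falls silent and never returns is not captured. A minimal sketch of an alternative, using base R's rle() on a made-up monthly series, is shown below. The values and variable names are hypothetical and are not Capitol Words data, and this is not the method we used for the results in the memo.

```r
# Toy monthly series: hypothetical percentages, not real Capitol Words data
months = seq(as.Date("2007-01-01"), by = "month", length.out = 12)
pct = c(0.4, 0.3, 0, 0.2, 0.5, 0.1, 0, 0, 0, 0, 0, 0)

# Run-length encode the "silent month" indicator
runs = rle(pct <= 0)
ends = cumsum(runs$lengths)        # last index of each run
starts = ends - runs$lengths + 1   # first index of each run

# Keep only the silent runs and pick the longest one
silent = which(runs$values)
longest = silent[which.max(runs$lengths[silent])]

result = data.frame(
  start = months[starts[longest]],          # first silent month of the run
  length = runs$lengths[longest],           # number of silent months
  trailing = ends[longest] == length(pct)   # TRUE if the run lasts to the end
)
print(result)
```

Because the runs are found directly rather than as gaps between months of use, a silence that lasts to the end of the window is reported and flagged by the trailing column.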
3) Memo:
As mentioned above, we ultimately focused our research on the extinction of the word cloning (and its variations).
As the plot produced by the code above shows, cloning was a popular term in Congressional records until 2008, at which point the word nearly disappeared from the records. We chose to focus on cloning because, to our knowledge, the public debates surrounding cloning are still ongoing, which makes it surprising that the word disappeared from the Congressional vocabulary.
To understand why cloning became extinct from a Congressional perspective, we researched which members of Congress talked about cloning throughout the data set and which bills referenced cloning. First, we created a list of members who discussed cloning most frequently and/or most recently. Sam Brownback, a Republican senator from Kansas, discussed cloning most frequently over time, while Barbara Mikulski, a Democratic senator from Maryland, was one of the last people to discuss cloning in 2008 before the word became extinct. Others at the top of the list included Cliff Stearns (R, FL), Diana DeGette (D, CO), Joe Pitts (R, PA), Marsha Blackburn (R, TN), Dianne Feinstein (D, CA), David Weldon (R, FL) and F. James Sensenbrenner (R, WI). With regard to bills about cloning, we decided to research the Human Cloning Prohibition Act, which was discussed by many of these members between 2001 and 2007. This bill initially passed the House in 2001 but was never passed in the Senate. It was reintroduced in 2003, 2005 and 2007, sponsored or cosponsored by some of the members on our lists, but died in committee each time. The fluctuation of the bill's presence in Congress tracks the use of the words clone, clones, cloning and cloned. Though the bill was reintroduced in May 2013, the words do not appear in the Congressional data after that date.
When we began this project, we wanted to figure out what causes a word to disappear from Congressional records. With regard to cloning, our research leads us to conclude that extinction is correlated with the death of a bill. Thus, while cloning may have remained a topic of public discourse after 2008, it was no longer connected to a live bill, which led to its disappearance from the Congressional record at that time. Given that the bill was reintroduced in May 2013, we expect that the word will be used more frequently once again, assuming that the bill progresses.
To develop this story further, we would like to conduct similar research on other words that have gone extinct. In particular, we would like to confirm whether the extinction of a word is always tied to the death of relevant bills or whether there are alternative explanations.
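One way such a follow-up might begin is sketched below: for each word, find the last month in which it was used and compare that month with the date on which a related bill last saw action. Everything here is hypothetical. The words, usage values, bill dates and the helper last_use() are made up for illustration; only the month and percentage column names mirror the time series used in our code.

```r
# Hypothetical monthly usage for two words (made-up values for illustration)
grid = seq(as.Date("2006-01-01"), by = "month", length.out = 6)
usage = list(
  wordA = data.frame(month = grid, percentage = c(0.3, 0.2, 0.1, 0, 0, 0)),
  wordB = data.frame(month = grid, percentage = c(0.1, 0, 0.2, 0.1, 0.3, 0))
)

# Hypothetical dates on which a related bill last saw action
bills = data.frame(
  word = c("wordA", "wordB"),
  bill_end = as.Date(c("2006-03-15", "2006-02-10")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: last month in which the word was used at all
last_use = function(ts) max(ts$month[ts$percentage > 0])

# Apply the helper to each word and keep the Date class
bills$last_use = do.call(c, lapply(usage[bills$word], last_use))

# Did the word keep appearing after its bill stopped moving?
bills$outlived_bill = bills$last_use > bills$bill_end
print(bills)
```

Words flagged TRUE in outlived_bill would be candidates for an explanation other than the death of a bill.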