Practising some data shaping in R for #compdata week 1
# compdata week 1 practice
# This script reads a NodeXL Twitter search for the #compdata hashtag that has been uploaded to a Google Spreadsheet
# The data is reshaped by subsetting to get a slice of rows and columns that meet a certain condition
# Read the CSV export of the Google Spreadsheet; in this case it is a vertices list with headers in row 2, so skip the first row
vertices <- read.csv("", header = TRUE, skip = 1)
# See the number of rows
nrow(vertices)
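# Read the CSV export of the Google Spreadsheet; in this case it is an edges list, again with headers in row 2
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0.0), which the notes on $Relationship below assume
edges <- read.csv("", header = TRUE, skip = 1, stringsAsFactors = TRUE)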
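# Sketch (hypothetical toy CSV, not the real #compdata export): with a title
# line in row 1 and the headers in row 2, skip = 1 drops the title line and
# header = TRUE reads the next line as column names. read.csv() also makes
# the names syntactic, so a header such as "Vertex 1" becomes Vertex.1.
demoCsv <- "Spreadsheet title line
Vertex 1,Vertex 2,Relationship
alice,bob,Followed
bob,carol,Mentions"
demo <- read.csv(text = demoCsv, header = TRUE, skip = 1)
names(demo)   # "Vertex.1" "Vertex.2" "Relationship"
nrow(demo)    # 2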
# Look at the structure of the data
str(edges)
# Note that str() reports  $ Relationship : Factor w/ 4 levels "Followed","Mentions",..
# What are all the levels in $Relationship?
levels(edges$Relationship)
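# Sketch (hypothetical toy values, not the real Relationship column): levels()
# only returns something for factors. If Relationship was read as character
# (the read.csv() default since R 4.0.0), levels() gives NULL and unique()
# lists the distinct values instead.
rel <- c("Followed", "Mentions", "Followed", "Replies to")
levels(rel)                 # NULL, because rel is character, not a factor
relFactor <- factor(rel)
levels(relFactor)           # "Followed" "Mentions" "Replies to" (sorted)
unique(rel)                 # distinct values in order of first appearance
table(relFactor)            # number of rows per level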
# How many rows have a $Tweet that starts with 'I just signed up for Computing for Data Analysis .. '?
iJust <- grepl("^I just signed up for Computing for Data Analysis", edges$Tweet)
# grepl() returns TRUE/FALSE per row, so sum() counts the matching rows
sum(iJust)
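# Sketch (hypothetical toy tweets): "^" anchors the pattern at the start of
# the text, so a retweet that quotes the phrase later in the text is not
# counted. sum() works because TRUE counts as 1 and FALSE as 0.
tweets <- c("I just signed up for Computing for Data Analysis via @coursera",
            "RT @someone: I just signed up for Computing for Data Analysis",
            "Week 1 quiz done")
atStart  <- grepl("^I just signed up for Computing for Data Analysis", tweets)
anywhere <- grepl("I just signed up for Computing for Data Analysis", tweets)
sum(atStart)    # 1
sum(anywhere)   # 2
# In a regex "." matches any character; fixed = TRUE matches it literally
grepl("Analysis.", tweets)                # TRUE FALSE FALSE
grepl("Analysis.", tweets, fixed = TRUE)  # FALSE FALSE FALSE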
# Goal: get a subset of $Vertex.1 and $Vertex.2 for the rows where $Relationship is 'Followed'
# First, build a logical vector that is TRUE for the 'Followed' rows
followed <- edges$Relationship == "Followed"
# Then make a new data.frame from the first two columns of edges ($Vertex.1 and $Vertex.2), keeping only the followed rows
edgeList <- edges[followed, 1:2]
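# Sketch (hypothetical toy data.frame): if Relationship contains NA, then
# == gives NA for that row, and indexing with NA adds an all-NA row to the
# result. which() keeps only the TRUE positions, so NA rows are dropped.
toy <- data.frame(Vertex.1 = c("a", "b", "c"),
                  Vertex.2 = c("b", "c", "a"),
                  Relationship = c("Followed", NA, "Mentions"))
isFollowed <- toy$Relationship == "Followed"
isFollowed                          # TRUE NA FALSE
nrow(toy[isFollowed, 1:2])          # 2, one of which is an all-NA row
nrow(toy[which(isFollowed), 1:2])   # 1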
# The two steps above can be combined into one line
edgeList <- edges[edges$Relationship == "Followed", 1:2]
# Look at the new data
head(edgeList)
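# Sketch (hypothetical toy data): the same subset written with base R's
# subset(), which selects columns by name instead of position and drops rows
# where the condition is NA. This is an alternative, not the script's method.
toyEdges <- data.frame(Vertex.1 = c("a", "b", "c"),
                       Vertex.2 = c("b", "c", "a"),
                       Relationship = c("Followed", "Mentions", "Followed"))
byIndex <- toyEdges[toyEdges$Relationship == "Followed", 1:2]
byName  <- subset(toyEdges, Relationship == "Followed",
                  select = c(Vertex.1, Vertex.2))
all.equal(byIndex, byName)   # TRUE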
# Now find the most frequent $Vertex.1 values in edges
# table() gives a frequency table; data.frame() turns it into two columns, Var1 (the value) and Freq (its count)
topInVert1 <- data.frame(table(edges$Vertex.1))
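# Sketch (hypothetical toy vertices): data.frame(table(df$col)) names the
# value column Var1, because df$col is not a plain variable name. Passing a
# plain variable instead names the column after that variable.
toyVerts <- data.frame(Vertex.1 = c("alice", "bob", "alice", "carol", "alice", "bob"))
counts <- data.frame(table(toyVerts$Vertex.1))
names(counts)   # "Var1" "Freq"
counts$Freq     # 3 2 1 (alice, bob, carol: values are sorted)
senders <- toyVerts$Vertex.1
names(data.frame(table(senders)))   # "senders" "Freq"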
# Sort by Freq in descending order
topInVert1 <- topInVert1[order(-topInVert1$Freq), ]
# Print the top 10 results
head(topInVert1, 10)
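# Sketch (hypothetical toy counts): order(-Freq) sorts in descending order.
# order() is stable, so tied counts keep their existing order; adding Var1 as
# a second key makes the tie-break explicit. head(n) then keeps the top n.
counts <- data.frame(Var1 = c("alice", "bob", "carol", "dave"),
                     Freq = c(2L, 5L, 2L, 1L))
sorted <- counts[order(-counts$Freq, counts$Var1), ]
sorted$Freq         # 5 2 2 1
head(sorted, 2)     # bob (5), then alice (2)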