@kenthzhang
Created April 28, 2017 21:35
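# Load RevoScaleR (attached by default in Microsoft R Server) and open a Spark session
library(RevoScaleR)
cc <- rxSparkConnect(reset = TRUE)
# File-system handle used to read the input data from HDFS
hdfsFileSystem <- RxHdfsFileSystem()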
# Declare DayOfWeek as a factor with a fixed level order
colInfo <- list(
  DayOfWeek = list(
    type = "factor",
    levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")))
# ORC data source on HDFS; DayOfWeek is read using the colInfo defined above
orcData <- RxOrcData(file = "/share/AirlineDemoSmall/AirlineDemoSmallOrc",
                     fileSystem = hdfsFileSystem,
                     colInfo = colInfo)
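# Function applied to each partition: rxExecBy passes the partition's key
# values in `keys` and the partition's rows in `data`
.LinMod <- function(keys, data)
{
  rxLinMod(ArrDelay ~ CRSDepTime, data = data)
}
# Partition the data by DayOfWeek and fit one linear model per day
result <- rxExecBy(inData = orcData, keys = c("DayOfWeek"), func = .LinMod)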
# Inspect the first partition's output (in this run it happens to be Wednesday)
result[[1]]
#$keys
#$keys[[1]]
#[1] Wednesday
#Levels: Monday Tuesday Wednesday Thursday Friday Saturday Sunday
#
#
#$result
#Call:
#rxLinMod(formula = ArrDelay ~ CRSDepTime, data = data)
#
#Linear Regression Results for: ArrDelay ~ CRSDepTime
#Data: data (RxXdfData Data Source)
#File name: /dev/shm/MRS-sshuser/3108799476135290757/PXDF0
#Dependent variable(s): ArrDelay
#Total independent variables: 2
#Number of valid observations: 76786
#Number of missing observations: 2089
#
#Coefficients:
#              ArrDelay
#(Intercept) -3.1974586
#CRSDepTime   0.9900705
#
#$status
#$status[[1]]
#[1] "OK"
#
#$status[[2]]
#NULL
#
#$status[[3]]
#NULL
# Close the Spark session
rxSparkDisconnect(cc)
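The script above needs RevoScaleR and a Spark cluster to run. For readers without that setup, here is a minimal base-R sketch of the same split-by-key, fit-one-model-per-partition pattern. It swaps in the built-in `mtcars` data for the airline ORC data, `lm()` for `rxLinMod()`, and `cyl` for `DayOfWeek`. `execByLocal` and `fitByGroup` are hypothetical helper names. The `keys` / `result` / `status` shape only imitates the printed `rxExecBy` output above; it is not RevoScaleR's implementation.

```r
# Hypothetical local stand-in for rxExecBy: split a data frame by one key
# column and apply `func(keys, data)` to each partition independently.
execByLocal <- function(data, key, func) {
  # One data frame per distinct key value (names are the key values)
  parts <- split(data, data[[key]], drop = TRUE)
  lapply(names(parts), function(k) {
    # Capture errors per partition so one failing group does not stop the rest
    res <- tryCatch(func(list(k), parts[[k]]), error = function(e) e)
    ok <- !inherits(res, "error")
    list(
      keys   = list(k),
      result = if (ok) res else NULL,
      # Imitates the three-element status list in the printed output above
      status = list(if (ok) "OK" else "Error",
                    if (ok) NULL else conditionMessage(res),
                    NULL)
    )
  })
}

# Per-partition model, analogous to .LinMod but using base R's lm()
fitByGroup <- function(keys, data) {
  lm(mpg ~ wt, data = data)
}

byCyl <- execByLocal(mtcars, "cyl", fitByGroup)

# Keep only partitions whose model was fitted successfully
okParts <- Filter(function(p) identical(p$status[[1]], "OK"), byCyl)

# One row of coefficients per key value
coefTable <- do.call(rbind, lapply(okParts, function(p) {
  cf <- coef(p$result)
  data.frame(cyl = p$keys[[1]], status = p$status[[1]],
             intercept = unname(cf[1]), slope = unname(cf[2]),
             stringsAsFactors = FALSE)
}))
print(coefTable)
```

Because each partition is fitted inside its own `tryCatch`, a failing group is recorded in its own `status` entry instead of aborting the whole run. The summary step then keeps only the `"OK"` partitions.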