@abelcallejo
Last active April 27, 2022 12:37
R crash course and cheatsheets

r

Crash course and cheatsheets

Data types

Most programming languages have "data types". In R, everything you work with is an object, and each object has a type.

Primitive data types

Most programming languages also have "primitive data types". In R, the closest equivalent is the six atomic vector types, sometimes called atomic objects or basic types:

  • logical
  • integer
  • double
  • complex
  • character
  • raw

Functions for checking primitive data types:

  • typeof() - returns a character data type value of either "logical", "integer", "double", "complex", "character", or "raw".
  • is.atomic() - returns a logical data type value of either TRUE or FALSE.
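As a quick illustration of the two functions above, the sketch below applies typeof() and is.atomic() to one value of each atomic type. The variable names are placeholders chosen for this example.

```r
# One example value per atomic type (names are illustrative only)
values <- list(
	a.logical   = TRUE,
	an.integer  = 1L,          # the L suffix makes an integer
	a.double    = 1.5,
	a.complex   = 2+3i,
	a.character = "text",
	a.raw       = as.raw(255)
)

# typeof() returns the type name as a character string
types <- vapply( values, typeof, character(1) )
print( types )

# is.atomic() is TRUE for every value above, but FALSE for a list
all.atomic <- all( vapply( values, is.atomic, logical(1) ) )
print( all.atomic )            # TRUE
print( is.atomic( list(1) ) )  # FALSE
```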

Alternatively, type-specific predicates such as is.logical(), is.integer(), is.double(), is.complex(), is.character(), and is.raw() can supplement typeof(); each returns TRUE or FALSE.
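A small sketch of how type-specific checks relate to typeof(), assuming the base R is.* predicates are the functions meant here. Note that a number typed without the L suffix is a double, not an integer.

```r
x <- 42      # no L suffix, so this is stored as a double
y <- 42L     # integer

print( typeof( x ) )       # "double"
print( is.double( x ) )    # TRUE
print( is.integer( x ) )   # FALSE
print( is.integer( y ) )   # TRUE

# is.numeric() is TRUE for both integer and double values
print( is.numeric( x ) && is.numeric( y ) )   # TRUE

print( is.character( "42" ) )   # TRUE
print( is.logical( NA ) )       # TRUE, a plain NA is logical
```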

Lists

my.list <- list( "one", "two", "three" )
cat( my.list[[1]] ) # one
cat( my.list[[2]] ) # two
cat( my.list[[3]] ) # three
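
The example above reads elements with double brackets. As a brief sketch of why, the block below contrasts [[ ]], which returns the element itself, with [ ], which returns a shorter list. It also shows that a list may mix types. The list contents are made up for illustration.

```r
mixed.list <- list( "one", 2L, TRUE )

# [[ ]] extracts the element itself
first.element <- mixed.list[[1]]
print( typeof( first.element ) )   # "character"

# [ ] keeps the list wrapper and returns a sub-list
first.sublist <- mixed.list[1]
print( typeof( first.sublist ) )   # "list"
print( length( mixed.list ) )      # 3

# Elements keep their own types
element.types <- vapply( mixed.list, typeof, character(1) )
print( element.types )
```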

Command line

Parsing command line arguments in R can be tricky, but GNU-style long options keep it manageable: values are easy to extract when they are passed in the --parameter=argument format. Consider the script below.

cli.R

# Set the valid run parameters
valid.run.parameters <- c( "universe", "character", "verbose" )

# Get the run arguments
run.arguments <- commandArgs(TRUE)

# Loop each argument if and only if there are arguments
if( length( run.arguments ) > 0 ) {
	for ( i in 1:length( run.arguments ) ) {

		# Validate if it has the --parameter=argument structure
		if ( startsWith( run.arguments[i], "--" ) && grepl( "=", run.arguments[i], fixed = TRUE ) ) {

			# Split --parameter=argument at the first "=" only, so values may contain "="
			key.pair <- stringr::str_split( run.arguments[i], "=", n = 2, simplify = TRUE )

			# Get the parameter from the --parameter=argument pattern
			run.parameter <- sub( "^--", "", key.pair[1] )

			# Get the argument; the shell has already removed any surrounding quotes
			run.argument <- key.pair[2]

			# Validate if the parameter is among the valid run parameters
			if ( run.parameter %in% valid.run.parameters ) {

				# DO YOUR MAGIC HERE! Here is an example...
				cat( run.parameter, "\n" )
				cat( run.argument,  "\n\n" )

			}

		}

		# Validate if it has the --argument structure
		else if ( startsWith( run.arguments[i], "--" ) ) {
			run.argument <- sub( "^--", "", run.arguments[i] )

			# DO YOUR MAGIC HERE! Here is an example...
			cat( run.argument, "\n\n" )

		}

	}
}

Example

Rscript cli.R --universe=MCU --character="Wade Wilson" --hobby=trolling --verbose

Output

universe
MCU

character
Wade Wilson

verbose

Notice that --hobby=trolling was not processed further because hobby is not listed in valid.run.parameters.
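
For reuse and testing, the same idea can be wrapped in a function that returns a named list instead of printing. This is a sketch in base R only; parse.run.arguments is a hypothetical helper name, not part of cli.R, and storing bare flags as TRUE is an assumption made for this example. Unlike cli.R, it also checks bare flags against the valid parameters.

```r
# Hypothetical helper: parse --parameter=argument pairs and --flag switches
parse.run.arguments <- function( run.arguments, valid.run.parameters ) {
	parsed <- list()
	for ( run.argument in run.arguments ) {

		# Ignore anything that does not start with "--"
		if ( !startsWith( run.argument, "--" ) ) next

		# Drop the leading "--"
		body <- sub( "^--", "", run.argument )

		# Position of the first "=", or -1 when there is none
		equals.at <- regexpr( "=", body, fixed = TRUE )

		if ( equals.at > 0 ) {
			run.parameter <- substr( body, 1, equals.at - 1 )
			value         <- substr( body, equals.at + 1, nchar( body ) )
		} else {
			run.parameter <- body
			value         <- TRUE   # assumption: a bare flag means TRUE
		}

		# Keep only the parameters listed as valid
		if ( run.parameter %in% valid.run.parameters ) {
			parsed[[ run.parameter ]] <- value
		}
	}
	parsed
}

parsed <- parse.run.arguments(
	c( "--universe=MCU", "--character=Wade Wilson", "--hobby=trolling", "--verbose" ),
	c( "universe", "character", "verbose" )
)
str( parsed )
```

In a script, pass commandArgs(TRUE) as the first argument.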

# Reading positional command line arguments
args <- commandArgs(TRUE)
variable_1 <- args[1]
variable_2 <- args[2]
variable_3 <- args[3]
# Reading data from a CSV into a data frame
sample.dataframe <- read.csv(
	file = 'inputs/source.data.csv',
	stringsAsFactors = FALSE,
	strip.white = TRUE,
	sep = ','
)
# Writing data into a CSV
write.csv(sample.dataframe, file = 'outputs/destination.data.csv', row.names=FALSE)
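
A self-contained round trip may help when trying the two calls above. This sketch writes a small made-up data frame to a temporary file (tempfile() stands in for the inputs/ and outputs/ paths) and reads it back.

```r
# Made-up data for illustration
sample.dataframe <- data.frame(
	name  = c( "alpha", "beta" ),
	score = c( 1.5, 2.5 ),
	stringsAsFactors = FALSE
)

# Temporary path used instead of a real project file
csv.path <- tempfile( fileext = ".csv" )

# row.names = FALSE avoids writing an extra column of row numbers
write.csv( sample.dataframe, file = csv.path, row.names = FALSE )

# Read it back with the same options as above
read.back <- read.csv( file = csv.path, stringsAsFactors = FALSE, strip.white = TRUE, sep = ',' )
print( read.back )

# Remove the temporary file
unlink( csv.path )
```
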
# Minimal data frame (use = inside data.frame() so the columns get these names)
my_data_frame <- data.frame(
	column_A = c("record A1","record A2"),
	column_B = c("record B1","record B2")
)
# Merge 2 data frames by appending rows (both must have the same columns)
merged_data_frame <- rbind(data_frame_1, data_frame_2)
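
rbind() stacks rows, so it suits data frames that share the same columns. A minimal sketch with made-up data follows; for joining on a key column, base R's merge() is the usual tool, shown here only for contrast.

```r
data_frame_1 <- data.frame( id = c( 1, 2 ), label = c( "A1", "A2" ) )
data_frame_2 <- data.frame( id = c( 3, 4 ), label = c( "A3", "A4" ) )

# Row-wise combination: 2 rows + 2 rows = 4 rows, same 2 columns
merged_data_frame <- rbind( data_frame_1, data_frame_2 )
print( dim( merged_data_frame ) )   # 4 2

# Key-based join for contrast: matches rows on the shared "id" column
scores <- data.frame( id = c( 1, 3 ), score = c( 10, 30 ) )
joined <- merge( merged_data_frame, scores, by = "id" )
print( joined )
```
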
# load the prerequisite package (library() stops with an error if it is missing)
library(RPostgreSQL)
# initiate the database driver
driver <- dbDriver("PostgreSQL")
# connect to the database
connection <- dbConnect(driver, dbname = databasename, host = databasehost, port = databaseport, user = databaseuser, password = databasepassword)
# query results will be stored to a data frame
query <- "SELECT * FROM table_name"
results <- dbGetQuery(connection, query)
# insert query
query <- "INSERT INTO table_name (column_a) VALUES ('Value A')"
dbExecute(connection, query)
# close the connection when done
dbDisconnect(connection)
message("hello world")
# for loop
for (year in c(2010,2011,2012,2013,2014,2015)){
	print(paste("The year is", year))
}
# For single-line values
message("hello world")
# For multi-dimensional values
print(your_data_frame_here)
# Detecting the number of cores and threads
library(parallel)
cores <- detectCores(logical = FALSE)
threads <- detectCores()
# Parallel processing using foreach
library(foreach)
library(doParallel)
registerDoParallel(cores=threads)
foreach (i=1:100) %dopar% {
	# do something heavy
}
# For best execution time, make parallel executions equivalent to the number of `threads`
# Number of columns and rows of a raster, derived from its extent and resolution
library(raster)
raster.data <- raster("/path/to/raster.tif")
ncol.data <- (xmax(raster.data) - xmin(raster.data)) / xres(raster.data)
nrow.data <- (ymax(raster.data) - ymin(raster.data)) / yres(raster.data)
# Measuring the execution runtime
start_time <- Sys.time()
# do something heavy
end_time <- Sys.time()
# The measured runtime in seconds (plain subtraction picks its own units)
run_time <- difftime(end_time, start_time, units = "secs")
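
Subtracting two Sys.time() values yields a difftime whose units are chosen automatically, so a long run may be reported in minutes. The sketch below pins the units to seconds and also shows base R's system.time() as an alternative; Sys.sleep() stands in for the heavy work.

```r
start_time <- Sys.time()
Sys.sleep( 0.2 )   # placeholder for something heavy
end_time <- Sys.time()

# Force seconds regardless of how long the run took
run_time <- difftime( end_time, start_time, units = "secs" )
run_seconds <- as.numeric( run_time )
print( units( run_time ) )   # "secs"
print( run_seconds )

# system.time() times an expression directly
timing <- system.time( Sys.sleep( 0.2 ) )
print( timing[["elapsed"]] )
```
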
# Concatenation without a separator (paste0() is shorthand for paste(..., sep = ""))
concatenated <- paste0(string_1, string_2, string_n)
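
As a short sketch of how the separator changes the result, with placeholder strings: paste() inserts a space by default, sep = "" removes it, and paste0() is shorthand for the latter.

```r
string_1 <- "foo"
string_2 <- "bar"

with.space <- paste( string_1, string_2 )            # "foo bar"
no.space   <- paste( string_1, string_2, sep = "" )  # "foobar"
shorthand  <- paste0( string_1, string_2 )           # "foobar"

# collapse joins the elements of a single vector
joined <- paste( c( "a", "b", "c" ), collapse = "-" ) # "a-b-c"

print( c( with.space, no.space, shorthand, joined ) )
```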