C M Gutierrez codegordi
@codegordi
codegordi / srapeshell.R
Created September 27, 2012 16:53 — forked from abelsonlive/srapeshell.R
# best practices for web scraping in R // ldply
# best practices for web scraping in R #
# function should be used with ldply
# e.g.:
ldply(urls, scrape)
# add a try() to ignore broken links / unresponsive pages
# e.g.:
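Wrapping `scrape()` in `try()` means one bad URL contributes nothing instead of aborting the whole `ldply()` run. The gist's own code is cut off at this point, so the sketch below is only an outline of that idea in Python. It assumes a hypothetical `fetch` callable that stands in for `scrape` and returns a list of row dicts for each URL.

```python
from typing import Callable, Iterable, List, Tuple


def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], List[dict]],
) -> Tuple[List[dict], List[str]]:
    """Apply `fetch` to each URL and concatenate the returned rows.

    Mirrors ldply(urls, scrape) with scrape's body wrapped in try():
    a URL whose fetch raises is recorded and skipped instead of
    aborting the whole run.
    """
    rows: List[dict] = []
    failed: List[str] = []
    for url in urls:
        try:
            rows.extend(fetch(url))
        except Exception:  # broken link, timeout, parse error, ...
            failed.append(url)
    return rows, failed


# Usage with a stand-in fetcher (hypothetical; no network access)
def fake_fetch(url: str) -> List[dict]:
    if "broken" in url:
        raise IOError("unreachable: " + url)
    return [{"url": url, "n": len(url)}]


rows, failed = scrape_all(["a.example", "broken.example", "c.example"], fake_fetch)
```

Returning the failed URLs alongside the rows makes them easy to retry later. The R version just drops them silently.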
codegordi / convertCoordinates
Created October 5, 2012 21:48
a simple coordinate conversion function
### a simple coordinate conversion function to convert latitude, longitude in DMS to decimal degrees
### christina gutierrez aka github@codegordi
#library(gsubfn)
require(gsubfn)
options(digits=15)
# point.set == data set with at least latitude and longitude in degrees-(arc)minutes-seconds[-fractional seconds];
# no projection or datum assumed;
# format of DD[D]:MM:SS[*] is assumed
# e.g. coords <- c("122:45:45", "-69:38:27")
# e.g. coords <- c("122:45:45.84", "-69:38:27.76")
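The conversion code itself is cut off in this preview. The arithmetic it needs is decimal = D + M/60 + S/3600, with the sign of D applied to the whole result, so "-69:38:27" becomes -(69 + 38/60 + 27/3600).

The Python sketch below shows that step. It makes two assumptions:

- The sign is carried only by a leading "-".
- Plain `str.split()` is enough for parsing, so the gist's gsubfn-based approach is not reproduced.

```python
def dms_to_decimal(dms: str) -> float:
    """Convert 'DD[D]:MM:SS[.fff]' to decimal degrees.

    A leading '-' applies to the whole value. No datum or projection
    is involved; this is pure arithmetic on the three fields.
    """
    text = dms.strip()
    sign = -1.0 if text.startswith("-") else 1.0  # keeps the sign of "-0:30:00"
    parts = text.lstrip("+-").split(":")
    if len(parts) != 3:
        raise ValueError("expected DD:MM:SS, got %r" % dms)
    degrees, minutes, seconds = (float(p) for p in parts)
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60): %r" % dms)
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


coords = ["122:45:45.84", "-69:38:27.76"]
decimal = [dms_to_decimal(c) for c in coords]
```

Reading the sign from the string, rather than from the parsed degrees, matters for values between 0 and -1 degree. `float("-0")` compares equal to zero, so testing the parsed number would lose the sign.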
codegordi / write.to.read.from.LIST
Created May 31, 2013 17:51
Read flat files into a list of data frames / write from the list back out to flat files
# use plyr package
library(plyr)
### READ all files in working directory into a list of dataframes
d.list=lapply(list.files(getwd(), full.names = FALSE), FUN = read.table, header = TRUE, sep = "\t", fill = TRUE) # name data
### SEPARATE out data frames from list -- using a for-loop in this example
i=0
f.coll=list.files()
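The preview stops before the loop body. The overall round trip is:

1. Read every tab-delimited file in a directory into one collection.
2. Get each table back by its file name.
3. Write the tables out again.

This is a rough Python equivalent using the standard `csv` module rather than R data frames. It assumes each file has a header row, and the function names are hypothetical.

```python
import csv
import os
import tempfile
from typing import Dict, List


def read_tables(directory: str) -> Dict[str, List[dict]]:
    """Read every regular file in `directory` as a tab-delimited table
    with a header row; the result is keyed by file name."""
    tables: Dict[str, List[dict]] = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, newline="") as fh:
            # short rows get None for missing fields (roughly like fill=TRUE)
            tables[name] = list(csv.DictReader(fh, delimiter="\t"))
    return tables


def write_tables(tables: Dict[str, List[dict]], directory: str) -> None:
    """Write each table back out as a tab-delimited file of the same name."""
    for name, rows in tables.items():
        if not rows:
            continue  # nothing to derive a header from
        with open(os.path.join(directory, name), "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]), delimiter="\t")
            writer.writeheader()
            writer.writerows(rows)


# Usage: round-trip one small file through two temporary directories
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(src, "a.txt"), "w", newline="") as fh:
    fh.write("x\ty\n1\t2\n3\t4\n")
tables = read_tables(src)
write_tables(tables, dst)
```

Keying by file name plays the role that `list.files()` plays in the R loop: each table can be pulled out by name instead of by position.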
codegordi / RJDBC_example
Created May 31, 2013 18:07
Example using RJDBC to connect to a MS SQL Server database (R on MacOS)
# JDBC driver library for R
library("RJDBC")
user <- "cgutierrez" # enter your own username
pwd <- readline(prompt = "Password: ") # prompt for the password at runtime (interactive sessions only) instead of hard-coding it
# I chose the open-source jTDS driver for a particular reason in this example (not related to R); you might choose another one
# Check that the appropriate .jar file is in the indicated directory
drv <- JDBC("net.sourceforge.jtds.jdbc.Driver", "/usr/share/java/drivers/jdbc/jtds-1.2.7.jar", identifier.quote="`")
cxn <- dbConnect(drv, "jdbc:jtds:sqlserver://MYSERVER01ABC;useNTLMv2=true;domain=MY.DOMAIN.COM", user, pwd)
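The connection string passed to `dbConnect()` follows the jTDS URL layout:

`jdbc:jtds:<server_type>://<server>[:<port>][/<database>][;<property>=<value>...]`

Here `useNTLMv2` and `domain` are jTDS connection properties. The user name and password travel as separate arguments rather than inside the URL. The hypothetical helper below assembles such a URL from its parts; it is a sketch for seeing how the pieces fit, not part of RJDBC.

```python
from typing import Mapping, Optional


def jtds_url(
    server: str,
    port: Optional[int] = None,
    database: Optional[str] = None,
    properties: Optional[Mapping[str, str]] = None,
    server_type: str = "sqlserver",
) -> str:
    """Assemble a jTDS URL:
    jdbc:jtds:<server_type>://<server>[:<port>][/<database>][;<prop>=<value>...]
    """
    url = "jdbc:jtds:%s://%s" % (server_type, server)
    if port is not None:
        url += ":%d" % port
    if database:
        url += "/" + database
    for key, value in (properties or {}).items():  # dict order is preserved
        url += ";%s=%s" % (key, value)
    return url


url = jtds_url("MYSERVER01ABC", properties={"useNTLMv2": "true", "domain": "MY.DOMAIN.COM"})
```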
codegordi / R_multicore_example
Last active December 17, 2015 23:08
Use multicore package (R on MacOS) to grep a (Very Large Data) file-as-dataframe.
### manage memory on a large data set using multicore library
library(multicore)
# read in tab-delimited file from working dir
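df = read.table(file.path(getwd(), "your_file.txt"), sep="\t", header=T) # "your_file.txt" is a placeholder; getwd() alone is a directory, not a file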
d.lines = as.character(df$charvar) # $charvar is character-class variable you want to grep
grep_wrap <- function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE) {
ret = rep(0, length(x))
ret[grep(pattern, x, ignore.case, perl, value, fixed, useBytes, invert)] = 1
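`grep_wrap()` turns the match positions from `grep()` into a 0/1 indicator vector the same length as the input. That output can be computed on independent chunks and concatenated, which is what makes it easy to parallelize.

The Python sketch below does this with `concurrent.futures` instead of the multicore package. (multicore has since been superseded by `mclapply()` in R's base `parallel` package.) The chunking scheme and the function names are assumptions. Also note that Python's `re` syntax differs from R's default (POSIX extended) regular expressions.

```python
import re
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import List, Sequence, Tuple


def grep_indicator(pattern: str, lines: Sequence[str], ignore_case: bool = False) -> List[int]:
    """1 where `pattern` matches somewhere in a line, 0 elsewhere (like grep_wrap)."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [1 if rx.search(line) else 0 for line in lines]


def _grep_chunk(job: Tuple[str, List[str], bool]) -> List[int]:
    # module-level so ProcessPoolExecutor can pickle it
    pattern, chunk, ignore_case = job
    return grep_indicator(pattern, chunk, ignore_case)


def parallel_grep_indicator(pattern, lines, n_chunks=4, ignore_case=False,
                            executor_cls=ProcessPoolExecutor) -> List[int]:
    """Split `lines` into about `n_chunks` contiguous chunks, grep each in a
    worker, and stitch the 0/1 results back together in the original order."""
    lines = list(lines)
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    jobs = [(pattern, lines[i:i + size], ignore_case) for i in range(0, len(lines), size)]
    result: List[int] = []
    with executor_cls() as pool:
        for part in pool.map(_grep_chunk, jobs):  # map preserves input order
            result.extend(part)
    return result
```

Process pools give true parallelism for this CPU-bound work. When running under a process pool, call `parallel_grep_indicator` from inside an `if __name__ == "__main__":` guard. `ThreadPoolExecutor` can be passed as `executor_cls` for quick checks.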
codegordi / Rneo4j
Created June 4, 2013 13:16
Connect to neo4j and send Cypher query (R on MacOS)
library('bitops')
library('RCurl')
library('RJSONIO')
query = function(querystring) {
h = basicTextGatherer()
curlPerform(url="http://<your host IP>/db/data/ext/CypherPlugin/graphdb/execute_query",
postfields=paste('query',curlEscape(querystring), sep='='),
writefunction = h$update,
verbose = FALSE
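The request body built by `paste('query', curlEscape(querystring), sep='=')` is `query=` followed by the URL-escaped Cypher text, sent as a form POST.

Below is a standard-library Python sketch of the same request. Two things are assumptions:

- `run_cypher` and its `url` argument are hypothetical.
- The CypherPlugin extension path the gist targets belongs to older Neo4j servers, so check which endpoint yours exposes.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def cypher_form_body(querystring: str) -> str:
    """Build the POST body 'query=<escaped Cypher>'.

    On Python 3.7+, quote(..., safe="") leaves only letters, digits and
    -._~ unescaped, which should match RCurl's curlEscape().
    """
    return "query=" + quote(querystring, safe="")


def run_cypher(url: str, querystring: str, timeout: float = 30.0):
    """POST the query as a form body to `url` and decode the JSON reply."""
    body = cypher_form_body(querystring).encode("ascii")
    req = Request(
        url,
        data=body,  # supplying data makes urllib send a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```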
codegordi / cumRelFreqDistn.py
Created October 21, 2013 20:50
Python function to calculate a cumulative relative frequency distribution (for contexts where numpy/scipy/etc. are not available, e.g. in Pig before v0.12). Originally designed to work as a User Defined Function for Pig on Hadoop.
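def cumRelFreqDistn(tups):
    # create bins of increment 0.01, from -0.99 to 1.00;
    # round() avoids float drift such as 0.07000000000000001
    a = [round(i*-0.01, 2) for i in range(100)]
    a = a[1:len(a)]  # drop 0.0 here; it comes back in b
    b = [round(i*0.01, 2) for i in range(101)]
    a.extend(b)
    a.sort()
    bins = a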
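The rest of the function is cut off in this preview, and what `tups` holds is not shown. The sketch below therefore takes a plain list of numbers and assumes "cumulative relative frequency" means, for each bin edge b, the fraction of values less than or equal to b. Names are hypothetical.

Sorting once and using `bisect` keeps the cost at O(n log n + bins · log n) without numpy. The code targets CPython 3; a Jython-based Pig UDF would need the type hints removed.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def default_bins() -> List[float]:
    # -0.99, -0.98, ..., 1.00 in steps of 0.01 (same range as the gist)
    return [round(i * 0.01, 2) for i in range(-99, 101)]


def cum_rel_freq(values: Sequence[float],
                 bins: Optional[Sequence[float]] = None) -> List[Tuple[float, float]]:
    """Return (bin_edge, fraction of values <= bin_edge) for every bin edge."""
    if bins is None:
        bins = default_bins()
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return [(b, 0.0) for b in bins]
    # bisect_right counts how many sorted values are <= b
    return [(b, bisect_right(ordered, b) / n) for b in bins]
```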
codegordi / filegdb2shp
Last active August 29, 2015 14:03 — forked from gislars/filegdb2shp
1) Get the FileGDB API http://www.esri.com/apps/products/download/
2) Extract it somewhere on your system and remember the path :)
3) Do:
> mkdir build #directory where we are playing around
> cd build
> git clone https://github.com/OSGeo/gdal.git
> cd gdal
> ./configure --with-fgdb=/path/to/your/FileGDB_API
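The preview ends at the configure step. Once GDAL is built with FileGDB support, the conversion this gist is named for is typically a single `ogr2ogr` call, with the destination given before the source.

The Python wrapper below only builds and runs that command. The function names and paths are hypothetical, and it assumes the freshly built `ogr2ogr` is on PATH.

```python
import subprocess
from typing import List


def filegdb_to_shp_cmd(gdb_path: str, out_dir: str) -> List[str]:
    """Build an ogr2ogr command that writes the layers of a File
    Geodatabase as shapefiles into `out_dir`.

    ogr2ogr takes the destination before the source.
    """
    return ["ogr2ogr", "-f", "ESRI Shapefile", out_dir, gdb_path]


def convert(gdb_path: str, out_dir: str) -> None:
    # FileNotFoundError if ogr2ogr is not on PATH;
    # CalledProcessError if it exits with a non-zero status
    subprocess.run(filegdb_to_shp_cmd(gdb_path, out_dir), check=True)
```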
# alteRyx_install_packages.R
# > code to install packages via R Developer tool in Alteryx(R) module
# > note call to custom Alteryx-R function write.Alteryx()
altx.repo <- getOption("repos")
altx.repo["CRAN"] <- "http://cran.rstudio.com" # set your primary repo if you haven't already
options(repos = altx.repo)
#write.Alteryx(getOption("repos"), 1) # DEBUG
install.packages("XML")
## install Haskell Platform on Mac OS
(1) Download from https://www.haskell.org/platform/
> Contains the GHCi REPL shell (console-interpreter), common vetted libraries, Cabal, etc.
> See https://www.haskell.org/platform/doc/2014.2.0.0/start.html for more information
(2) Um ... that should be it on Mavericks (v10.9+). You may run into problems if you had GHC/GHCi installed on your Mac before upgrading to Mavericks; look up the gists by @cartazio or @katchuang in these cases.
(3) Do you a Haskell for a Great Good!