- df.dtypes : lists the dtype of each column in the DataFrame (it is an attribute, so no parentheses)
View PhlCrime_GettingStarted_PT_I_II.R
# the following script will import Philadelphia Crime Data
# Parts I and II, create a summary based on year, month, and crime type
# and will create a basic map in leaflet using the first 1000 incidents
# you will need to install dplyr, leaflet, readr, lubridate, and stringr
# packages ( install.packages('package name') )
rm(list = ls())
library(dplyr)
library(leaflet)
library(readr)
View replaceValuesWithKeys.py
s = ['c', 'is', 'equal', 'to', 'b']
print(s)
# output >> ['c', 'is', 'equal', 'to', 'b']
# dictionary of names:values
d = {'joe': ['a', 'b'], 'tom': ['c', 'd']}
# replace any values from the dict with the key value
for i in range(0, len(s)):
    for key, value in d.items():
        for v in value:
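The preview above is cut off inside the innermost loop. Going by the comment ("replace any values from the dict with the key value"), one way to get that behavior is sketched below. This is an assumption about the intent, not the original script's body. The helper name `replace_values_with_keys` is hypothetical, and it uses an inverted lookup dictionary instead of the three nested loops.

```python
# Hypothetical completion: any element of `s` that appears in one of the
# dictionary's value lists is replaced by the key that owns that list.
s = ['c', 'is', 'equal', 'to', 'b']
d = {'joe': ['a', 'b'], 'tom': ['c', 'd']}


def replace_values_with_keys(items, mapping):
    # Invert the mapping once (value -> key) so each lookup is a dict access.
    value_to_key = {v: key for key, values in mapping.items() for v in values}
    # Items that are not listed under any key are kept unchanged.
    return [value_to_key.get(item, item) for item in items]


result = replace_values_with_keys(s, d)
print(result)
# output >> ['tom', 'is', 'equal', 'to', 'joe']
```

If the same value is listed under more than one key, the last key wins in this sketch.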
View PHL_Crime_By_District.R
##################### Import Libraries ##############################################
# if you don't have the libraries below you can install the library using the
# following command : install.packages('package_name_here_in_quotes')
library(dplyr)
library(lubridate)
library(RColorBrewer)
library(leaflet)
library(stringr)
library(rgdal)
View deepAssignmentOperatorExamples.R
#####################################################
# case of global assignment
# parent environment is the global environment
# from Hadley Wickham's 'Advanced R'
#####################################################
x <- 0
f <- function() {
  x <<- 1
}
View group_arrange_assign_ranking.R
ds %>%
  group_by(group1, group2) %>%
  summarise(
    summary_value = some_function
  ) %>%
  arrange(desc(summary_value)) %>%
  group_by(group1) %>%
  mutate(rank = row_number())
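For reference, the same summarise, sort, then rank-within-group pattern can be sketched in plain Python. This is only an illustration of the logic, not a dplyr equivalent: the sample rows are made up, `sum` stands in for the `some_function` placeholder, and `rank_within_group` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical rows: (group1, group2, value)
rows = [
    ("east", "a", 3), ("east", "a", 4), ("east", "b", 10),
    ("west", "a", 1), ("west", "b", 2), ("west", "b", 4),
]


def rank_within_group(rows):
    # group_by(group1, group2) + summarise: a sum stands in for some_function
    totals = defaultdict(int)
    for g1, g2, value in rows:
        totals[(g1, g2)] += value

    # arrange(desc(summary_value))
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    # group_by(group1) + mutate(rank = row_number()):
    # number the rows inside each group1 in their sorted order
    seen = defaultdict(int)
    ranked = []
    for (g1, g2), total in ordered:
        seen[g1] += 1
        ranked.append((g1, g2, total, seen[g1]))
    return ranked


ranked = rank_within_group(rows)
for row in ranked:
    print(row)
```

Each output tuple is (group1, group2, summary_value, rank), with rank 1 going to the largest summary value inside each group1.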
View python_reference.md
View Stat_notes.md
- Test for normality:
  - Shapiro-Wilk: the null hypothesis is that the data are normally distributed. If the p-value is below alpha (0.05, or whatever significance level you are using), the null hypothesis is rejected (the data are non-normal)
    - The test is sensitive to sample size: with large samples even trivial deviations from normality become statistically significant, so accompany the test with a Q-Q plot
  - Anderson-Darling
- Comparison of distributions (no assumption of normality)
  - Kolmogorov-Smirnov test
    - Compares the empirical CDFs of two samples. D is the largest absolute difference between the two CDFs: a value close to 1 indicates the distributions are different, a value close to 0 indicates they are close to one another
  - Wilcoxon’s signed-rank test
    - Compares two paired samples: tests whether the median of the paired differences is zero
  - Mann-Whitney U Test: similar to Wilcoxon, but the samples don't have to be paired
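To make the Kolmogorov-Smirnov D value concrete, here is a minimal pure-Python sketch that computes only the two-sample statistic (no p-value). The function name `ks_statistic` and the sample data are made up for illustration. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values,
    # so it is enough to evaluate both CDFs at every observation.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])     # identical samples
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])  # partial overlap
disjoint = ks_statistic([1, 2, 3], [10, 11, 12])    # no overlap
print(same, shifted, disjoint)
# output >> 0.0 0.5 1.0
```

The three cases line up with the note above: identical samples give D = 0, fully separated samples give D = 1, and partially overlapping samples land in between.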
View gensim_notes.md
- save_as_text : only use this if you just want to read the text in the file. Otherwise it will cause issues if you later go back to revise/filter the dictionary
- If you import a dictionary and then alter it, the corpus must also be updated as outlined here - Q8
- You have to limit the number of features in large datasets, otherwise the memory consumption is huge
  - This is regardless of whether the corpus is loaded in RAM or serialized
- Iterations argument - refers to the number of iterations in the EM step
View ODS.md
- CDC WONDER
  - mortality data
  - birth data
  - environment
  - population data
- Pennsylvania State Data Center
  - County level data (mostly census data) for PA
- Census Data
  - County Adjacency: County adjacency data from the US Census Bureau
- County Health Rankings
View java_compatable_regex.txt
(?<!@|#)\b\w+ : Matches every word not preceded by @ or #; keeping only the matches removes hashtags and user handles from Twitter text
(?<!@|#)\b\w{2,} : Same as above, but only matches words with a length of 2 or greater
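A quick check of both patterns, sketched with Python's `re` module rather than Java. The lookbehind works in Python too because both alternatives are one character wide. The sample tweet is made up.

```python
import re

# Words not preceded by @ or #
WORDS = re.compile(r'(?<!@|#)\b\w+')
# Same, but only words of length 2 or greater
WORDS_MIN_2 = re.compile(r'(?<!@|#)\b\w{2,}')

tweet = "@joe I love #rstats a lot"

words = WORDS.findall(tweet)
long_words = WORDS_MIN_2.findall(tweet)

print(words)
# output >> ['I', 'love', 'a', 'lot']
print(long_words)
# output >> ['love', 'lot']
```

The handle and the hashtag are skipped as whole tokens: the only word boundary inside `@joe` sits right after the `@`, where the lookbehind fails.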