Nancy Chelaru nchelaru

@nchelaru
nchelaru / get_latest_file.py
Created July 9, 2020 02:45
To get the latest version of a file within a directory.
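import glob
import os
# Collect every file matching the pattern (the path is a placeholder)
list_of_files = glob.glob('/path/to/folder/filename_*.csv')
# Pick the most recently modified file; max() raises ValueError if nothing matched
latest_file = max(list_of_files, key=os.path.getmtime)
print(latest_file)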
@nchelaru
nchelaru / k-nearest-neighbours-classification.ipynb
Created March 2, 2020 16:42
k-Nearest Neighbours Classification
@nchelaru
nchelaru / google_locations.R
Created February 4, 2020 14:52
A snippet for getting the geographical coordinates and addresses from business names using Google Maps API, in R
# listings$lat <- 0
# listings$lng <- 0
# listings$address <- 0
#
# for (i in 1:nrow(listings)) {
# row <- listings[i,]
# company_name <- gsub(' ', '\\+', row$company)
# location <- gsub(' ', '\\+', row$location)
#
# url <- sprintf('https://maps.googleapis.com/maps/api/place/findplacefromtext/json?input=%s+%s&inputtype=textquery&fields=geometry,formatted_address&key=<API_key>', company_name, location)
@nchelaru
nchelaru / pair_grids.ipynb
Created January 30, 2020 17:28
pair_grids.ipynb
@nchelaru
nchelaru / lr_collinearity.ipynb
Created January 29, 2020 17:18
lr_collinearity.ipynb
@nchelaru
nchelaru / lr_test_linear_relationship.ipynb
Last active January 29, 2020 15:04
lr_test_linear_relationship.ipynb
@nchelaru
nchelaru / identify_regression_outliers.ipynb
Last active January 22, 2020 16:56
identify_regression_outliers.ipynb

Multicollinearity in regression must be addressed: variables should be removed until the multicollinearity is gone.
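
One rough way to act on this is to drop predictors that correlate strongly with a predictor already kept. The sketch below is only an illustration under stated assumptions: the `pearson` and `drop_collinear` helpers, the 0.9 threshold, and the toy data are all hypothetical, and columns are assumed to be non-constant. Pairwise correlation catches only two-variable redundancy, so a fuller check (for example, variance inflation factors) may still be needed.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def drop_collinear(columns, threshold=0.9):
    """Keep predictors in order, skipping any that correlate strongly
    (|r| >= threshold) with a predictor that was already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return list(kept)

# Toy data (made up for illustration)
predictors = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # near-duplicate of height_cm
    "age":       [34, 21, 45, 29, 52],
}
remaining = drop_collinear(predictors)
print(remaining)
```

Here `height_in` is dropped because it is almost a rescaled copy of `height_cm`, while `age` is kept. Which member of a correlated pair to keep is a modelling decision; this sketch simply keeps whichever comes first.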

Multicollinearity is less of a problem for trees, clustering and nearest-neighbours methods. In these methods, it may be advisable to retain all p dummy variables for a categorical predictor with p levels, rather than dropping one as is usual in regression. However, even in these methods, non-redundancy in predictor variables is still desired.
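
To make the dummy-variable point concrete, here is a minimal sketch of encoding a categorical predictor with p levels as either p or p - 1 indicator columns. The `dummy_encode` helper and the colour data are hypothetical, written only for this illustration.

```python
def dummy_encode(values, drop_first=False):
    """One-hot encode a list of category labels into rows of 0/1 indicators."""
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]  # the dropped level becomes the implicit reference
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

colours = ["red", "green", "blue", "green"]

# All p = 3 dummies, as might be retained for trees or nearest-neighbours
full_levels, full_rows = dummy_encode(colours)

# p - 1 = 2 dummies, as is usual in regression
ref_levels, ref_rows = dummy_encode(colours, drop_first=True)

# With all p dummies every row sums to 1, so the columns are perfectly
# collinear with a regression intercept.
row_sums = [sum(row) for row in full_rows]
print(full_levels, row_sums)
print(ref_levels, ref_rows)
```

Because each full row sums to exactly 1, the p columns together duplicate the intercept column in a regression design matrix, which is why one level is normally dropped there.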

@nchelaru
nchelaru / grid_search.py
Created January 12, 2020 20:47
Grid search
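## Import
import numpy as np
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20
## Define grid
## Keys follow the pipeline convention <step name>__<parameter>
param_grid = {'polynomialfeatures__degree': np.arange(21),
              'linearregression__fit_intercept': [True, False],
              # 'normalize' was removed from LinearRegression in scikit-learn 1.2; drop this entry on newer versions
              'linearregression__normalize': [True, False]}
## Grid search
## PolynomialRegression is not defined in this snippet; it is assumed to return a
## PolynomialFeatures + LinearRegression pipeline
grid = GridSearchCV(PolynomialRegression(), param_grid, cv=7)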
@nchelaru
nchelaru / google_cse.py
Created November 22, 2019 16:53
Set up Google Custom Search Engine
## Import libraries
from googleapiclient.discovery import build
## Set credentials
my_api_key = "API_key"
my_cse_id = "CSE_ID"
## Define function
def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)