Allie .S Ubisse (AllieUbisse): GitHub gists
shanealynn / python batch geocoding.py
Last active January 6, 2024 13:48
Geocode as many addresses as you'd like with a powerful Python and Google Geocoding API combination
"""
Python script for batch geocoding of addresses using the Google Geocoding API.
This script allows for massive lists of addresses to be geocoded for free by pausing when the
geocoder hits the free rate limit set by Google (2500 per day). If you have an API key for paid
geocoding from Google, set it in the API key section.
Addresses for geocoding can be specified in a list of strings "addresses". In this script, addresses
come from a csv file with a column "Address". Adjust the code to your own requirements as needed.
After every 500 successul geocode operations, a temporary file with results is recorded in case of
script failure / loss of connection later.
Addresses and data are held in memory, so this script may need to be adjusted to process files line
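The core loop the description refers to can be sketched as follows. This is a minimal illustration, not shanealynn's exact code; the input filename, CSV layout, and checkpoint format are assumptions:

import csv
import time
import requests

def geocode_address(address, api_key=None):
    # Query the Google Geocoding API for one address and return the parsed JSON.
    url = "https://maps.googleapis.com/maps/api/geocode/json"
    params = {"address": address}
    if api_key is not None:
        params["key"] = api_key
    return requests.get(url, params=params).json()

with open("input.csv") as f:
    addresses = [row["Address"] for row in csv.DictReader(f)]

results = []
for address in addresses:
    while True:
        response = geocode_address(address)
        if response["status"] != "OVER_QUERY_LIMIT":
            break
        time.sleep(30 * 60)  # daily quota hit: wait, then retry the same address
    results.append({"address": address, "status": response["status"]})
    if len(results) % 500 == 0:
        # Checkpoint partial results so a crash or dropped connection
        # does not lose everything geocoded so far.
        with open("temp_results.csv", "w", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=["address", "status"])
            writer.writeheader()
            writer.writerows(results)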
hammadzz / pyspark_help.md
Last active August 23, 2020 12:13
PySpark HelpSheet
bshishov / forecasting_metrics.py
Last active April 20, 2024 04:29
Python Numpy functions for most common forecasting metrics
import numpy as np
EPSILON = 1e-10
def _error(actual: np.ndarray, predicted: np.ndarray):
""" Simple error """
return actual - predicted
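The gist builds its metric functions on this helper; representative examples in the same style (sketches consistent with the helper above, not necessarily the gist's exact definitions) would be:

def mse(actual: np.ndarray, predicted: np.ndarray):
    """ Mean Squared Error """
    return np.mean(np.square(_error(actual, predicted)))

def rmse(actual: np.ndarray, predicted: np.ndarray):
    """ Root Mean Squared Error """
    return np.sqrt(mse(actual, predicted))

def mape(actual: np.ndarray, predicted: np.ndarray):
    """ Mean Absolute Percentage Error; EPSILON guards the division against zeros """
    return np.mean(np.abs(_error(actual, predicted) / (actual + EPSILON)))

The next snippet comes from a separate gist, a Spark Structured Streaming script (sstreaming-spark-final.py, per its spark-submit line):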
'''
spark/bin/spark-submit \
--master local --driver-memory 4g \
--num-executors 2 --executor-memory 4g \
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
sstreaming-spark-final.py
'''
from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import expr
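Given these imports and the spark-sql-kafka package on the spark-submit line, the job evidently reads a stream from Kafka. A minimal sketch of such a job follows; the broker address, topic name, and console sink are placeholder assumptions:

spark = SparkSession.builder.appName("sstreaming-spark-final").getOrCreate()

# Read a stream from Kafka; server and topic are placeholders.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load())

# Kafka delivers key/value as binary, so cast the payload to a string.
values = df.select(expr("CAST(value AS STRING) AS raw"))

# Write to the console sink for inspection and block until terminated.
query = (values.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()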
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score, explained_variance_score
import mlflow
import mlflow.sklearn
import numpy as np
# Launch the experiment on mlflow
experiment_name = "electricityconsumption-forecast"
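The snippet stops after naming the experiment. With the imports above, registering the experiment and logging a fitted model typically looks like the following sketch; the training variables and the hyperparameter value are placeholders:

mlflow.set_experiment(experiment_name)

with mlflow.start_run():
    # X_train, y_train, X_test, y_test are assumed to be defined elsewhere.
    model = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
    preds = model.predict(X_test)

    mlflow.log_param("n_neighbors", 5)
    mlflow.log_metric("mse", mean_squared_error(y_test, preds))
    mlflow.log_metric("mae", mean_absolute_error(y_test, preds))
    mlflow.log_metric("r2", r2_score(y_test, preds))
    mlflow.log_metric("explained_variance", explained_variance_score(y_test, preds))
    mlflow.sklearn.log_model(model, "model")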
liorshk / mlflow_gridsearch.py
Created April 22, 2020 15:24
Create MLFlow runs with Sklearn Gridsearch object
from sklearn.model_selection import GridSearchCV

def log_run(gridsearch: GridSearchCV, experiment_name: str, model_name: str, run_index: int, conda_env, tags=None):
    """Log cross-validation results to an MLflow tracking server.
    Args:
        gridsearch (GridSearchCV): Fitted sklearn grid search object
        experiment_name (str): Experiment name
        model_name (str): Name of the model
        run_index (int): Index of the run (in the grid search)
        conda_env (dict): A dictionary that describes the conda environment (MLflow format)
        tags (dict): Dictionary of extra data and tags (usually features)
    """
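    # The gist preview is truncated here. The body below is a sketch of what
    # such a logger typically does, not liorshk's exact code; the cv_results_
    # fields are standard sklearn, the logging choices are assumptions.
    cv_results = gridsearch.cv_results_
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run(run_name="%s-run-%d" % (model_name, run_index)):
        # Hyperparameters and scores for this grid-search candidate.
        mlflow.log_params(cv_results["params"][run_index])
        mlflow.log_metric("mean_test_score", cv_results["mean_test_score"][run_index])
        mlflow.log_metric("std_test_score", cv_results["std_test_score"][run_index])
        mlflow.set_tags(tags or {})
        # Log the refitted best estimator together with its conda environment.
        mlflow.sklearn.log_model(gridsearch.best_estimator_, model_name, conda_env=conda_env)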
carlleston / 3-ln_model.ipynb
Last active August 23, 2020 12:09
Pre-processing and linear model in PySpark
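The notebook itself does not render here. As a stand-in, a minimal PySpark pre-processing plus linear regression pipeline of the kind the title describes might look like this; the file name and column names are placeholders:

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("ln_model").getOrCreate()
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Pre-processing: assemble raw columns into a feature vector, then scale it.
assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="raw_features")
scaler = StandardScaler(inputCol="raw_features", outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=[assembler, scaler, lr])
model = pipeline.fit(df)
predictions = model.transform(df)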