Tony Fraser tonythor

@tonythor
tonythor / conditionalJoin.scala
Created September 16, 2024 15:54
a conditional join using a spark transformer
// First time I used a transformer... It's pretty cool, so I'm adding this to my notes
def addColumnsAsZero(df: DataFrame, columns: Seq[String]): DataFrame = {
  columns.foldLeft(df)((tempDF, colName) => tempDF.withColumn(colName, lit(0)))
}
def conditionalJoin(baseDf: DataFrame, leftDf: DataFrame,
                    doJoin: Boolean, joinCondition: Column,
                    joinType: String = "inner", zeroColumns: Seq[String]) = {
  if (doJoin) {
tonythor / file.py
Created September 12, 2024 00:49
608 data vis -> discussion 3 make better charts
## the bad chart
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
titanic = pd.read_csv('https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv')
titanic['Survival Status'] = titanic['Survived'].map({0: 'died', 1: 'survived'})
sns.set(style="dark", rc={'axes.facecolor':'#f2f2f2', 'grid.color': 'black'})
g = sns.catplot(x="Sex", hue="Survival Status", col="Pclass",
tonythor / load_kaggle_data.r
Last active November 8, 2023 05:26
Load data from kaggle using R
library(jsonlite)
library(readr)
library(tidyverse)
load_data_kaggle <- function(local_filename, kaggle_dataset) {
  if (!file.exists(local_filename)) {
    # Load Kaggle API credentials only if the file needs to be downloaded.
    # To get the credentials file, log into Kaggle, click Settings, then API; you'll see it there.
    # Save it as nogit_kaggle.json
tonythor / lookslikethis.png
Last active November 3, 2023 01:01
Quarto / LaTeX Color Header
tonythor / SparkLoadDFIntegationTest.scala
Last active September 13, 2023 17:09
spark/scala : load a spark dataframe into an integration test from a csv file in the project resources directory
// PS: see this page, and yes, that is my comment!
// https://stackoverflow.com/questions/27360977/how-to-read-files-from-resources-folder-in-scala/55084068#55084068
// Put a CSV file in [project]/src/it/resources/data/mytestdata.csv
def loadDf(_p: String): DataFrame = {
  // Needed for .toDS; assumes `spark` is a SparkSession val already in scope
  import spark.implicits._
  // Read the resource line by line and turn the lines into a Dataset[String]
  val d = spark.sparkContext.parallelize(scala.io.Source.fromResource(s"data/${_p}").getLines().toSeq).toDS
  // Parse the lines as CSV, treating the first line as the header
  spark.read.option("header", "true").option("delimiter", ",").csv(d)
}
val myDf = loadDf(_p = "mytestdata.csv")
tonythor / rgdal_sf_mapview.on.osx.MD
Last active October 14, 2025 11:37
install rgdal on osx

install rgdal on mac/osx

This problem is all over the place, and almost nobody seems to have gotten it working. Sorry, I just got out of that hell myself and it was brutal. If you find this, you're looking at somebody who actually got it working on a brand-new Mac, with that running in developer mode. So yes, it really is possible.

errors for the algorithms

  1. checking PROJ: checking whether PROJ and sqlite3 are available for linking:... no
  2. configure: error: libproj or sqlite3 not found in standard or given locations.
  3. ERROR: configuration failed for package ‘sf’
  4. configure: error: gdal-config not found or not executable.
tonythor / dyplyr_and_sqldf_cheatsheet.R
Last active September 3, 2023 13:54
A Scala programmer's guide to basic R dplyr and sqldf.
# This is all cut and pasted from my ChatGPT window, so
# be sure to double-check everything!
## SQLDF ##########################################
library(sqldf)
# Basic SQL Operations
sqldf("SELECT column1, column2 FROM df") # SELECT
sqldf("SELECT * FROM df WHERE column1 = 'value'") # WHERE
sqldf("SELECT * FROM df ORDER BY column1 ASC") # ORDER BY
sqldf("SELECT column1, AVG(column2) FROM df GROUP BY column1 HAVING AVG(column2) > value") # GROUP BY & HAVING # nolint
tonythor / ComplexDfDebugger.scala
Created August 18, 2023 20:23
ComplexDFDebugger - A scala util that flattens and saves dataframes so they can be viewed in excel or numbers
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.StringType
object ComplexDFToCsv {
/**
* A utility class that stringifies all columns within a dataframe,
* and then inserts a couple of characters in every null/empty record
* to make viewing in excel a little easier.
* Example in: com.nbcuas.add.demandpreprocess.AccessS3
tonythor / getType.scala
Created August 4, 2023 20:56
the python type() method for scala
// Python programmers: yes, it sucks. There is no type() method
// in Scala. There's this, though; I use it all the time.
import scala.reflect.runtime.universe._
def getType[T: TypeTag](a: T): Type = typeOf[T]
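For comparison, here is a minimal Python sketch of the `type()` behaviour the comment above refers to. The helper name `describe_type` and the sample values are hypothetical and not part of the gist. One difference worth keeping in mind: the Scala `getType` resolves `T` through a `TypeTag` at compile time, so it reports the static type of the expression, while Python's `type()` reports the runtime class of the object.

```python
def describe_type(value):
    """Return the name of the runtime class of `value`, as type() reports it."""
    return type(value).__name__


# Hypothetical sample values, chosen only for illustration.
samples = [1, 1.5, "a", [1, 2], {"k": 1}]
names = [describe_type(s) for s in samples]
print(names)  # ['int', 'float', 'str', 'list', 'dict']
```

In Scala, a value declared as `Any` would come back from `getType` as `Any`, whereas the Python call above always looks at the object itself.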
tonythor / PrettyPrint.scala
Created June 16, 2023 16:53
A scala version of python pretty print.
package com.nbcuas.add.common.general
import java.lang.reflect.Field
import java.sql.Timestamp
/**
## another shameless reprint of this stack post:
## https://stackoverflow.com/questions/15718506/scala-how-to-print-case-classes-like-pretty-printed-tree
## thanks to stack users [@F. P Freely] and [@samthebest]