Irene Steves isteves

@isteves
isteves / tidyverse2pyspark.md
Last active February 28, 2023 13:34
tidyverse2pyspark_translation

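Tidyverse to PySpark translations

Adding the count of each value in a column as a new column

df %>% add_count(some_col)  # R (dplyr)

from pyspark.sql import Window, functions as F  # imports the PySpark line needs
df.withColumn("n", F.count("*").over(Window.partitionBy("some_col")))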
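
Both lines keep every row and attach the size of that row's group, unlike a grouped count, which collapses to one row per group. A minimal plain-Python sketch of that behavior follows. It does not use Spark, and the sample rows and the `add_count` helper are made up for illustration:

```python
from collections import Counter


def add_count(rows, col, name="n"):
    """Return copies of rows with the size of each row's `col` group attached."""
    # Count how many rows share each value of `col`.
    group_sizes = Counter(row[col] for row in rows)
    # Keep every row and add the size of its own group under `name`.
    return [{**row, name: group_sizes[row[col]]} for row in rows]


# Hypothetical sample data, standing in for `df`.
rows = [
    {"some_col": "a", "value": 1},
    {"some_col": "a", "value": 2},
    {"some_col": "b", "value": 3},
]

result = add_count(rows, "some_col")
for row in result:
    print(row)
```

The output has the same number of rows as the input, which is the point of using a window (`partitionBy`) rather than `groupBy` on the PySpark side.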
isteves / pyspark_tricks.md
Last active May 25, 2022 11:40
PySpark tricks


"Exploding" aggregations

To apply the same aggregation to many columns, you can write it more succinctly like this:

cols_min = ["size", "age"]

df \
isteves / resources.md
Last active January 25, 2022 09:39
Resource collection
isteves / neo4j.md
Last active October 31, 2021 17:42
neo4j learnings

Undirected: (a)-[r]-(b)
Directed: (a)-[r]->(b)
where a and b are nodes and r is the relationship (link) between them.
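
To make the difference concrete, here is a rough plain-Python sketch of how the two patterns behave when matching. It is not Neo4j's API; the relationship data and the helper names `match_directed` and `match_undirected` are invented for illustration. Relationships are stored with a direction, and the undirected pattern simply ignores it when matching:

```python
# Each tuple is one stored relationship: (start_node, rel_type, end_node).
relationships = [
    ("alice", "FOLLOWS", "bob"),
    ("carol", "FOLLOWS", "alice"),
]


def match_directed(a, rels):
    """Nodes b such that (a)-[r]->(b): only relationships starting at a."""
    return sorted(end for start, _, end in rels if start == a)


def match_undirected(a, rels):
    """Nodes b such that (a)-[r]-(b): relationships touching a in either direction."""
    neighbours = {end for start, _, end in rels if start == a}
    neighbours |= {start for start, _, end in rels if end == a}
    return sorted(neighbours)


directed = match_directed("alice", relationships)
undirected = match_undirected("alice", relationships)
print(directed)    # only the node alice points to
print(undirected)  # nodes connected to alice in either direction
```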

In the following calls, the curly brackets hold extra parameters (a JSON-like map):

CALL apoc.import.graphml("file://graph.graphml", {})
CALL apoc.import.graphml("file://graph.graphml", {readLabels: true})

Nodes have properties and labels. Labels are what you see as different colors in Neo4j; they are defined in a graphml file as shown below (see ":Person"). Properties are other attributes that you can query by, such as age ("> 30 years old").

isteves / glue_in_function.md
Created January 26, 2021 19:31
Using glue::glue() inside of another function

Using glue inside of another function

The key is defining an environment!

test_glue <- function(cmd, e = parent.frame()) {
  crayon::red(glue::glue(cmd, .envir = e))
}

test_fxn <- function(name) {
isteves / pkg_db_connection.md
Last active January 10, 2021 09:20
Managing a DB connection in an R package

In our department, there's almost always just a single database that we want to connect to. Thus, managing the connection throughout our code quickly becomes annoying and redundant:

conn <- odbc::dbConnect(odbc::odbc(), ...)

dbGetQuery(conn, statement1)
dbGetQuery(conn, statement2)
dbGetQuery(conn, statement3)
isteves / pycharm_shortcuts.md
Last active January 5, 2021 09:06
pycharm shortcuts

PyCharm shortcuts

Useful shortcuts

  • control + shift + r -- runs the script ("source")

Matching RStudio

It's a pain to relearn shortcuts for every program, so I adjusted a few PyCharm shortcuts to match what I'm used to in RStudio.

isteves / style_table_values.R
Created July 14, 2020 13:32
Flexible way to format table values using dplyr & formattable
library(tidyverse) # needs dplyr 1.0.0+ https://www.tidyverse.org/blog/2020/03/dplyr-1-0-0-summarise/
library(formattable)

df <- tibble(spent = 1:10,
             refund = 100:109,
             perc = seq(0.1, 1, 0.1),
             num = 1000:1009,
             blah = 1000:1009)
style_table_values <- function(df,
isteves / two_githubs.md
Last active July 19, 2020 20:50
Dealing with two GitHub accounts

Dealing with two GitHub accounts on one computer

I keep messing this up, so I'm documenting what I currently remember here. If future Irene arrives here because of a bad commit, this is how to undo the latest commit while keeping its changes in the working tree: git reset HEAD~ (SO post)

To start, I set up my work and personal accounts with separate SSH keys (id_rsa files, etc) using this guide.

I'm never clear on what's going on behind the scenes in the RStudio git pane, so I tend to use the terminal when using my secondary (personal) account. Each time I open a new personal project, I need the following steps.

  1. When cloning the repo, add "personal" to the URL (this is how I have it set up): git@personal.github.com:isteves/my_repo.git
  2. After cloning and creating a new RProject, run ssh-add ~/.ssh/id_rsa_personal in the terminal to switch to the persona
isteves / MKrene.tmTheme
Created May 1, 2020 11:53
Modified Monokai theme
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<!-- Generated by: TmTheme-Editor -->
<!-- ============================================ -->
<!-- app: http://tmtheme-editor.herokuapp.com -->
<!-- code: https://github.com/aziz/tmTheme-Editor -->
<plist version="1.0">
<dict>
<key>name</key>
<string>MKrene</string>