Packaging up my workflows

Martin Chan martinctc

martinctc / Data2Worksheets.vbs
Last active March 5, 2024 15:44
Split data into multiple worksheets based on column variables - edited from online sources
Sub parse_data()
    Dim lr As Long
    Dim ws As Worksheet
    ' Each variable needs its own type; "Dim vcol, i As Integer" would leave vcol as a Variant
    Dim vcol As Integer, i As Integer
    Dim icol As Long
    Dim myarr As Variant
    Dim title As String
    Dim titlerow As Integer
    'This macro splits data into multiple worksheets based on the values found in a chosen column.
martinctc / approx_num.R
Created March 5, 2024 14:31
[Convert numeric value to natural language approximation] #R
#' @title Convert a numeric value into a natural language approximation string
#'
#' @description
#' This function takes a numeric value and returns a string that approximates the value in natural language.
#'
#' @param x A numeric value.
#'
#' @examples
#' approx_num(0.5)
#' # [1] "increased by a half"
martinctc / test-python-rf-runtime.py
Last active January 15, 2024 14:20
Test run speeds for RF model in Python including simulation
# data cleaning and utility
import numpy as np
import pandas as pd
import vivainsights as vi
import os
# timing code
import time
import random
import sys
martinctc / get-pypi-stats.py
Created November 8, 2023 15:48
[Get PyPI statistics] #python
import requests
import pandas as pd
package_name = "vivainsights"
api_endpoint = f"https://pypistats.org/api/packages/{package_name}/overall"
response = requests.get(api_endpoint)
if response.status_code == 200:
    data = response.json()
martinctc / ForceNetwork_example.R
Created April 22, 2020 09:53
[ForceNetwork network example with {networkD3}] #R
library(tidyverse)
library(networkD3)
## Nodes data frame describing all the nodes in the network
## The first entry in nodes dataframe is node 0, the next entry is node 1 and so on.
## The nodes dataframe must be sorted according to this sequence.
## This is the only way to tie the nodes dataframe to the links dataframe.
TestNodes <- data.frame(name = c("Alpha",
                                 "Beta",
                                 "Cat",
martinctc / power-analysis.R
Created January 9, 2023 15:18
[Power analysis and sample size estimation with R] #R
# See <https://rpubs.com/mbounthavong/sample_size_power_analysis_R>
library(pwr)
# Sample size estimations for two proportions
# `pwr::ES.h()` computes effect size for two proportions
# `n` in the result is the required sample size per group
p0 <- pwr.2p.test(h = ES.h(p1 = 0.60, p2 = 0.50), sig.level = 0.05, power = .80)
plot(p0)
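
The calculation above can be cross-checked by hand. The following is a minimal Python sketch, not the `pwr` implementation: it assumes the usual arcsine definition of Cohen's h and a normal approximation that ignores the negligible opposite tail of the two-sided test. The function names `effect_size_h` and `approx_n_per_group` are hypothetical, chosen for illustration.

```python
from math import asin, sqrt
from statistics import NormalDist


def effect_size_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def approx_n_per_group(h, sig_level=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided test of two proportions.

    Assumption: normal approximation, equal group sizes.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - sig_level / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)              # quantile matching the target power
    return 2 * ((z_alpha + z_power) / abs(h)) ** 2


h = effect_size_h(0.60, 0.50)
n = approx_n_per_group(h)
print(f"h = {h:.4f}, approximate n per group = {n:.1f}")
```

Because the required n scales with 1/h², halving the effect size roughly quadruples the sample size.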
martinctc / power-analysis.py
Last active January 6, 2023 15:42
[Power analysis with python] #python
# estimate sample size via power analysis
from statsmodels.stats.power import TTestIndPower
# parameters for power analysis
effect = 0.8
alpha = 0.05
power = 0.8
# perform power analysis
analysis = TTestIndPower()
martinctc / str_arrange.R
Created August 8, 2019 16:56
[Sort letters in a character string by alphabetical order] #R
#' Sorts letters in a character string by alphabetical order
#'
#' Vectorised. Requires the pipe `%>%` to be available, e.g. via `library(magrittr)`.
str_arrange <- function(x){
  x %>%
    stringr::str_split("") %>% # Split each string into letters
    purrr::map(~sort(.) %>% paste(collapse = "")) %>% # Sort and re-combine
    purrr::as_vector() # Convert list into vector
}
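
For comparison, the same split, sort, and re-combine idea can be sketched in Python. This is an illustrative equivalent only, and `str_arrange` here is simply a reuse of the R function's name. One assumed difference: Python's `sorted()` orders characters by code point, whereas R's `sort()` follows the locale's collation, so mixed-case input may come out in a different order.

```python
def str_arrange(strings):
    """Sort the characters of each string; returns a list of the same length."""
    return ["".join(sorted(s)) for s in strings]


print(str_arrange(["banana", "cab"]))
```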
martinctc / rank_by_group.R
Last active November 1, 2021 23:57
[Rank a data frame with a grouping variable using entirely base R] #R
#' @title
#' Rank a data frame by grouping variable using base R
#'
#' @description
#' This function ranks a specified column in a data frame by group using entirely base R functions.
#' The underlying function is `rank()`, where additional arguments can be passed with `...`.
#' The grouping variable is specified as a string using the argument `group_var`, and the variable to rank is
#' specified using the argument `rank_var`. The operation is analogous to using `group_by()` followed by
#' `mutate()` in {dplyr}.
#' See example below using the base dataset `iris`.
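
The operation described above (rank one column within each level of a grouping column) can be sketched in plain Python. This is not the gist's R implementation, which is not shown in full here. It assumes rows are dictionaries and that ties receive the mean of their ranks, which is the default `ties.method = "average"` of R's `rank()`. The argument names `group_var` and `rank_var` are borrowed from the description.

```python
from collections import defaultdict


def rank_by_group(rows, group_var, rank_var):
    """Return the within-group average rank of rank_var for every row, in row order."""
    # Collect the row positions belonging to each group
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[row[group_var]].append(idx)

    ranks = [None] * len(rows)
    for indices in groups.values():
        ordered = sorted(indices, key=lambda i: rows[i][rank_var])
        pos = 0
        while pos < len(ordered):
            # Extend `end` over the run of values tied with the one at `pos`
            end = pos
            while (end + 1 < len(ordered)
                   and rows[ordered[end + 1]][rank_var] == rows[ordered[pos]][rank_var]):
                end += 1
            average_rank = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
            for k in range(pos, end + 1):
                ranks[ordered[k]] = average_rank
            pos = end + 1
    return ranks


data = [
    {"g": "a", "v": 3},
    {"g": "a", "v": 1},
    {"g": "b", "v": 5},
    {"g": "a", "v": 3},
    {"g": "b", "v": 2},
]
print(rank_by_group(data, "g", "v"))
```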
martinctc / repeat rows based on n
Created July 13, 2021 10:07
[Duplicate rows in data frame based on n] #R
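# Repeat each row as many times as its weight in column y
wtest <-
  data.frame(
    x = c("cats", "dogs", "birds", "cats"),
    y = c(1, 2, 3, 2)
  )
# rep() repeats each row index y times; indexing with the result duplicates the rows
wtest[rep(seq_len(nrow(wtest)), wtest$y), ]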
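
The same row-repetition idea can be sketched in plain Python, assuming the data is held as a list of dictionaries and the weights are non-negative whole numbers. The helper name `repeat_rows` is hypothetical.

```python
def repeat_rows(rows, weight_key):
    """Return a new list in which each row appears as many times as its weight."""
    return [dict(row) for row in rows for _ in range(int(row[weight_key]))]


wtest = [
    {"x": "cats", "y": 1},
    {"x": "dogs", "y": 2},
    {"x": "birds", "y": 3},
    {"x": "cats", "y": 2},
]
expanded = repeat_rows(wtest, "y")
print(len(expanded))  # total number of rows equals the sum of the weights
```

Each repeated row is copied with `dict(row)` so that later edits to one copy do not affect the others.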