Darjan Salaj dsalaj

@dsalaj
dsalaj / pyspark_cheatsheet.py
Created June 26, 2020 08:48
Cheatsheet for pyspark
# filter with strings
df.filter(df.name.endswith('ice')).collect()
# [Row(age=2, name='Alice')]
# order with null values at the end
df.select(df.name).orderBy(df.name.desc_nulls_last()).collect()
# [Row(name='Tom'), Row(name='Alice'), Row(name=None)]
# filter by null
df.filter(df.height.isNotNull()).collect()
@dsalaj
dsalaj / tf_dataset_split_util.py
Created March 23, 2020 18:05
Different ways of splitting tensorflow dataset
def split_dataset(ds, version=1):
    if version == 1:
        # Dataset.concatenate returns a new dataset, so the result must be reassigned
        train_ds = ds.dataset.shard(num_shards=4, index=0)
        train_ds = train_ds.concatenate(ds.dataset.shard(num_shards=4, index=1))
        train_ds = train_ds.concatenate(ds.dataset.shard(num_shards=4, index=2))
        valid_ds = ds.dataset.shard(num_shards=4, index=3)
        return train_ds, valid_ds
    elif version == 2:
        def is_val(x, y):
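The shard-based split above can be hard to picture, so here is a minimal plain-Python sketch of which elements end up where. It assumes the documented behaviour of `Dataset.shard(num_shards, index)`, which keeps every element whose position modulo `num_shards` equals `index`. The helpers `shard` and `split_items` are hypothetical stand-ins operating on lists, not part of TensorFlow or of the original gist.

```python
def shard(items, num_shards, index):
    """Plain-Python stand-in for Dataset.shard: keep elements whose position % num_shards == index."""
    return [x for i, x in enumerate(items) if i % num_shards == index]


def split_items(items, num_shards=4):
    """Hypothetical helper: the first num_shards - 1 shards form the training set, the last one the validation set."""
    train = []
    for index in range(num_shards - 1):
        # Concatenation produces a new list, so the result has to be kept
        train = train + shard(items, num_shards, index)
    valid = shard(items, num_shards, num_shards - 1)
    return train, valid


train, valid = split_items(list(range(12)))
print(train)  # [0, 4, 8, 1, 5, 9, 2, 6, 10]
print(valid)  # [3, 7, 11]
```

Note that concatenating shards groups the elements by shard rather than preserving the original order, which may matter if the data is not shuffled afterwards.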
@dsalaj
dsalaj / slurmjob.sh
Created February 21, 2020 09:15
Example of a Slurm job script. Start with: sbatch slurmjob.sh
#!/bin/bash
#SBATCH --job-name=GSC # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=salaj.au@gmail.com # Where to send mail
#SBATCH --output=slurm_out_%j.log # Standard output and error log
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --partition=IGIcrunchers
conda activate venv2
@dsalaj
dsalaj / tf_ds_from_parametrized_generator.py
Created February 7, 2020 09:47
Example of tf.data.Dataset.from_generator usage with parametrized generator
import tensorflow as tf
x_train = [i for i in range(0, 20, 2)] # even
x_val = [i for i in range(1, 20, 2)] # odd
y_train = [i**2 for i in x_train] # squared
y_val = [i**2 for i in x_val]
def gen_data_epoch(test=False): # parametrized generator
    train_data = x_val if test else x_train
    label_data = y_val if test else y_train
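The preview is cut off before the generator yields anything. The sketch below shows one way the parametrization could work in plain Python. The loop body is a guess at the truncated original, and binding the flag with `functools.partial` is an assumption of this example, not necessarily the gist's approach. It relies on the fact that `tf.data.Dataset.from_generator` expects a callable that takes no arguments unless `args` is supplied, so the parameter has to be fixed before the generator is handed over.

```python
from functools import partial

x_train = list(range(0, 20, 2))        # even inputs
x_val = list(range(1, 20, 2))          # odd inputs
y_train = [i ** 2 for i in x_train]    # squared labels
y_val = [i ** 2 for i in x_val]


def gen_data_epoch(test=False):
    """Yield (input, label) pairs for one epoch; this body is a hypothetical completion."""
    inputs = x_val if test else x_train
    labels = y_val if test else y_train
    for x, y in zip(inputs, labels):
        yield x, y


# Bind the parameter up front so each result is a zero-argument callable
train_gen = partial(gen_data_epoch, test=False)
val_gen = partial(gen_data_epoch, test=True)

print(list(train_gen())[:3])  # [(0, 0), (2, 4), (4, 16)]
print(list(val_gen())[:3])    # [(1, 1), (3, 9), (5, 25)]
```

Either `train_gen` or `val_gen` could then be passed wherever a no-argument generator function is required.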
@dsalaj
dsalaj / keybase.md
Created November 30, 2019 21:39
keybase.md

Keybase proof

I hereby claim:

  • I am dsalaj on github.
  • I am dsalaj (https://keybase.io/dsalaj) on keybase.
  • I have a public key ASBBDtsHlfUOlqCUF48dL0qkNY-lWwrLC2dbOWrHjNMYrwo

To claim this, I am signing this object:

@dsalaj
dsalaj / np_tolerant_mean.py
Created November 13, 2019 08:06
Calculate mean of list of arrays with different lengths (useful for plotting progress of incomplete simulation runs)
import numpy as np
x = [1, 2, 3.5, 4]
y = [1, 2, 3, 3, 4, 5, 3]
z = [7, 8]
arrs = [x, y, z]
def tolerant_mean(arrs):
    # arrs = [x, y, z]
    lens = [len(i) for i in arrs]
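The preview stops after the lengths are computed. One possible way to finish the idea, sketched here with NumPy masked arrays, is to pad every array to the longest length and leave the missing positions masked so that they are ignored in the mean. This is an assumption about how the function might continue, not the gist's confirmed implementation.

```python
import numpy as np


def tolerant_mean(arrs):
    """Column-wise mean and std over arrays of different lengths, ignoring missing entries."""
    lens = [len(a) for a in arrs]
    # One row per input array; every cell starts out masked (i.e. "missing")
    padded = np.ma.masked_all((len(arrs), max(lens)), dtype=float)
    for row, a in enumerate(arrs):
        padded[row, :len(a)] = a  # assignment unmasks the filled cells
    return padded.mean(axis=0), padded.std(axis=0)


x = [1, 2, 3.5, 4]
y = [1, 2, 3, 3, 4, 5, 3]
z = [7, 8]
mean, std = tolerant_mean([x, y, z])
print(mean)
```

Each position of the result averages only the runs that reached that step, so the first column is the mean of three values while the last three columns come from `y` alone.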
@dsalaj
dsalaj / jupyter_setup.sh
Created September 16, 2019 08:39
Set up Python Jupyter notebooks for editing over SSH
# Steps for setting up python jupyter notebook for editing over SSH
# this is not a runnable script as different commands need to be executed on different machines
# ON REMOTE MACHINE
ssh username@remotepc123
# make sure jupyter is installed
pip install jupyter
# start jupyter in no-browser mode on the specified port
jupyter notebook --no-browser --port=8080
# copy the url with token that looks something like this:
@dsalaj
dsalaj / cte_analog_to_spikes.py
Last active December 9, 2019 08:26
Crossing Threshold Encoding of pixel values to spikes
def find_onset_offset(y, threshold):
    """
    Given the sampled input signal `y`,
    find the indices where `y` increases and decreases through the value `threshold`.
    Return stacked binary arrays of shape `y` indicating onset and offset threshold crossings.
    `y` must be a 1-D numpy array.
    """
    if threshold == 1:
        equal = y == threshold
        transition_touch = np.where(equal)[0]
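The rest of the function is not shown in the preview. As a rough illustration of the technique the docstring describes, the sketch below marks the samples at which a 1-D signal rises through or falls through a threshold. The function name `crossings` and the choice of `>=` for "at or above threshold" are assumptions of this example, and it does not reproduce the special handling of `threshold == 1` in the original.

```python
import numpy as np


def crossings(y, threshold):
    """Return a (2, len(y)) binary array: row 0 marks onsets, row 1 marks offsets."""
    y = np.asarray(y)
    above = y >= threshold
    onset = np.zeros(y.shape, dtype=bool)
    offset = np.zeros(y.shape, dtype=bool)
    # Onset: previous sample below the threshold, current sample at or above it
    onset[1:] = ~above[:-1] & above[1:]
    # Offset: previous sample at or above the threshold, current sample below it
    offset[1:] = above[:-1] & ~above[1:]
    return np.stack([onset, offset]).astype(np.uint8)


signal = np.array([0.0, 0.4, 0.8, 0.6, 0.2, 0.7])
spikes = crossings(signal, threshold=0.5)
print(spikes[0])  # onsets:  [0 0 1 0 0 1]
print(spikes[1])  # offsets: [0 0 0 0 1 0]
```

The first sample never produces a spike here because it has no predecessor to compare against.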
# # First create and activate conda python3 environment:
# conda create -n video python=3.6
# conda activate video
# # Then install the requirements:
# conda install ffmpeg
# conda install tensorflow-gpu==1.13.1
# pip install tensorflow_datasets
# # The code below would still produce an error because of the missing file ("ucf101_labels.txt")
# # So manually download the "ucf101_labels.txt" and put it in place:
# cd "/home/$USER/anaconda3/envs/video/lib/python3.6/site-packages/tensorflow_datasets/video/"
@dsalaj
dsalaj / video_to_dataset.sh
Last active February 1, 2019 09:33
Shell commands used to extract and downsample video to a numpy array
# install Anaconda to control the environment: https://www.anaconda.com/distribution/#linux
wget https://repo.anaconda.com/archive/Anaconda3-2018.12-Linux-x86_64.sh
chmod +x Anaconda3-2018.12-Linux-x86_64.sh
./Anaconda3-2018.12-Linux-x86_64.sh
# answer the installation prompts
# activate environment and install the required libraries
conda create -n vid2frame
conda activate vid2frame
conda install opencv scipy