Stephanie Marker Smarker


https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-user-delegation-sas-create-cli

Data Lake

  • designed for fault tolerance, effectively unlimited scalability, and high-throughput ingestion of variable-sized data
  • used for data exploration, analytics, and ML
  • can act as a data source for a data warehouse
    • raw data is ingested into the data lake, then transformed in place (an ELT pipeline) into a structured, queryable format
    • source data that is already relational can go directly into the data warehouse, skipping the data lake
  • often used for event streaming or IoT, because a data lake can persist large amounts of relational and non-relational data without transformations or schema definitions
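
The ELT idea in the notes above (land raw data first, transform it in place afterwards) can be sketched with Python's built-in `sqlite3` module standing in for the lake and warehouse. This is only an illustration: the table names `raw_events` and `readings`, the column names, and the sample rows are all hypothetical, and a real data lake would use object storage and a distributed query engine rather than SQLite.

```python
import sqlite3

# In-memory database standing in for the storage layer (assumption for the sketch).
conn = sqlite3.connect(":memory:")

# Extract + Load: raw records land untyped, with no cleaning applied.
conn.execute("CREATE TABLE raw_events (device_id TEXT, reading TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [
        ("sensor-1", "21.5"),
        ("sensor-1", "22.5"),
        ("sensor-2", "n/a"),   # malformed reading, kept in the raw table
        ("sensor-2", "19.0"),
    ],
)

# Transform: build a typed, queryable table inside the same store.
conn.execute(
    """
    CREATE TABLE readings AS
    SELECT device_id, CAST(reading AS REAL) AS temperature
    FROM raw_events
    WHERE reading GLOB '[0-9]*'
    """
)

# The structured table can now serve analytics queries.
rows = conn.execute(
    "SELECT device_id, AVG(temperature) FROM readings "
    "GROUP BY device_id ORDER BY device_id"
).fetchall()
print(rows)
```

The raw table is left untouched, so the transformation can be rerun or changed later without ingesting the data again.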
#!/bin/bash
set -euo pipefail
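# Staged files whose names end in .ipynb
IPYNB_FILE_PATHS=$(git diff --staged --name-only | awk '/\.ipynb$/')
# Absolute path of the repository root
PROJECT_ROOT_PATH=$(git rev-parse --show-toplevel)
function get_script_extension() {
  local file_path=$1
  # Print the file extension recorded in the notebook metadata
  jq -r '.metadata.file_extension' "$file_path"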
Smarker / powerlines.md
Last active January 14, 2019 19:36
#kaggle

Discovering the Fourier Transform: A Tutorial on Circulant Matrices, Circular Convolution, and the DFT

Summary

  • Discrete Fourier Transform (DFT) arises naturally out of analysis of circulant matrices
  • DFT can be derived as the change of basis that simultaneously diagonalizes all circulant matrices
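
The diagonalization claim can be checked numerically. The sketch below uses only the standard library; the helper names `circulant`, `dft`, and `matvec` and the sample vector `c` are made up for illustration. It builds a circulant matrix from its first column and confirms that every Fourier vector is an eigenvector, with the DFT of that column supplying the eigenvalues.

```python
import cmath


def circulant(c):
    """Return the n x n circulant matrix whose first column is c."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]


def dft(x):
    """Naive O(n^2) DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


c = [1.0, 2.0, 0.0, -1.0]          # arbitrary first column
n = len(c)
C = circulant(c)
eigenvalues = dft(c)

# For each k, the Fourier vector v_k[j] = exp(2*pi*i*j*k/n) should satisfy
# C v_k = eigenvalues[k] * v_k, whatever the entries of c are.
max_residual = 0.0
for k in range(n):
    v = [cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)]
    Cv = matvec(C, v)
    residual = max(abs(Cv[i] - eigenvalues[k] * v[i]) for i in range(n))
    max_residual = max(max_residual, residual)

print(max_residual)  # close to zero, up to floating-point error
```

Because the eigenvectors do not depend on `c`, the same Fourier basis diagonalizes every circulant matrix of that size.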

Stable Tensor Neural Networks for Rapid Deep Learning

Paper source

Summary

  • t-NN framework - a NN framework with multidimensional tensor data based on the t-product (multiply 2 tensors with circulant convolution)
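
A minimal sketch of the t-product, assuming the usual definition in which frontal slice k of the result is the circular convolution sum over j of A[(k - j) mod n] times B[j]. Tensors are represented here as plain lists of frontal slices, and the helper names are illustrative, not taken from the paper.

```python
def matmul(a, b):
    """Ordinary matrix product of two 2-D lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matadd(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def t_product(A, B):
    """t-product of two tensors given as lists of n frontal slices.

    Slice k of the result is sum_j A[(k - j) % n] @ B[j], i.e. a circular
    convolution along the third dimension with matrix products as the
    multiplication.
    """
    n = len(A)
    rows, cols = len(A[0]), len(B[0][0])
    result = []
    for k in range(n):
        acc = [[0] * cols for _ in range(rows)]
        for j in range(n):
            acc = matadd(acc, matmul(A[(k - j) % n], B[j]))
        result.append(acc)
    return result


# Two 2 x 2 x 2 tensors (two frontal slices each).
A = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
# Identity tensor: identity matrix in the first slice, zeros elsewhere.
I = [[[1, 0], [0, 1]], [[0, 0], [0, 0]]]

print(t_product(A, I) == A)
```

With 1 x 1 slices the operation reduces to circular convolution of two sequences, and with a single slice it reduces to ordinary matrix multiplication.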

Pros of using t-NNs

  • quicker learning because of reduced parameter space
  • improved generalizability of stable t-NNs
Smarker / pandas.md
Last active August 21, 2018 19:55
#user:smarker #pandas

Pandas

  1. Set date column as a DatetimeIndex
  2. Filter by df[start_date : end_date]

Example: Filter taxi data by a 24 hour time window

  1. Set date column tpep_pickup_datetime as DatetimeIndex
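
A small sketch of both steps, using a hand-made DataFrame in place of the real taxi data. The `fare_amount` column and all sample values are hypothetical. `df.loc[start:end]` is used for the slice; label slicing on a `DatetimeIndex` includes both endpoints.

```python
import pandas as pd

# Tiny stand-in for the taxi data (values are made up for illustration).
df = pd.DataFrame(
    {
        "tpep_pickup_datetime": [
            "2018-01-01 08:00:00",
            "2018-01-01 23:30:00",
            "2018-01-02 09:15:00",
            "2018-01-03 00:05:00",
        ],
        "fare_amount": [12.5, 8.0, 20.0, 5.5],
    }
)

# Step 1: parse the date column and make it a sorted DatetimeIndex.
df["tpep_pickup_datetime"] = pd.to_datetime(df["tpep_pickup_datetime"])
df = df.set_index("tpep_pickup_datetime").sort_index()

# Step 2: slice a 24 hour window by label.
start_date = pd.Timestamp("2018-01-01 08:00:00")
end_date = start_date + pd.Timedelta(hours=24)
window = df.loc[start_date:end_date]

print(window)
```

Sorting the index first keeps label slicing well defined and fast.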
Smarker / fluent-python.md
Last active August 13, 2018 22:18
#user:smarker #python

Fluent Python

Dunder Methods

representation methods
string/bytes representation: __repr__, __str__, __format__, __bytes__
conversion to number: __abs__, __bool__, __complex__, __int__, __float__, __hash__, __index__
emulating collections: __len__, __getitem__, __setitem__, __delitem__, __contains__
iteration: __iter__, __reversed__, __next__
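
To show how a few of these methods plug into built-in syntax, here is a small illustrative class. `Playlist` and its track names are invented for this sketch and do not come from the book.

```python
class Playlist:
    """A thin list wrapper that opts in to several dunder protocols."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    # string representation
    def __repr__(self):
        return f"Playlist({self._tracks!r})"

    def __str__(self):
        return ", ".join(self._tracks)

    # conversion to number / truthiness
    def __bool__(self):
        return bool(self._tracks)

    # emulating collections
    def __len__(self):
        return len(self._tracks)

    def __getitem__(self, index):
        return self._tracks[index]

    def __setitem__(self, index, value):
        self._tracks[index] = value

    def __delitem__(self, index):
        del self._tracks[index]

    def __contains__(self, item):
        return item in self._tracks

    # iteration
    def __iter__(self):
        return iter(self._tracks)

    def __reversed__(self):
        return reversed(self._tracks)


playlist = Playlist(["intro", "verse", "outro"])
print(repr(playlist))            # uses __repr__
print(len(playlist))             # uses __len__
print("verse" in playlist)       # uses __contains__
print(list(reversed(playlist)))  # uses __reversed__
```

The class never calls these methods directly; the interpreter invokes them when `len()`, `in`, indexing, or `for` loops are used on an instance.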
Smarker / object-detection.md
Last active August 10, 2018 14:15
#user:smarker #ml #object-detection

Object Detection

number of positive examples << number of negative examples

Example: identifying fraudulent claims

Fraudulent claims are rare, so a classifier trained on the raw data will tend to label fraudulent claims as genuine.
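
The effect is easy to reproduce with made-up numbers. In the sketch below, 10 of 1000 claims are fraudulent (an assumed ratio, not data from these notes), and a classifier that always answers "genuine" still scores 99% accuracy while catching no fraud. The inverse-frequency class weights at the end are one common heuristic for rebalancing, shown only as an illustration.

```python
# 1 = fraudulent (positive class), 0 = genuine (negative class).
labels = [1] * 10 + [0] * 990

# A degenerate classifier that always predicts the majority class.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
false_negatives = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")

# Inverse-frequency weights: n_samples / (n_classes * class_count).
n_samples, n_classes = len(labels), 2
class_weights = {
    cls: n_samples / (n_classes * labels.count(cls)) for cls in (0, 1)
}
print(class_weights)
```

High accuracy with zero recall is the signal that accuracy is the wrong metric here; recall, precision, or a weighted loss says more about the rare class.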

Smarker / norway.md
Last active July 12, 2020 15:00
#vacation

Norway

June 11-12

Boston BOS -> Paris CDG 11:30 PM -> 12:10 PM

June 12

18:25 Bergen -> 19:05 Stavanger

Smarker / AKS-dask-setup.md
Last active January 8, 2018 18:25
Set up Dask with AKS using WSL

List all Azure subscriptions

az account list

Switch to a subscription

az account set -s "subscription name"

Show current subscription you are in