Skip to content

Instantly share code, notes, and snippets.

A complete list of books, articles, blog posts, videos and neat pages that support Data Fundamentals (H), organised by Unit.

Formatting

If the resource is available online (legally) I have included a link to it. Each entry has symbols following it.

  • ⨕⨕⨕ indicates difficulty/depth, from ⨕ (easy to pick up intro, no background required) through ⨕⨕⨕⨕⨕ (graduate level textbook, maths heavy, expect equations)
  • ⭐ indicates a particularly recommended resource; 🌟 is a very strongly recommended resource and you should look at it.
@uhho
uhho / pandas_s3_streaming.py
Last active December 2, 2022 18:57
Streaming pandas DataFrame to/from S3 with on-the-fly processing and GZIP compression
def s3_to_pandas(client, bucket, key, header=None):
# get key using boto3 client
obj = client.get_object(Bucket=bucket, Key=key)
gz = gzip.GzipFile(fileobj=obj['Body'])
# load stream directly to DF
return pd.read_csv(gz, header=header, dtype=str)
def s3_to_pandas_with_processing(client, bucket, key, header=None):
@pratos
pratos / condaenv.txt
Created November 30, 2016 07:01
To package a conda environment (Requirement.txt and virtual environment)
# For Windows users# Note: <> denotes changes to be made
#Create a conda environment
conda create --name <environment-name> python=<version:2.7/3.5>
#To create a requirements.txt file:
conda list #Gives you list of packages used for the environment
conda list -e > requirements.txt #Save all the info about packages to your folder
@vasanthk
vasanthk / System Design.md
Last active May 6, 2024 20:21
System Design Cheatsheet

System Design Cheatsheet

Picking the right architecture = Picking the right battles + Managing trade-offs

Basic Steps

  1. Clarify and agree on the scope of the system
  • User cases (description of sequences of events that, taken together, lead to a system doing something useful)
    • Who is going to use it?
    • How are they going to use it?
@mzupan
mzupan / lambda.py
Last active June 8, 2021 05:19
AWS Lambda job to backup RDS instances
import boto3
import datetime
def lambda_handler(event, context):
print("Connecting to RDS")
client = boto3.client('rds')
print("RDS snapshot backups stated at %s...\n" % datetime.datetime.now())
client.create_db_snapshot(
DBInstanceIdentifier='web-platform-slave',
@jorisvandenbossche
jorisvandenbossche / sql.py
Last active April 9, 2022 00:55
Patched version of pandas.io.sql to support PostgreSQL
"""
Patched version to support PostgreSQL
(original version: https://github.com/pydata/pandas/blob/v0.13.1/pandas/io/sql.py)
Adapted functions are:
- added _write_postgresql
- updated table_exist
- updated get_sqltype
- updated get_schema
@rgreenjr
rgreenjr / postgres_queries_and_commands.sql
Last active May 7, 2024 15:24
Useful PostgreSQL Queries and Commands
-- show running queries (pre 9.2)
SELECT procpid, age(clock_timestamp(), query_start), usename, current_query
FROM pg_stat_activity
WHERE current_query != '<IDLE>' AND current_query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
-- show running queries (9.2)
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
@fannix
fannix / load_and_save.py
Created February 18, 2012 08:34
Save and load sparse matrix
"""http://stackoverflow.com/questions/6282432/load-sparse-array-from-npy-file
"""
import random
import scipy.sparse as sparse
import scipy.io
import numpy as np
def save_sparse_matrix(filename, x):
x_coo = x.tocoo()
row = x_coo.row