Mohamed Abido MHAbido

## df_resources.md

      
johnhw / df_resources.md (last active May 4, 2024 02:54)

A complete list of books, articles, blog posts, videos and neat pages that support Data Fundamentals (H), organised by Unit.

Formatting

If the resource is available online (legally) I have included a link to it. Each entry has symbols following it.

⨕⨕⨕ indicates difficulty/depth, from ⨕ (easy to pick up intro, no background required) through ⨕⨕⨕⨕⨕ (graduate level textbook, maths heavy, expect equations)
⭐ indicates a particularly recommended resource; 🌟 is a very strongly recommended resource and you should look at it.


## pandas_s3_streaming.py
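import gzip

import pandas as pd

def s3_to_pandas(client, bucket, key, header=None):

    # fetch the gzipped object using the boto3 client
    obj = client.get_object(Bucket=bucket, Key=key)
    # wrap the response body so it is decompressed as it is read
    gz = gzip.GzipFile(fileobj=obj['Body'])

    # load the stream directly into a DataFrame, keeping every column as str
    return pd.read_csv(gz, header=header, dtype=str)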

def s3_to_pandas_with_processing(client, bucket, key, header=None):
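
A minimal usage sketch for `s3_to_pandas`, runnable without AWS credentials. `FakeS3Client`, the bucket name, and the key are hypothetical stand-ins that are not part of the original gist; the only behaviour assumed of the real boto3 client is that `get_object` returns a dict whose `'Body'` is a file-like stream of gzipped bytes.

```python
import gzip
import io

import pandas as pd


def s3_to_pandas(client, bucket, key, header=None):
    # Same logic as the snippet above: stream the gzipped object into pandas.
    obj = client.get_object(Bucket=bucket, Key=key)
    gz = gzip.GzipFile(fileobj=obj['Body'])
    return pd.read_csv(gz, header=header, dtype=str)


class FakeS3Client:
    """Hypothetical stand-in for a boto3 S3 client (illustration only)."""

    def __init__(self, objects):
        # Maps (bucket, key) -> raw gzipped bytes.
        self._objects = objects

    def get_object(self, Bucket, Key):
        # io.BytesIO plays the role of the streaming response body here.
        return {'Body': io.BytesIO(self._objects[(Bucket, Key)])}


# Two CSV rows, gzip-compressed in memory.
payload = gzip.compress(b"a,1\nb,2\n")
client = FakeS3Client({('demo-bucket', 'demo.csv.gz'): payload})

df = s3_to_pandas(client, 'demo-bucket', 'demo.csv.gz')
```

With a real client, `boto3.client('s3')` would be passed in place of the stub. Because of `dtype=str`, numeric-looking columns come back as strings and need converting afterwards if numbers are wanted.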

## condaenv.txt
# For Windows users
# Note: <> denotes changes to be made

#Create a conda environment
conda create --name <environment-name> python=<version:2.7/3.5>

#To create a requirements.txt file:
conda list #Gives you list of packages used for the environment

conda list -e > requirements.txt #Save all the info about packages to your folder

## System Design.md

      
vasanthk / System Design.md (last active May 6, 2024 20:21)

System Design Cheatsheet


Picking the right architecture = Picking the right battles + Managing trade-offs

Basic Steps

1. Clarify and agree on the scope of the system
   - Use cases (description of sequences of events that, taken together, lead to a system doing something useful)
     - Who is going to use it?
     - How are they going to use it?


## lambda.py
import boto3
import datetime

def lambda_handler(event, context):
    print("Connecting to RDS")
    client = boto3.client('rds')

    print("RDS snapshot backups started at %s...\n" % datetime.datetime.now())
    client.create_db_snapshot(
        DBInstanceIdentifier='web-platform-slave',

## sql.py
"""
Patched version to support PostgreSQL
(original version: https://github.com/pydata/pandas/blob/v0.13.1/pandas/io/sql.py)

Adapted functions are:
- added _write_postgresql
- updated table_exist
- updated get_sqltype
- updated get_schema

## postgres_queries_and_commands.sql
-- show running queries (pre 9.2)
SELECT procpid, age(clock_timestamp(), query_start), usename, current_query
FROM pg_stat_activity
WHERE current_query != '<IDLE>' AND current_query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;

-- show running queries (9.2)
-- from 9.2 on, idle sessions are flagged in the state column, not by a '<IDLE>' query text
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE state != 'idle' AND query NOT ILIKE '%pg_stat_activity%'

## load_and_save.py
"""http://stackoverflow.com/questions/6282432/load-sparse-array-from-npy-file
"""
import random
import scipy.sparse as sparse
import scipy.io
import numpy as np

def save_sparse_matrix(filename, x):
    x_coo = x.tocoo()
    row = x_coo.row
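
The preview stops after `row = x_coo.row`. As a sketch only, one way to finish the COO-based round trip is to store the coordinate arrays and the shape in a single `.npz` archive. The archive key names and the `load_sparse_matrix` helper are assumptions for illustration and may differ from the original file.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sparse


def save_sparse_matrix(filename, x):
    # Convert to COO so the matrix is described by three flat arrays.
    x_coo = x.tocoo()
    np.savez(
        filename,
        row=x_coo.row,
        col=x_coo.col,
        data=x_coo.data,
        shape=np.array(x_coo.shape),
    )


def load_sparse_matrix(filename):
    # Hypothetical counterpart: rebuild the COO matrix from the saved arrays.
    with np.load(filename) as y:
        shape = tuple(int(n) for n in y["shape"])
        return sparse.coo_matrix((y["data"], (y["row"], y["col"])), shape=shape)


original = sparse.csr_matrix(np.array([[0, 1, 0], [2, 0, 3]]))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.npz")
    save_sparse_matrix(path, original)
    restored = load_sparse_matrix(path)
```

Recent SciPy versions also provide `scipy.sparse.save_npz` and `scipy.sparse.load_npz`, which cover the same use case without hand-written helpers.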