Dheeraj Inampudi (dheerajinampudi)

@dheerajinampudi
dheerajinampudi / .py
Created April 23, 2019 05:49
Recursively creates the directory and does not raise an exception if the directory already exists
# Starting from Python 3.5, pathlib.Path.mkdir has an exist_ok flag:
from pathlib import Path

path = Path('/my/directory/filename.txt')
path.parent.mkdir(parents=True, exist_ok=True)  # path.parent ~ os.path.dirname(path)

# This recursively creates the directory and does not raise an exception if the
# directory already exists (just as os.makedirs gained an exist_ok flag in
# Python 3.2, e.g. os.makedirs(path, exist_ok=True)).
@dheerajinampudi
dheerajinampudi / .sh
Created April 23, 2019 10:55
Command to search inside an S3 bucket
I tried the following:
aws s3 ls s3://Bucket1/folder1/2019/ --recursive | grep filename.csv
This outputs the actual path where the file exists:
2019-04-05 01:18:35 111111 folder1/2019/03/20/filename.csv
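
The same search from Python, as a minimal boto3 sketch (the bucket and prefix are the names from the example above; pagination via list_objects_v2 is standard boto3):

import boto3

# List every object under the prefix and keep the keys that end with the
# target file name -- the Python counterpart of the CLI one-liner above.
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="Bucket1", Prefix="folder1/2019/"):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith("filename.csv"):
            print(obj["LastModified"], obj["Size"], obj["Key"])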
@dheerajinampudi
dheerajinampudi / pyspark_udf_filtering.py
Created May 8, 2019 09:46 — forked from samuelsmal/pyspark_udf_filtering.py
PySpark DataFrame filtering using a UDF and Regex
import re

from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def regex_filter(x):
    # Return True when x matches one of the regexes, case-insensitively.
    regexs = ['.*ALLYOURBASEBELONGTOUS.*']
    if x and x.strip():
        for r in regexs:
            if re.match(r, x, re.IGNORECASE):
                return True
    return False
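
The forked snippet stops at the predicate itself; a hedged sketch of the usual wiring (df and field_to_filter are assumed names, not part of the gist):

# Register the predicate as a boolean UDF and filter with it.
filter_udf = udf(regex_filter, BooleanType())
df_filtered = df.filter(filter_udf(df.field_to_filter))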
@dheerajinampudi
dheerajinampudi / PySpark DataFrame from many small pandas DataFrames.ipynb
Convert an RDD of pandas DataFrames to a single Spark DataFrame using Arrow and without collecting all data in the driver.
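The notebook itself no longer renders, so only the description above survives. As a hedged sketch of the general idea (not the author's notebook; its Arrow-specific machinery is lost), each pandas DataFrame can be flattened into plain rows on the executors so that nothing is ever collected on the driver. All names below are assumptions:

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# pdf_rdd is assumed: an RDD whose elements are small pandas DataFrames.
def to_rows(pdf):
    # Turn one pandas DataFrame into plain row tuples on the executor.
    return (tuple(row) for row in pdf.itertuples(index=False, name=None))

rows = pdf_rdd.flatMap(to_rows)                        # stays distributed
sdf = spark.createDataFrame(rows, ["col_a", "col_b"])  # assumed column names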
@dheerajinampudi
dheerajinampudi / py
Last active May 18, 2019 11:42
Interchange two columns' order in pandas
# Interchange two columns' order in pandas
cols = df3.columns.tolist()
column_to_move = "Altitude"
new_position = 1

# Pop the column out of its old slot and re-insert it at the new position.
cols.insert(new_position, cols.pop(cols.index(column_to_move)))
df3 = df3[cols]
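
A self-contained run of the snippet, with assumed column names (not from the original gist):

import pandas as pd

df3 = pd.DataFrame([[12.97, 77.59, 920.0]],
                   columns=["Latitude", "Longitude", "Altitude"])
cols = df3.columns.tolist()
cols.insert(1, cols.pop(cols.index("Altitude")))
df3 = df3[cols]
print(df3.columns.tolist())  # ['Latitude', 'Altitude', 'Longitude']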
@dheerajinampudi
dheerajinampudi / pythonubuntu.txt
Last active January 4, 2021 00:59
Ubuntu 20.04 python to python3
sudo apt install python-is-python3
This installs /usr/bin/python as a symlink to /usr/bin/python3.
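
To confirm the symlink from Python afterwards (path taken from the note above):

import os

# After installing python-is-python3, this resolves to the python3 binary.
print(os.path.realpath("/usr/bin/python"))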
@dheerajinampudi
dheerajinampudi / cloudwatchpowertools_example.json
Last active February 21, 2021 05:48
CloudWatch powertools logging output example
{
  "timestamp": "2021-02-12 18:17:33,774",
  "level": "INFO",
  "location": "collect.handler:1",
  "service": "payment",
  "lambda_function_name": "test",
  "lambda_function_memory_size": 128,
  "lambda_function_arn": "arn:aws:lambda:eu-west-1:12345678910:function:test",
  "lambda_request_id": "52fdfc07-2182-154f-163f-5f0f9a621d72",
  "cold_start": true
}
@dheerajinampudi
dheerajinampudi / zipfile_extraction.py
Created March 30, 2021 17:04
Python script to extract a zip file
import zipfile

path_to_zip_file = 'archive.zip'
directory_to_extract_to = 'archive_unzipped_py/'

# Extract every member of the archive into the target directory.
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)
@dheerajinampudi
dheerajinampudi / os_walk_paths.py
Created March 30, 2021 17:07
File names in a directory tree
import os

# Walk the tree bottom-up (topdown=False) and print the full path of
# every file and directory; print(name) would give bare names instead.
for root, dirs, files in os.walk(".", topdown=False):
    for name in files:
        print(os.path.join(root, name))
    for name in dirs:
        print(os.path.join(root, name))
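
For comparison, the pathlib equivalent in a single call (files and directories interleaved):

from pathlib import Path

# rglob("*") yields every file and directory beneath the current directory.
for p in Path(".").rglob("*"):
    print(p)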
@dheerajinampudi
dheerajinampudi / s3-endpoint-diff.csv
Last active May 1, 2021 02:22
Difference between the S3 website endpoint and the REST API endpoint
Website Endpoint,REST API Endpoint
Bucket is publicly available,Only accessible via the CloudFront endpoint
Must be a public bucket,Need not be a public bucket
Less secure due to S3 global read access,More secure because of the OAI configuration
Users can access your files through CloudFront and the S3 bucket directly,"Users can only access your files through CloudFront, not directly from the S3 bucket"
"Makes auditing difficult, as buckets need the public access option ON at all times",Meets compliance by disabling public access to all buckets by default