Dheeraj Inampudi dheerajinampudi

View GitHub Profile
@dheerajinampudi
dheerajinampudi / pyspark_udf_filtering.py
Created May 8, 2019 09:46 — forked from samuelsmal/pyspark_udf_filtering.py
PySpark DataFrame filtering using a UDF and Regex
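import re

from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def regex_filter(x):
    # Case-insensitive patterns; a value passes if it matches any of them.
    regexs = ['.*ALLYOURBASEBELONGTOUS.*']
    if x and x.strip():
        for r in regexs:
            if re.match(r, x, re.IGNORECASE):
                return True
    # Empty, blank, or non-matching values are rejected.
    return False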
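The preview above stops before the filter is applied to a DataFrame. As a minimal sketch of how such a predicate might be wired into a query, the block below uses hypothetical names (`PATTERNS`, `matches_any`, `filter_rows`) that are not part of the gist. It also swaps in a variation: the pattern is compiled once and matched with `search`, so the surrounding `.*` is not needed.

```python
import re

# Assumption: same pattern as the gist, compiled once and reused for every row.
PATTERNS = [re.compile(r"ALLYOURBASEBELONGTOUS", re.IGNORECASE)]


def matches_any(value):
    """Return True if a non-blank string contains any pattern, else False."""
    if not value or not value.strip():
        return False
    return any(p.search(value) for p in PATTERNS)


def filter_rows(df, column):
    """Keep only the rows of a Spark DataFrame whose `column` matches a pattern.

    Hypothetical helper: pyspark is imported inside the function so the
    pure-Python predicate above can be exercised without Spark installed.
    """
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    matches_udf = udf(matches_any, BooleanType())
    return df.filter(matches_udf(col(column)))
```

With a DataFrame `df` that has a string column, say `message` (an assumed name), the call would be `filter_rows(df, "message")`. For a single pattern like this one, the built-in `col("message").rlike("(?i)ALLYOURBASEBELONGTOUS")` is likely to be faster, since it avoids the Python UDF round trip.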
dheerajinampudi / PySpark DataFrame from many small pandas DataFrames.ipynb
Convert an RDD of pandas DataFrames to a single Spark DataFrame using Arrow and without collecting all data in the driver.
dheerajinampudi / clean_code.md
Created May 5, 2023 03:01 — forked from wojteklu/clean_code.md
Summary of 'Clean code' by Robert C. Martin

Code is clean if it can be understood easily – by everyone on the team. Clean code can be read and enhanced by a developer other than its original author. With understandability comes readability, changeability, extensibility and maintainability.


General rules

  1. Follow standard conventions.
  2. Keep it simple stupid. Simpler is always better. Reduce complexity as much as possible.
  3. Boy scout rule. Leave the campground cleaner than you found it.
  4. Always find root cause. Always look for the root cause of a problem.

Design rules