@elijahbenizzy
Last active August 23, 2023 03:58
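import pyspark.sql as ps
from hamilton.function_modifiers import load_from, value, source


# Each decorator loads one file and injects it as the named parameter;
# the Spark session comes from the upstream "spark" node.
@load_from.csv(path=value("data_1.csv"), inject_="raw_data_1", spark=source("spark"))
@load_from.parquet(path=value("data_2.parquet"), inject_="raw_data_2", spark=source("spark"))
def all_initial_data(raw_data_1: ps.DataFrame, raw_data_2: ps.DataFrame) -> ps.DataFrame:
    """Combines the two loaded dataframes"""
    # _custom_join is not defined in this snippet; it is assumed to be a helper defined elsewhere.
    return _custom_join(raw_data_1, raw_data_2)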