@yowainwright
Last active April 6, 2023 05:15
Pyspark vs Polars Utils

Polars vs Pyspark Utils

The following files contain utility functions that make it easier to move data between Polars and PySpark during development.

import boto3
import polars as pl

client = boto3.client('s3')

def from_csv(bucket_name, input_path):
    # Download the object from S3 and parse its raw bytes as CSV.
    data = client.get_object(Bucket=bucket_name, Key=input_path)
    csv_bytes = data['Body'].read()
    return pl.read_csv(csv_bytes)

import polars as pl

# Assumes pandas and pyspark are installed
def to_polars(spark_df):
    # Collect the Spark DataFrame to the driver as pandas, then rebuild it column by column in Polars.
    pandas_df = spark_df.select("*").toPandas()
    data = pandas_df.to_dict('list')
    return pl.DataFrame(data)
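
import boto3
import polars as pl

client = boto3.client('s3')

def to_csv(df: pl.DataFrame, bucket_name, output_path):
    # write_csv() with no target returns the CSV as a string; encode it so S3 receives bytes.
    csv_bytes = df.write_csv().encode('utf-8')
    client.put_object(Body=csv_bytes, Bucket=bucket_name, Key=output_path)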