erelinz

  • Boston, MA
erelinz / inspect_parquet.py
Created August 7, 2023 20:28
Read metadata and schema from parquet file and show the compression ratio and type for each column.
#!/usr/bin/python3
import os
import sys

import pyarrow.parquet as pq


def inspect_parquet(file_name):
    """Read metadata from a Parquet file and print details for each column."""
    # Read the metadata stored in the Parquet file footer
    metadata = pq.read_metadata(file_name)
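The preview stops right after `pq.read_metadata`. The sketch below shows one way to produce the per-column compression type and ratio that the description promises. It is not the gist's actual code. It assumes pyarrow's footer metadata API (`FileMetaData.row_group()`, `ColumnChunkMetaData.path_in_schema`, `compression`, `total_compressed_size`, `total_uncompressed_size`), and `column_compression_stats` is a hypothetical helper name. Sizes are summed per column across all row groups, and the ratio is uncompressed bytes divided by compressed bytes.

```python
#!/usr/bin/python3
"""Sketch: per-column compression codec and ratio from Parquet footer metadata."""
import sys
from collections import defaultdict


def column_compression_stats(metadata):
    """Sum compressed/uncompressed sizes per column across all row groups.

    `metadata` is expected to behave like pyarrow.parquet.FileMetaData.
    Returns {column_path: (codecs, uncompressed_bytes, compressed_bytes, ratio)}.
    """
    totals = defaultdict(lambda: [set(), 0, 0])
    for rg_index in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg_index)
        for col_index in range(row_group.num_columns):
            chunk = row_group.column(col_index)
            entry = totals[chunk.path_in_schema]
            entry[0].add(chunk.compression)  # e.g. 'SNAPPY', 'ZSTD'
            entry[1] += chunk.total_uncompressed_size
            entry[2] += chunk.total_compressed_size
    stats = {}
    for path, (codecs, uncompressed, compressed) in totals.items():
        # Ratio > 1 means the column shrank on disk; avoid dividing by zero.
        ratio = uncompressed / compressed if compressed else float("nan")
        stats[path] = (sorted(codecs), uncompressed, compressed, ratio)
    return stats


def main(file_name):
    import pyarrow.parquet as pq  # imported here so the helper has no hard dependency

    metadata = pq.read_metadata(file_name)
    print(metadata.schema)
    for path, (codecs, raw, packed, ratio) in column_compression_stats(metadata).items():
        print(f"{path}: {'/'.join(codecs)} {raw} -> {packed} bytes (ratio {ratio:.2f})")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        main(name)
```

Passing one or more file paths prints the schema, then one line per column. Separating the aggregation from the file reading keeps the arithmetic testable without a real Parquet file.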
erelinz / parquet2json
Created June 29, 2023 02:52
parquet2json is a Python script that converts a Parquet file to JSON format
#!/usr/bin/python3
import sys
import pandas as pd
pd.read_parquet(sys.argv[1]).to_json(sys.stdout, orient='records')
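With `orient='records'`, pandas writes the whole file as a single JSON array with one object per row. The sketch below performs the same conversion without pandas, swapping in pyarrow's `Table.to_pylist()` (available in recent pyarrow releases) plus the standard `json` module, and adds an optional JSON Lines mode. It is not the gist's code. `write_records` and the `--lines` flag are hypothetical. Values `json` cannot encode (timestamps, decimals) fall back to `str()`, so they will not match pandas' default epoch-millisecond timestamps.

```python
#!/usr/bin/python3
"""Sketch: Parquet -> JSON via pyarrow's Table.to_pylist(), without pandas."""
import json
import sys


def write_records(records, out, lines=False):
    """Write a list of row dicts as one JSON array, or one object per line if lines=True.

    default=str is a lossy fallback for values the json module can't encode
    natively (e.g. datetime, Decimal); change it if exact types matter.
    """
    if lines:
        for record in records:
            out.write(json.dumps(record, default=str) + "\n")
    else:
        json.dump(records, out, default=str)
        out.write("\n")


def main(path, lines=False):
    import pyarrow.parquet as pq  # only needed when a real file is converted

    # to_pylist() yields one dict per row, the same shape as orient='records'.
    write_records(pq.read_table(path).to_pylist(), sys.stdout, lines=lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical flag: add --lines after the file name to get JSON Lines.
    main(sys.argv[1], lines="--lines" in sys.argv[2:])
```

JSON Lines output is convenient for large files because each row can be parsed independently by tools that stream line by line.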
erelinz / delete_bucket.py
Created April 11, 2023 23:20
delete_bucket
"""Delete all objects in a bucket and then delete the bucket."""
# Description: delete all objects in a bucket and then delete the bucket
# Usage: python delete_bucket.py --bucket_name <bucket_name> --profile_name <profile_name>
# Author: @ereli
# Date: 2023-04-11
import argparse
import boto3
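The preview is cut off after the imports. The sketch below follows the flow the docstring describes, assuming boto3's resource API (`boto3.Session(profile_name=...)`, `Bucket.objects`, `Bucket.object_versions`, `Bucket.delete()`). It is not the gist's actual code. The `--bucket_name` and `--profile_name` flags come from the usage line, while `empty_and_delete` and `include_versions` are hypothetical names. The bucket is deleted last because S3 rejects deleting a bucket that still holds objects.

```python
#!/usr/bin/python3
"""Sketch: delete all objects (and object versions) in an S3 bucket, then the bucket."""
import argparse


def empty_and_delete(bucket, include_versions=True):
    """Empty `bucket` (a boto3 S3 Bucket resource) and then delete it.

    S3 rejects DeleteBucket on a non-empty bucket, so the contents go first.
    """
    if include_versions:
        # Batch-deletes every object version and delete marker, which is what
        # a versioned bucket needs before it can be removed.
        bucket.object_versions.all().delete()
    else:
        # Batch-deletes current objects only (enough for unversioned buckets).
        bucket.objects.all().delete()
    bucket.delete()


def main(argv=None):
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--bucket_name")
    parser.add_argument("--profile_name", default=None)
    args = parser.parse_args(argv)
    if not args.bucket_name:
        parser.print_usage()
        return None

    import boto3  # imported lazily so empty_and_delete() can be tested offline

    session = boto3.Session(profile_name=args.profile_name)
    empty_and_delete(session.resource("s3").Bucket(args.bucket_name))
    return args.bucket_name


if __name__ == "__main__":
    main()
```

For example, `python delete_bucket.py --bucket_name my-old-bucket --profile_name dev` (placeholder names) empties and removes `my-old-bucket` using the `dev` credentials profile. Passing the bucket object into `empty_and_delete` keeps the deletion order easy to check with a stand-in object.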