@anix-lynch
Created May 23, 2025 17:10
import pandas as pd
import pyarrow  # noqa: F401  (Parquet engine used by to_parquet/read_parquet; importing it fails fast if not installed)
# 1. Create a small DataFrame of sample employee records
df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "department": ["Engineering", "Marketing"],
    "salary": [80000, 72000],
})
# 2. Save as CSV (row-based)
df.to_csv("employees.csv", index=False)
# 3. Save as Parquet (column-based)
df.to_parquet("employees.parquet", index=False)
# 4. Load just the salary column from Parquet
salary_only = pd.read_parquet("employees.parquet", columns=["salary"])
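# 5. Read the full table back from CSV and compare it with the Parquet column
full_from_csv = pd.read_csv("employees.csv")
print("💼 Full DataFrame from CSV:")
print(full_from_csv)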
print("\n🧃 Only 'salary' column from Parquet:")
print(salary_only)
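Why can step 4 pull out `salary` without touching the other columns? Below is a stdlib-only toy sketch of the two layouts, under loudly stated assumptions: it is not how Parquet actually encodes data (real files split data into row groups and column chunks, apply encodings and compression, and keep metadata in a footer that readers use to locate the requested columns). `FIELDS`, `rows`, and `column_blobs` are hypothetical names for illustration only, and JSON stands in for Parquet's binary encoding.

```python
import csv
import io
import json

# Hypothetical toy data mirroring the DataFrame above (illustration only).
FIELDS = ["name", "department", "salary"]
rows = [
    {"name": "Alice", "department": "Engineering", "salary": 80000},
    {"name": "Bob", "department": "Marketing", "salary": 72000},
]

# Row-based layout (like CSV): each record's fields are stored next to each other.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
row_blob = buf.getvalue().encode("utf-8")

# Extracting one column from the row layout means scanning every record.
records_scanned = 0
csv_salaries = []
for record in csv.DictReader(io.StringIO(row_blob.decode("utf-8"))):
    records_scanned += 1
    csv_salaries.append(int(record["salary"]))

# Column-based layout (the idea behind Parquet, greatly simplified):
# each column is serialized as its own contiguous chunk (JSON as a stand-in).
column_blobs = {
    field: json.dumps([row[field] for row in rows]).encode("utf-8")
    for field in FIELDS
}

# Extracting one column means reading only that column's chunk.
columnar_salaries = json.loads(column_blobs["salary"])

print("records scanned in row layout:", records_scanned)
print("bytes touched, row layout:", len(row_blob))
print("bytes touched, column layout:", len(column_blobs["salary"]))
print("salaries:", csv_salaries, columnar_salaries)
```

The point is the access pattern rather than the byte counts, which are tiny on a two-row sample. A CSV reader such as `pd.read_csv("employees.csv", usecols=["salary"])` can also return a single column, but it still has to parse every line to find it, whereas a columnar reader can go straight to the chunk it needs.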