@sohang3112
Last active June 14, 2024 05:29
Notes on Pandas & Numpy methods

Pandas & Numpy - Tips & Tricks!

Imports & Conventions:

import numpy as np
import pandas as pd

# df is a Pandas DataFrame
# series is a Pandas Series
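  • Keep only numerical columns using df2 = df.select_dtypes(include='number'). Note that df.select_dtypes(exclude=['object']) only drops object columns, so it also keeps non-numeric dtypes such as bool, datetime and category.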
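
A minimal sketch of the difference between excluding object columns and including only numeric ones. The DataFrame and its column names are made up for illustration, and the name column is forced to object dtype so the result does not depend on the pandas version's default string dtype.

```python
import pandas as pd

# Hypothetical data: one object column, two numeric columns, one bool column.
df = pd.DataFrame({
    "name": pd.Series(["a", "b"], dtype="object"),
    "age": [30, 41],
    "score": [1.5, 2.5],
    "active": [True, False],
})

# Drops only the object column; the bool column survives.
non_object = df.select_dtypes(exclude=["object"])
print(list(non_object.columns))  # ['age', 'score', 'active']

# Keeps only integer and float columns.
numeric = df.select_dtypes(include="number")
print(list(numeric.columns))  # ['age', 'score']
```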
  • df.explode(['column_to_explode']) transforms data of the form
Col1  Col2   column_to_explode
A     BBc    ['l1', 'l2']
Zxv   dfafa  ['l3']

into:

Col1  Col2   column_to_explode
A     BBc    l1
A     BBc    l2
Zxv   dfafa  l3

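Note: In this example, each cell of column_to_explode initially held a list. Rows produced from the same list keep the original index value; pass ignore_index=True to get a fresh 0..n-1 index instead.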
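
The transformation above can be reproduced with a short sketch. It rebuilds the example table and shows how the index behaves with and without ignore_index (the variable names are illustrative only).

```python
import pandas as pd

# Rebuild the example table: each cell of column_to_explode holds a list.
df = pd.DataFrame({
    "Col1": ["A", "Zxv"],
    "Col2": ["BBc", "dfafa"],
    "column_to_explode": [["l1", "l2"], ["l3"]],
})

# One output row per list element; the other columns are repeated.
exploded = df.explode("column_to_explode")
print(exploded)
print(exploded.index.tolist())  # [0, 0, 1]  (original index is repeated)

# ignore_index=True renumbers the rows instead.
exploded_reset = df.explode("column_to_explode", ignore_index=True)
print(exploded_reset.index.tolist())  # [0, 1, 2]
```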

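  • Difference between two DataFrames: df1.compare(df2). Both DataFrames must have identical row and column labels. The result keeps only the rows and columns where at least one value differs, showing the df1 value under self and the df2 value under other.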
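
A small sketch of what compare returns, using two made-up DataFrames with the same shape and labels. Cells that are equal in both frames show up as NaN in the result.

```python
import pandas as pd

# Two hypothetical frames with identical labels; two cells differ.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df2 = pd.DataFrame({"a": [1, 20, 3], "b": ["x", "y", "w"]})

diff = df1.compare(df2)
print(diff)

# Only rows 1 and 2 differ, so only they are kept.
print(diff.index.tolist())  # [1, 2]

# Columns are a two-level index: (column name, 'self' | 'other').
print(diff.loc[1, ("a", "self")], diff.loc[1, ("a", "other")])
```

Row 0 is dropped because it is identical in both frames. Passing keep_shape=True keeps every row and column instead.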

  • pd.get_dummies(df) converts categorical data (i.e., columns whose values come from a fixed set of choices, e.g. male and female) into multiple dummy/indicator columns, one for each distinct value in the column. Each new column holds True or False (in pandas versions before 2.0 the default is 1 or 0). Columns that are already numerical are left unchanged. This is called One-Hot Encoding. For example, this data:

Pclass  Sex
1       male
2       female
3       female
1       male
2       male

is transformed into:

Pclass  Sex_female  Sex_male
1       False       True
2       True        False
3       True        False
1       False       True
2       False       True

Notice that the numerical Pclass column is unchanged, while two new columns are created from the categorical Sex column (one for each unique value in the column).
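
The example above can be checked with a short sketch. The second call is an optional variation, not part of the original note: dtype=int and drop_first=True are standard get_dummies parameters, used here to get 0/1 values and to drop the redundant first indicator column.

```python
import pandas as pd

# Rebuild the example data from the table above.
df = pd.DataFrame({
    "Pclass": [1, 2, 3, 1, 2],
    "Sex": ["male", "female", "female", "male", "male"],
})

# Pclass is numeric and left alone; Sex becomes one column per value.
encoded = pd.get_dummies(df)
print(encoded)
print(list(encoded.columns))  # ['Pclass', 'Sex_female', 'Sex_male']

# Optional variation: 0/1 integers, and drop the first category
# (Sex_female), since it is implied by Sex_male.
encoded_int = pd.get_dummies(df, columns=["Sex"], dtype=int, drop_first=True)
print(list(encoded_int.columns))  # ['Pclass', 'Sex_male']
```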