@hwetsman
Last active January 18, 2022 23:22

Pandas

Data Inspection

df.info() - gives a column-by-column summary of dtypes and the number of non-null entries

df.sample(n) - returns a random sample of n entries from the df. Good for looking for quality problems. Default is n=1.

df.head(n) - returns the first n rows of the dataframe. Default is n=5.

df.tail(n) - returns the last n rows of the dataframe. Default is n=5.
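
As a quick illustration of the inspection calls above, here is a minimal sketch on a made-up dataframe. The column names and values are hypothetical, and `random_state` is added only so the sample is repeatable.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "score": [1.0, 2.5, None, 4.0, 5.5, 6.0],
})

df.info()                               # prints dtypes and non-null counts per column
first_two = df.head(2)                  # first 2 rows
last_two = df.tail(2)                   # last 2 rows
picked = df.sample(3, random_state=0)   # 3 random rows, repeatable because of random_state
```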

df.describe(percentiles=[.25, .5, .75], include=None, exclude=None, datetime_is_numeric=False) - returns summary statistics for each col. percentiles defaults to the values shown here, but you can pass any values between 0 and 1. include defaults to numeric cols only; use include='all' to show all cols, or pass a list of the dtypes you wish to show. exclude works in a complementary fashion. datetime_is_numeric=True treats datetime cols as numeric and includes them in the otherwise default call. Note that this argument was removed in pandas 2.0, where datetime cols are always summarized as numeric.
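
The describe options can be sketched as follows, assuming a small made-up dataframe with one numeric and one text col (both names are hypothetical). datetime_is_numeric is left out on purpose so the snippet does not depend on the pandas version.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0],
    "city": ["x", "y", "x", "z"],
})

numeric_only = df.describe()                    # default: numeric cols only
custom = df.describe(percentiles=[0.1, 0.9])    # ask for the 10th and 90th percentiles
everything = df.describe(include="all")         # numeric and non-numeric cols together
numbers = df.describe(include=["number"])       # include given as a list of dtypes
```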

df.col.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True) - returns the unique values in the column and the number of times each appears. normalize=True returns the relative frequencies rather than the counts. Descending sort is the default. bins takes an integer, groups the values into that many equal-width intervals, and only works with numeric data. Use dropna=False if you want to see the effect of NaNs in your col.
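
A minimal sketch of these value_counts options, using two made-up Series (the values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data: one missing value among the colours.
colours = pd.Series(["red", "blue", "red", None, "red", "blue"])

counts = colours.value_counts()                  # counts per value, NaN dropped
shares = colours.value_counts(normalize=True)    # relative frequencies among non-null values
with_nan = colours.value_counts(dropna=False)    # the missing value gets its own row

# bins only works on numeric data: here the values are split into 2 equal-width intervals.
ages = pd.Series([1, 2, 3, 10, 11, 12])
binned = ages.value_counts(bins=2)
```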

df_new = df.copy() - produces an independent copy of df under a different name. Essential before cleaning a df so that you don't lose the original.
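
To show why the copy matters, here is a small sketch with a hypothetical price col: the cleaning step changes df_new while df keeps its missing value.

```python
import pandas as pd

# Hypothetical toy data with one missing price.
df = pd.DataFrame({"price": [10.0, None, 30.0]})

df_new = df.copy()                               # independent copy (deep by default)
df_new["price"] = df_new["price"].fillna(0.0)    # clean the copy only
```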

df1 = pd.concat([df1, df2], ignore_index=True) - stacks one df on top of another, aligning matching cols. ignore_index=True renumbers the rows from 0. You can assign the result to a new name or reuse one of the original names.
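
A minimal sketch of the stacking behaviour, assuming two made-up dataframes where the second has one extra col (all names are hypothetical). Cols missing from one input are filled with NaN in the result.

```python
import pandas as pd

# Hypothetical toy data: df2 has an "extra" col that df1 lacks.
df1 = pd.DataFrame({"id": [1, 2], "val": ["a", "b"]})
df2 = pd.DataFrame({"id": [3], "val": ["c"], "extra": [True]})

# Rows of df2 go under rows of df1; the index is rebuilt as 0, 1, 2.
stacked = pd.concat([df1, df2], ignore_index=True)
```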

df.col1.corr(df.col2) - returns the correlation between two columns (Pearson by default)
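
For example, with made-up cols that are perfectly linearly related (names and values are hypothetical), a sketch might look like this. df.corr() is shown as well since it gives the full pairwise matrix in one call.

```python
import pandas as pd

# Hypothetical toy data: col2 rises with col1, col3 falls with col1.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": [2, 4, 6, 8],
    "col3": [4, 3, 2, 1],
})

r_pos = df.col1.corr(df.col2)    # close to 1.0
r_neg = df.col1.corr(df.col3)    # close to -1.0
matrix = df.corr()               # pairwise correlations for all numeric cols
```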

df.isna().sum() - returns a Series indexed by col name with the count of nulls in each col
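
A short sketch of the null count on a made-up dataframe (col names are hypothetical). The last line, filtering to cols that actually have nulls, is an optional extra step.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with nulls in cols "a" and "b".
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", None, None],
    "c": [1, 2, 3],
})

null_counts = df.isna().sum()                    # one count per col
only_missing = null_counts[null_counts > 0]      # keep only cols that have nulls
```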

Ordering Categories in cols: df.a = pd.Categorical(df.a,categories=["Cat1","Cat2","Cat3","Cat4"],ordered=True)
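
Once the categories are ordered, sorting and comparisons follow that order rather than alphabetical order. A minimal sketch with a made-up col a:

```python
import pandas as pd

# Hypothetical toy data, deliberately out of order.
df = pd.DataFrame({"a": ["Cat3", "Cat1", "Cat4", "Cat2"]})

df["a"] = pd.Categorical(
    df["a"],
    categories=["Cat1", "Cat2", "Cat3", "Cat4"],
    ordered=True,
)

sorted_df = df.sort_values("a")      # sorts by category order
at_least_3 = df[df["a"] >= "Cat3"]   # comparisons use the category order
top = df["a"].max()                  # highest category present
```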
