@Mlawrence95
Last active March 26, 2024 10:25
Python: create a confusion matrix across two columns in a Pandas dataframe having only categorical data
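import pandas as pd


def confusion_matrix(df: pd.DataFrame, col1: str, col2: str) -> pd.DataFrame:
    """
    Given a dataframe with at least two categorical columns, build a
    confusion matrix: rows are the values of col1, columns are the values
    of col2, and each cell counts how many rows have that (col1, col2) pair.

    Use like:
    >>> confusion_matrix(test_df, 'actual_label', 'predicted_label')
    """
    return (
        df
        .groupby([col1, col2])  # one group per (col1, col2) pair
        .size()                 # number of rows in each group
        .unstack(fill_value=0)  # pivot col2 into columns; absent pairs -> 0
    )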
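To see what the function returns, it can be run on a tiny DataFrame. The `test_df` contents below are made up for illustration. Rows come from `col1`, columns from `col2`, and each cell counts how often that pair occurs. The reindexing step at the end is an optional addition, not part of the original gist: the groupby approach only produces columns for labels that actually appear in `col2`, so a label that is never predicted has no column unless both axes are reindexed on the union of labels.

```python
import pandas as pd


def confusion_matrix(df: pd.DataFrame, col1: str, col2: str) -> pd.DataFrame:
    """Cross-count col1 (rows) against col2 (columns); absent pairs become 0."""
    return df.groupby([col1, col2]).size().unstack(fill_value=0)


# Hypothetical toy data: true vs. predicted labels for six samples.
test_df = pd.DataFrame({
    "actual_label":    ["cat", "cat", "dog", "dog", "dog", "bird"],
    "predicted_label": ["cat", "dog", "dog", "dog", "cat", "dog"],
})

cm = confusion_matrix(test_df, "actual_label", "predicted_label")
# Rows: bird, cat, dog. Columns: cat, dog. For example, cm.loc["dog", "dog"] is 2.
print(cm)

# "bird" is never predicted, so cm has no "bird" column. Reindexing both axes
# on the union of labels gives a square matrix (optional extra step).
labels = sorted(set(test_df["actual_label"]) | set(test_df["predicted_label"]))
square_cm = cm.reindex(index=labels, columns=labels, fill_value=0)
print(square_cm)
```

The square form is usually what one wants for a classification confusion matrix, since the diagonal then lines up with "predicted == actual".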
hmeine commented Jun 21, 2023

Very nice. I came up with

df.groupby(col1)[col2].value_counts()

and was looking for a way to turn the resulting multi-index Series into a DataFrame, but I like your solution much better!
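The multi-index Series from the `value_counts()` approach in this comment can be turned into a DataFrame with `.unstack(fill_value=0)`, which pivots the inner index level (`col2`) into columns. The sketch below, on made-up data, checks that this matches the gist's `size()` version and `pd.crosstab`, a pandas built-in that builds the same kind of table. The comparison ignores row and column order, in case the approaches sort differently.

```python
import pandas as pd

# Hypothetical toy data, same shape as the confusion-matrix use case.
df = pd.DataFrame({
    "actual_label":    ["cat", "cat", "dog", "dog", "dog", "bird"],
    "predicted_label": ["cat", "dog", "dog", "dog", "cat", "dog"],
})
col1, col2 = "actual_label", "predicted_label"

# The commenter's version: a Series indexed by a (col1, col2) MultiIndex.
pair_counts = df.groupby(col1)[col2].value_counts()

# unstack() moves the inner level (col2) into columns;
# fill_value=0 covers (col1, col2) pairs that never occur.
from_value_counts = pair_counts.unstack(fill_value=0)

# The gist's version, for comparison.
from_size = df.groupby([col1, col2]).size().unstack(fill_value=0)

# pandas' built-in cross-tabulation of two Series.
from_crosstab = pd.crosstab(df[col1], df[col2])

# Order may differ between approaches, so compare order-insensitively.
pd.testing.assert_frame_equal(from_value_counts, from_size, check_like=True)
pd.testing.assert_frame_equal(from_size, from_crosstab, check_like=True)
```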
