Skip to content

Instantly share code, notes, and snippets.

@DeflateAwning
Created February 8, 2021 03:05
Show Gist options
  • Save DeflateAwning/128b0d95dd15e3b4cce1d00f698d8869 to your computer and use it in GitHub Desktop.
Save DeflateAwning/128b0d95dd15e3b4cce1d00f698d8869 to your computer and use it in GitHub Desktop.
Have a table with a ton of columns? Know what you want your primary key to be? Use this method to find which columns have different values per row within each target primary key.
import pandas as pd
import BOE_SQLAccess as boesql
sqlEngine = boesql.createSQLAlchemyEngine() # or equivalent SQLEngine object
tableName = 'petrinex_ab_well_licence'
pk = 'Licence Number' # target column to check for duplicates as the primary key
df_in = pd.read_sql(tableName, con=sqlEngine)
df = df_in.copy() # make a copy for safety, because downloads can be expensive
df = df.groupby(pk, as_index=True).nunique()
df = df.transpose()
# give list of columns with multiple distinct values
df[df.gt(1).any(axis=1)]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment