@sainathadapa
Last active December 19, 2023 15:33
glimpse-python
def glimpse(df, maxvals=10, maxlen=110):
    """Print one line per column: name, dtype, NA count, and the first values."""
    print('Shape: ', df.shape)

    def pad(y):
        # Left-justify every string to the width of the longest one.
        max_len = max([len(x) for x in y], default=0)
        return [x.ljust(max_len) for x in y]

    # Column Name (str() so that non-string labels can be padded too)
    toprnt = pad([str(c) for c in df.columns])
    # Column Type
    toprnt = pad([toprnt[i] + ' ' + str(df.iloc[:, i].dtype) for i in range(df.shape[1])])
    # Num NAs
    num_nas = [df.iloc[:, i].isnull().sum() for i in range(df.shape[1])]
    # max(..., 1) avoids a division by zero when the frame has no rows
    num_nas_ratio = [int(round(x * 100 / max(df.shape[0], 1))) for x in num_nas]
    num_nas_str = [str(x) + ' (' + str(y) + '%)' for x, y in zip(num_nas, num_nas_ratio)]
    max_len = max([len(x) for x in num_nas_str], default=0)
    num_nas_str = [x.rjust(max_len) for x in num_nas_str]
    toprnt = [x + ' ' + y + ' NAs' for x, y in zip(toprnt, num_nas_str)]
    # Separator
    toprnt = [x + ' : ' for x in toprnt]
    # Values: the first `maxvals` entries of each column (slicing stops at the last row)
    toprnt = [toprnt[i] + ', '.join([str(y) for y in df.iloc[:maxvals, i]]) for i in range(df.shape[1])]
    # Trim to maxlen
    toprnt = [x[:maxlen] for x in toprnt]
    for x in toprnt:
        print(x)
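The "Num NAs" step above counts missing values one column at a time. As a rough sketch of the same idea, the counts and percentages can also be computed for all columns at once with pandas' vectorized methods. The sample frame and the names `na_counts`, `na_percent` and `summary` are made up for illustration and are not part of the gist; the frame is assumed to have at least one row.

```python
import pandas as pd

# Hypothetical sample frame, used only for illustration.
df = pd.DataFrame({
    "name": ["a", "b", None, "d"],
    "score": [1.5, None, None, 4.0],
})

# Number of missing values in each column.
na_counts = df.isnull().sum()

# Share of missing rows per column, as a whole-number percentage.
# Assumes the frame is non-empty; on an empty frame the mean is NaN.
na_percent = (df.isnull().mean() * 100).round().astype(int)

# Collect dtype, NA count and NA percentage in one table, indexed by column name.
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "n_na": na_counts,
    "pct_na": na_percent,
})
print(summary)
```

This returns a DataFrame rather than printing aligned text, so it may be handier when the numbers are needed for further filtering; `glimpse` itself remains the better fit for a quick look that also shows sample values.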
@sainathadapa
Author

This code tries to replicate the functionality of the 'glimpse' function from the R package 'dplyr'.

@voigtjessica

Beautiful! I was looking for something like that. Thank you!
