Useful commands for the pandas DataFrame library for Python.

Useful commands for Pandas dataframes

import pandas as pd

Loading data

  • from .csv
    df = pd.read_csv('file.csv', header=0) (header=0: column names are on the first line; header=1 would take them from the second line)
  • from dictionary
    df = pd.DataFrame(my_dict)
  • from lists
    df = pd.DataFrame([[y, x1_1, x2_1, ...], [y, x1_2, x2_2, ...], ... ])
    df.columns = ['class', 'x1', 'x2', ...]
  • add a column
    df['x_new'] = x_new_list
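
The loading options above can be sketched end to end as follows. This is a minimal example, not a definitive recipe: the CSV text, column names, and values are made up for illustration, and io.StringIO stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file such as 'file.csv'.
csv_text = "class,x1,x2\n0,1.5,10\n1,2.5,20\n"
df_csv = pd.read_csv(io.StringIO(csv_text))  # the first line is used as the header

# From a dictionary: keys become column names, values become the columns.
my_dict = {"class": [0, 1], "x1": [1.5, 2.5], "x2": [10, 20]}
df_dict = pd.DataFrame(my_dict)

# From a list of rows, naming the columns at construction time.
rows = [[0, 1.5, 10], [1, 2.5, 20]]
df_rows = pd.DataFrame(rows, columns=["class", "x1", "x2"])

# Add a column from a list; its length must match the number of rows.
df_rows["x_new"] = [100, 200]
```

Passing columns= to the constructor does the same job as assigning df.columns afterwards.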

View options

  • print non-truncated cell contents
    pd.set_option('display.max_colwidth', None) (None = no limit; -1 is deprecated since pandas 1.0)

Dataframe overviews

  • data types of each column
    df.dtypes
  • statistical description of each column
    df.describe()
  • see amount of missing data
    df.isnull().sum()
  • table of frequency counts for items in a column
    df['x1'].value_counts()

Selecting data

  • select a subset of dataframe
    df_subset = df[(df.x1 > 50) | ((df.x2 > 50) & (df.x3 == 100))] (| = or, & = and; & binds tighter than |, so group explicitly)
    df_subset = df[['x1', 'x2', ... ]]
  • manually select by indices
    x1_list = df.iloc[0:100, 1].values
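
Because & binds tighter than |, the grouping of a combined filter changes which rows come back. A small sketch with made-up values (the frame and the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "x1": [60, 10, 10, 10],
    "x2": [0, 70, 70, 0],
    "x3": [0, 100, 0, 100],
})

# Without extra parentheses this reads: x1 > 50 OR (x2 > 50 AND x3 == 100).
implicit = df[(df.x1 > 50) | (df.x2 > 50) & (df.x3 == 100)]
explicit = df[(df.x1 > 50) | ((df.x2 > 50) & (df.x3 == 100))]

# Grouping the OR first selects a different set of rows.
or_first = df[((df.x1 > 50) | (df.x2 > 50)) & (df.x3 == 100)]

# Column subset by name, and positional selection with iloc.
cols = df[["x1", "x2"]]
first_two_x2 = df.iloc[0:2, 1].to_numpy()  # rows 0 and 1 of the second column
```

Here implicit and explicit both keep rows 0 and 1, while or_first keeps only row 1.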

Manipulating dataframe

  • add a column
    df['new_col'] = my_list
    df['new_col'] = df['x1'].apply(my_function)
    df['new_col'] = df['x1'].map(my_dict)
  • drop a column
    df = df.drop(['my_col'], axis=1)
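
A short sketch of the difference between apply and map when adding a column. The function my_function and the dictionary my_dict below are hypothetical stand-ins for whatever transformation is needed.

```python
import pandas as pd

df = pd.DataFrame({"x1": ["a", "b", "c"], "my_col": [1, 2, 3]})


def my_function(value):
    # Hypothetical per-element transformation.
    return value.upper()


my_dict = {"a": 1, "b": 2}  # note: "c" has no entry

df["applied"] = df["x1"].apply(my_function)  # calls my_function on each element
df["mapped"] = df["x1"].map(my_dict)         # values missing from my_dict become NaN

# Equivalent to df.drop(['my_col'], axis=1).
df = df.drop(columns=["my_col"])
```

Since "c" is not a key of my_dict, the mapped column ends up as floats with a NaN in the last row.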

Data processing

  • convert column to datetime
    df['date'] = pd.to_datetime(df['date'])
  • get dummy indices for features (by default converts only object/string and categorical columns)
    pd.get_dummies(df[['feature_1', 'feature_2', ...]])
  • map dictionary to dataframe column
    my_map = {'a': 1, 'b': 2, ...}
    df['x_new'] = df['x'].map(my_map)
  • drop missing data
    df.dropna() (drop rows, i.e., samples, with any missing value)
    df.dropna(axis=1) (drop columns, i.e., features, with any missing value)
    df.dropna(thresh=4) (keep only rows with at least 4 non-missing values)
    df.dropna(subset=['x2']) (drop rows only where x2 is missing)
  • drop a row/column
    df.drop(index_name) / df.drop(column_name, axis=1)
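
The processing steps above, sketched on a small made-up frame. The column names and values are illustrative; the point to notice is that thresh counts non-missing values per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": ["2017-09-20", "2019-12-09", "2020-01-01"],
    "feature_1": ["red", "blue", "red"],
    "x1": [1.0, np.nan, 3.0],
    "x2": [np.nan, np.nan, 6.0],
})

# Parse the date strings into datetime values.
df["date"] = pd.to_datetime(df["date"])

# One indicator column per distinct value of feature_1.
dummies = pd.get_dummies(df[["feature_1"]])

# Rows 0, 1, 2 have 3, 2, and 4 non-missing values respectively,
# so thresh=3 keeps rows 0 and 2.
kept = df.dropna(thresh=3)

rows_complete = df.dropna()            # rows with no missing values at all
cols_complete = df.dropna(axis=1)      # columns with no missing values at all
x1_present = df.dropna(subset=["x1"])  # rows where x1 is present
```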

@disrae commented Sep 20, 2017

Hi, thanks for the article.

I'm trying to use df['Column'].value_counts() but I'm getting an error: TypeError: unhashable type: 'numpy.ndarray'. When I check the type, the table is a DataFrame, but my column dtypes are object. Is that why it isn't working?
All the best,
Thank you


@balajikomma369 commented Dec 9, 2019

Try it this way:
len(df['column'].values)
