@pcmasuzzo
Last active August 23, 2017 10:04
Some useful pandas operations on dataframes
# import
import numpy as np
import pandas as pd
# convert a pandas dataframe to numpy excluding a column (keeps the original column order)
data_matrix = df.drop(columns=['col_to_exclude']).to_numpy(dtype=float)
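`Index.difference` returns the remaining labels sorted, so the NumPy columns can come out in a different order than they have in the dataframe. Below is a minimal sketch comparing it with `drop(columns=...)`, which keeps the original order. The toy frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; columns deliberately not in alphabetical order
df = pd.DataFrame({'b': [1, 2], 'a': [3, 4], 'col_to_exclude': [5, 6]})

# difference() sorts the surviving labels, giving column order ['a', 'b']
via_difference = np.array(df[df.columns.difference(['col_to_exclude'])], dtype=float)

# drop() keeps the original column order ['b', 'a']
via_drop = df.drop(columns=['col_to_exclude']).to_numpy(dtype=float)

print(via_difference.tolist())  # [[3.0, 1.0], [4.0, 2.0]]
print(via_drop.tolist())        # [[1.0, 3.0], [2.0, 4.0]]
```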
# add new column to existing dataframe from another array or Series (assigned by position, lengths must match)
df = df.assign(new_col=np.asarray(vector))
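Converting `vector` to a plain array (with `.values` or `np.asarray`) makes pandas fill the new column by position, whereas passing a Series directly aligns it on the index. A sketch with made-up data where the two indexes do not match:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30]}, index=[0, 1, 2])
# Hypothetical Series whose index does not overlap df's index
vector = pd.Series([1, 2, 3], index=[5, 6, 7])

# Passing the Series itself aligns on the index, so every row gets NaN here
aligned = df.assign(new_col=vector)

# A plain array is assigned positionally; np.asarray works for Series and arrays alike
positional = df.assign(new_col=np.asarray(vector))

print(aligned['new_col'].tolist())     # all NaN
print(positional['new_col'].tolist())  # [1, 2, 3]
```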
# list unique values in a dataframe column (in order of first appearance)
df['column_name'].unique()
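`unique` returns values in order of first appearance, not sorted. When you also need how many distinct values there are or how often each occurs, `nunique` and `value_counts` are the related calls. A small sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'column_name': ['b', 'a', 'b', 'c', 'a']})

uniques = df['column_name'].unique()       # distinct values, first-appearance order
n_unique = df['column_name'].nunique()     # number of distinct values
counts = df['column_name'].value_counts()  # distinct values with their frequencies

print(uniques.tolist())  # ['b', 'a', 'c']
print(n_unique)          # 3
```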
# get dataframe rows where column has certain values
valuelist = ['value1', 'value2', 'value3']
df = df[df.column.isin(valuelist)]
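`isin` builds a boolean mask with one entry per row, and indexing the dataframe with that mask keeps the rows where it is `True`. A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical data; 'n' just makes the selected rows easy to identify
df = pd.DataFrame({'column': ['value1', 'other', 'value3', 'value2'], 'n': [1, 2, 3, 4]})
valuelist = ['value1', 'value2', 'value3']

mask = df.column.isin(valuelist)  # [True, False, True, True]
selected = df[mask]

print(selected['n'].tolist())  # [1, 3, 4]
```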
# get dataframe rows where column does not have certain values
valuelist = ['value1', 'value2', 'value3']
df = df[~df.column.isin(valuelist)]
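One detail worth knowing when negating `isin`: missing values are never "in" the list, so `~isin(...)` keeps rows where the column is NaN. A sketch on made-up data, with an explicit `notna()` check for when those rows should be excluded as well:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': ['value1', np.nan, 'other', 'value2']})
valuelist = ['value1', 'value2', 'value3']

# isin() is False for NaN, so the negated mask keeps the missing value (row 1)
kept = df[~df.column.isin(valuelist)]

# Combine with notna() if missing values should be dropped too
kept_no_nan = df[~df.column.isin(valuelist) & df.column.notna()]

print(kept.index.tolist())         # [1, 2]
print(kept_no_nan.index.tolist())  # [2]
```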
# delete a column in a dataframe
del df['column_name']
# or
df = df.drop(columns='column_name')
# delete the column without having to reassign to df
df.drop('column_name', axis=1, inplace=True)
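The three forms above end with the same columns but differ in whether they mutate the existing frame and what they return. A small comparison on a made-up frame, also showing `errors='ignore'` for labels that may be absent:

```python
import pandas as pd


def make_df():
    # Hypothetical frame used to compare the deletion styles
    return pd.DataFrame({'keep': [1, 2], 'column_name': [3, 4]})


df1 = make_df()
del df1['column_name']                         # mutates df1 in place

df2 = make_df().drop(columns='column_name')    # returns a new frame

df3 = make_df()
df3.drop(columns='column_name', inplace=True)  # mutates df3 and returns None

# errors='ignore' skips labels that are not present instead of raising KeyError
df4 = make_df().drop(columns=['column_name', 'missing'], errors='ignore')

for d in (df1, df2, df3, df4):
    print(list(d.columns))  # ['keep']
```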
# delete more than one column by column position
df = df.drop(columns=df.columns[[0, 1, 3]])
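`df.columns[[0, 1, 3]]` turns positions into labels, and `drop` then removes every column carrying those labels. If column names can repeat, that may drop more than intended. A sketch (on a made-up frame) that instead keeps the complementary positions with `iloc`:

```python
import pandas as pd

# Hypothetical frame with the label 'a' used twice
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=['a', 'b', 'c', 'd', 'a'])

# Dropping by label removes both 'a' columns, including the one at position 4
by_label = df.drop(columns=df.columns[[0, 1, 3]])

# Keeping the other positions with iloc removes only positions 0, 1 and 3
to_drop = {0, 1, 3}
by_position = df.iloc[:, [i for i in range(df.shape[1]) if i not in to_drop]]

print(list(by_label.columns))     # ['c']
print(list(by_position.columns))  # ['c', 'a']
```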
# select subset of dataframe using multiple criteria
new_df = df[(df['col_1']>20) & (df['col_2']==10)]
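Each comparison needs its own parentheses because `&` binds more tightly than `>` and `==`. The same filter can also be written as a `query` string. A sketch on made-up data showing that both give the same rows:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [5, 25, 30, 40], 'col_2': [10, 10, 3, 10]})

# Boolean-mask version, as in the snippet above
mask_df = df[(df['col_1'] > 20) & (df['col_2'] == 10)]

# Equivalent filter expressed with DataFrame.query
query_df = df.query('col_1 > 20 and col_2 == 10')

print(mask_df.index.tolist())  # [1, 3]
```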
# create a pandas dataframe from a Python dictionary (dict_), one row per key/value pair
df = pd.DataFrame(list(dict_.items()), columns=['column1', 'column2'])
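Each `(key, value)` pair of the dictionary becomes one row. A sketch with a made-up dictionary, plus an equivalent route through `pd.Series`, where the keys start out as the index and are then moved into a column:

```python
import pandas as pd

dict_ = {'x': 1, 'y': 2}  # hypothetical input

# One row per (key, value) pair, as in the snippet above
df = pd.DataFrame(list(dict_.items()), columns=['column1', 'column2'])

# Same result via a Series: name the index, then turn it into a column
df_alt = pd.Series(dict_).rename_axis('column1').reset_index(name='column2')

print(df)
print(df_alt)
```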