@BenjaminWolfe
Created April 28, 2021 02:59
How do I use np.where with multiple columns at once?
# how to use np.where with multiple columns at once, even whole data frames?
import numpy as np
import pandas as pd
np.random.seed(42)
df1 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
condition = pd.Series(np.random.choice(a=[False, True], size=100, p=[.1, .9]))
np.where(condition, df1["A"], df2["A"]) # works, but returns a bare ndarray: index + column name are lost
pd.Series(np.where(condition, df1["A"], df2["A"]), index=df1.index, name="A") # works
np.where(condition, df1, df2) # ValueError: operands could not be broadcast together (shape (100,) vs (100, 4))
np.where(condition, df1.T, df2.T) # aha! works, since (100,) broadcasts against (4, 100) - the result is just transposed
np.where(condition, df1.T, df2.T).T # works, now just needs index + column names
# perfect:
pd.DataFrame(np.where(condition, df1.T, df2.T).T, columns=df1.columns, index=df1.index)
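A possible alternative to the double transpose, sketched here as an assumption rather than as part of the original gist: reshape the row-wise condition into a column vector of shape (100, 1), so NumPy can broadcast it across the four columns directly. The names `mask`, `result`, and `transposed` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Same setup as in the gist above.
np.random.seed(42)
df1 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
condition = pd.Series(np.random.choice(a=[False, True], size=100, p=[.1, .9]))

# Turn the (100,) condition into a (100, 1) column vector.
# A (100, 1) array broadcasts against (100, 4), so no transpose is needed.
mask = condition.to_numpy()[:, None]

# np.where still returns a bare ndarray, so restore the labels afterwards.
result = pd.DataFrame(np.where(mask, df1, df2), columns=df1.columns, index=df1.index)

# Sanity check: identical to the double-transpose version from the gist.
transposed = pd.DataFrame(
    np.where(condition, df1.T, df2.T).T, columns=df1.columns, index=df1.index
)
assert result.equals(transposed)
```

Both versions pick whole rows from `df1` where `condition` is True and from `df2` otherwise; the column-vector form simply makes the intended broadcast axis explicit.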