@ki-chi
Created November 19, 2020 14:31
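import time

import numpy as np
import pandas as pd


def np_mode(df):
    """
    Return the mode (most frequent value) of each column of the given DataFrame.
    Assumes integer-valued data; ties resolve to the smallest value.
    Original: https://twitter.com/nkay/status/1328231713919496194
    """
    arr = df.to_numpy()
    vmax, vmin, ncol = arr.max(), arr.min(), arr.shape[1]
    span = vmax - vmin + 1  # number of distinct values a cell can take
    # Shift column j into its own disjoint range [j*span, (j+1)*span)
    offset = np.arange(ncol) * span - vmin
    # minlength keeps the reshape valid even if the last column lacks the global max
    counts = np.bincount((arr + offset).ravel(), minlength=ncol * span)
    return counts.reshape(ncol, -1).argmax(1) + vmin

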
nrow = 100
ncol = 400_000
np.random.seed(0)
df = pd.DataFrame(np.random.choice([-1,0,1],(nrow,ncol)))
t1 = time.perf_counter()
np_mode(df)
t2 = time.perf_counter()
print(f"elapsed time: {t2-t1}")
# elapsed time: 0.2317368984222412
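
The offset trick above can be checked by hand on a small input. The following is a minimal sketch, not part of the original gist: `column_modes` and `column_modes_naive` are hypothetical helper names, the input is assumed to be a 2-D integer array, and ties are assumed to resolve to the smallest value (which is what `argmax` gives). Each column is shifted into its own disjoint block of integers so that a single `np.bincount` call counts every column at once; the naive version counts column by column with `np.unique` for comparison.

```python
import numpy as np


def column_modes(arr):
    """Per-column mode of a 2-D integer array (hypothetical helper, not from the gist)."""
    arr = np.asarray(arr)
    vmin = arr.min()
    span = int(arr.max() - vmin + 1)  # number of distinct values a cell can take
    ncol = arr.shape[1]
    # Column j is mapped onto the integers j*span .. (j+1)*span - 1
    offset = np.arange(ncol) * span - vmin
    # minlength pads the counts so they always reshape to (ncol, span)
    counts = np.bincount((arr + offset).ravel(), minlength=ncol * span)
    return counts.reshape(ncol, span).argmax(axis=1) + vmin


def column_modes_naive(arr):
    """Reference version: count values one column at a time with np.unique."""
    arr = np.asarray(arr)
    modes = []
    for col in arr.T:
        values, freq = np.unique(col, return_counts=True)
        modes.append(values[freq.argmax()])  # first maximum = smallest tied value
    return np.array(modes)


small = np.array([
    [-1, 0, 1],
    [-1, 1, 0],
    [0, 1, 0],
    [1, 1, -1],
])
print(column_modes(small).tolist())        # [-1, 1, 0]
print(column_modes_naive(small).tolist())  # [-1, 1, 0]
```

For `small`, the span is 3 and the offsets are `[1, 4, 7]`, so the shifted columns occupy 0..2, 3..5 and 6..8. The flat counts `[2, 1, 1, 0, 1, 3, 1, 2, 1]` reshape into one row per column, and `argmax` on each row picks the mode. Memory grows with `ncol * span`, so this approach presumably suits data with a narrow value range such as the `[-1, 0, 1]` case timed above.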