@ki-chi
Created November 19, 2020 14:31
# julia> versioninfo()
# Julia Version 1.5.0
# Commit 96786e22cc (2020-08-01 23:44 UTC)
# Platform Info:
#   OS: macOS (x86_64-apple-darwin18.7.0)
#   CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
#   WORD_SIZE: 64
#   LIBM: libopenlibm
#   LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
using BenchmarkTools
using DataFrames
using StatsBase
using Random
Random.seed!(0)
nrow = 100
ncol = 400_000
df = DataFrame(sample([-1,0,1], (nrow, ncol)))
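"""
    jl_mode(df::DataFrame)

Return the most frequent value (mode) of each column.
Original Python version: https://twitter.com/nkay/status/1328231713919496194
"""
function jl_mode(df)
    arr = convert(Matrix, df)
    maxvalue, minvalue, ncol = maximum(arr), minimum(arr), size(arr, 2)
    # Shift column j into its own block of (maxvalue - minvalue + 1) consecutive integers
    offset = collect(1:ncol) .* (maxvalue - minvalue + 1) .- minvalue
    # Count every shifted value in one pass, then take the argmax within each column's block
    return argmax.(eachcol(reshape(counts(arr .+ offset'), :, ncol))) .+ (minvalue - 1)
end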
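
# A hedged sketch, not part of the original benchmark: `jl_mode` above lets
# `counts` infer the bin range from the data, so the reshape only works when
# the first column contains the global minimum and the last column contains
# the global maximum. With 100 random rows per column that is almost always
# true, but it is an assumption. The variant below passes the bin range to
# `counts` explicitly so that it does not rely on it. The name
# `jl_mode_levels` is hypothetical, and its timing has not been measured.
using StatsBase  # already loaded above; repeated so the sketch stands alone

function jl_mode_levels(df)
    arr = Matrix(df)
    lo, hi = extrema(arr)
    k = hi - lo + 1                       # number of distinct bins per column
    nc = size(arr, 2)
    # Column j is mapped onto the integers (j-1)*k + 1 through j*k
    offset = (0:nc-1) .* k .+ (1 - lo)
    binned = counts(arr .+ offset', 1:(k * nc))
    # Row index of the largest count in each column, shifted back to a value
    return argmax.(eachcol(reshape(binned, k, nc))) .+ (lo - 1)
end

# Example: the first column never takes the minimum value -1.
# jl_mode_levels([1 -1; 1 -1; 0 0])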
@btime jl_mode(df);
# 346.782 ms (400026 allocations: 653.08 MiB)
# Using StatsBase.mode() instead
@btime mapcols(mode, df);
# 895.601 ms (2399604 allocations: 333.43 MiB)
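
# A hedged sketch for checking that the two approaches agree. When a column
# has several values tied for the highest count, `argmax` picks the smallest
# one, while `mode` may pick a different one, so comparing the two results
# with `==` could fail even when both are correct. The helper below (the name
# `is_valid_mode` is hypothetical) instead checks that a candidate value
# reaches the highest count in its column.
using StatsBase  # already loaded above; repeated so the sketch stands alone

is_valid_mode(col, m) = count(==(m), col) == maximum(values(countmap(col)))

# Possible usage with the objects defined above (left commented out because
# it is slow on 400_000 columns):
# @assert all(is_valid_mode(c, m) for (c, m) in zip(eachcol(df), jl_mode(df)))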