@o314
Last active January 28, 2022 09:55
Multidispatch-friendly zen of unzip in julia


@ https://gist.github.com/o314/214e26c6fb70512b56597d633dd87e6f
see JuliaLang/julia#13942

OK, unzip(a) = zip(a...) fails in Julia. But the Zen of Python is great (ok, that's somewhat of a lie: it's great for Julia too!). So let's try to bring correct, simple things first, and (maybe) complicated ones later, or not at all.

FWIW, here is my good-enough, poor man's unzip (usable in prod, isn't it?), considering that:

  • writing 5 lines > waiting 5 years
  • it is not optimized for nonexistent use cases of unzip (streaming, TensorFlow, or whatever)
  • a third-party pkg for 5 lines is a joke or an industrial hazard (software BOM, anybody?)
  • con: 5 sloc of src; 50 lines of test. So what?
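
using Test
import Base.Iterators as _I

# Varargs entry point: unzip(v1, v2, ...) gathers its arguments into a Vector.
unzip(s...) = unzip(collect(s))
# Vector of vectors: lazily yields the j-th elements of every inner vector,
# truncated to the shortest inner vector (N).
unzip(vs::Vector{<:Vector}) =
    let M=length(vs), N=mapfoldl(length, min, vs); # todo remove me when SVector is in Base
        ([vs[i][j] for i in 1:M] for j in 1:N)
    end
# Vector of pairs: eagerly returns the tuple (keys, values).
unzip(a::Vector{<:Pair}) = [k for (k,_) in a], [v for (_,v) in a]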
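
A minimal usage sketch of the three methods above may help. The definitions are repeated so the snippet runs on its own, and the sample values (rows, ragged) are hypothetical, not taken from the gist. It shows two things that are easy to miss: the vector-of-vectors method returns a lazy generator and truncates to the shortest input, while the pair method returns an eager tuple.

```julia
# Definitions repeated from the gist so that this snippet runs on its own.
unzip(s...) = unzip(collect(s))
unzip(vs::Vector{<:Vector}) =
    let M=length(vs), N=mapfoldl(length, min, vs)
        ([vs[i][j] for i in 1:M] for j in 1:N)
    end
unzip(a::Vector{<:Pair}) = [k for (k,_) in a], [v for (_,v) in a]

# Hypothetical sample rows (illustration only).
rows = [[1, 10, 100], [2, 20, 200]]

cols = unzip(rows)        # lazy: a Base.Generator, nothing is computed yet
colvec = collect(cols)    # materializes [[1, 2], [10, 20], [100, 200]]

# Ragged input: N is the minimum length, so the trailing 300 is dropped.
ragged = collect(unzip([1, 10, 100, 300], [2, 20, 200]))

# Pair method: eager, returns a 2-tuple (keys, values).
ks, vals = unzip(['a' => 1, 'b' => 2])
```

Callers that need indexing or repeated traversal of the vector-of-vectors result would collect it once, as done for colvec here.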

TEST

import Base.Iterators as _I
using Test
# zipdata(M,N) = let v=collect(1:M), vt=ntuple(N) do _; copy(v) end; vt end
data(M,N) = ntuple(M) do i; fill(i,N) end
data(N) = let ks=_I.take(_I.cycle('a':'z'), N), vs=(1:N...,); (k=>v for (k,v) in zip(ks,vs)) end

VALIDITY TEST

# unzip of vector
@test data(5,3) == ([1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5])
@test unzip(data(5,3)...) |> collect == ([1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]) |> collect

# unzip of pair vector
@test data(5) |> collect == ('a'=>1, 'b'=>2, 'c'=>3, 'd'=>4, 'e'=>5) |> collect
@test unzip(data(5) |> collect) |> collect == (['a','b','c','d','e'], [1,2,3,4,5]) |> collect

SPEED TEST

# unzip of vector
julia> @time unzip(data(1000,3)...);
  0.029086 seconds (42.07 k allocations: 2.766 MiB, 99.28% compilation time)

julia> @time unzip(data(1_000_000,3)...);
  1.507531 seconds (4.04 M allocations: 223.797 MiB, 18.21% gc time, 72.32% compilation time)

julia> @time unzip(data(1_000_000,3)...);
  0.294386 seconds (1.00 M allocations: 152.588 MiB)

julia> @time unzip(data(1000,50)...);
  0.000727 seconds (1.01 k allocations: 531.922 KiB)

julia> @time unzip(data(1_000_000,50)...);
  1.082615 seconds (1.00 M allocations: 518.799 MiB, 48.09% gc time)

julia> @time unzip(data(1_000_000,50)...);
  0.527460 seconds (1.00 M allocations: 518.799 MiB)

# unzip of pair vector
julia> @time unzip(data(1000));
  2.728774 seconds (166.12 k allocations: 10.524 MiB, 99.98% compilation time)

julia> @time unzip(data(1000));
  0.000334 seconds (2.00 k allocations: 116.375 KiB)

julia> @time unzip(data(1_000_000) |> collect);              # BUG without collect
  0.634841 seconds (3.50 M allocations: 178.888 MiB, 18.39% gc time, 57.41% compilation time)
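
The "BUG" note on the last call can be reproduced on a small input. The following is a sketch under the assumption that only the three unzip methods from this gist are defined: data(N) returns a generator rather than a Vector{<:Pair}, so without collect the call falls through to the varargs method and the generator ends up wrapped instead of unzipped. The names gen, wrapped and unzipped are introduced here for illustration only.

```julia
import Base.Iterators as _I   # `import ... as` needs Julia 1.6 or later

# Definitions repeated from the gist so that this snippet runs on its own.
unzip(s...) = unzip(collect(s))
unzip(vs::Vector{<:Vector}) =
    let M=length(vs), N=mapfoldl(length, min, vs)
        ([vs[i][j] for i in 1:M] for j in 1:N)
    end
unzip(a::Vector{<:Pair}) = [k for (k,_) in a], [v for (_,v) in a]

data(N) = let ks=_I.take(_I.cycle('a':'z'), N), vs=(1:N...,); (k=>v for (k,v) in zip(ks,vs)) end

gen = data(3)                      # a Base.Generator of pairs, not a Vector

# Without collect: only the varargs method matches, so the generator is
# wrapped in nested vectors rather than split into keys and values.
wrapped = collect(unzip(gen))

# With collect: a Vector{Pair{Char,Int}} dispatches to the pair method.
unzipped = unzip(collect(gen))
```

One possible remedy, not part of the gist, would be an extra method that collects a generic iterator of pairs before dispatching; whether that is worth a sixth line is a design call.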