nalimilan/splitapplycombine.jl.mem

## splitapplycombine.jl.mem
        - # this constant defines which types of values returned by the aggregation function
        - # in `combine` are considered to produce multiple columns in the resulting data frame
        - const MULTI_COLS_TYPE = Union{AbstractDataFrame, NamedTuple, DataFrameRow, AbstractMatrix}
        -
        - """
        -     groupby(df::AbstractDataFrame, cols; sort=false, skipmissing=false)
        -
        - Return a `GroupedDataFrame` representing a view of an `AbstractDataFrame` split
        - into row groups.
        -
        - # Arguments
        - - `df` : an `AbstractDataFrame` to split
        - - `cols` : data frame columns to group by. Can be any column selector
        -   ($COLUMNINDEX_STR; $MULTICOLUMNINDEX_STR).
        - - `sort` : whether to sort groups according to the values of the grouping columns
        -   `cols`; if all `cols` are `CategoricalVector`s then groups are always sorted
        -   irrespective of the value of `sort`
        - - `skipmissing` : whether to skip groups with `missing` values in one of the
        -   grouping columns `cols`
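        -
        - A minimal sketch of the `sort` and `skipmissing` keyword arguments, assuming a
        - small hypothetical data frame whose key column `:k` contains a `missing` value:
        -
        - ```julia
        - using DataFrames
        -
        - # hypothetical data: the key column :k contains a missing value
        - df = DataFrame(k = [3, missing, 1, 3], v = 1:4)
        -
        - # sort groups by key and skip the group whose key is missing
        - gd = groupby(df, :k, sort=true, skipmissing=true)
        -
        - # keys of the remaining groups, in group order (1, then 3)
        - [first(g.k) for g in gd]
        - ```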
        -
        - # Details
        - Iterating over a `GroupedDataFrame` yields a `SubDataFrame` view of `df`
        - for each group. Within each group, the order of rows in `df` is preserved.
        -
        - `cols` can be any valid data frame indexing expression.
        - In particular, if it is an empty vector then a single-group `GroupedDataFrame`
        - is created.
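        -
        - For instance, assuming a small hypothetical data frame `df`, grouping on an empty
        - vector of column names produces one group that holds every row:
        -
        - ```julia
        - using DataFrames
        -
        - # hypothetical data frame with three rows
        - df = DataFrame(a = [1, 2, 3])
        -
        - # no grouping columns: a single group containing all rows
        - gd = groupby(df, Symbol[])
        - length(gd)  # 1
        - ```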
        -
        - A `GroupedDataFrame` also supports
        - indexing by groups, `map` (which applies a function to each group)
        - and `combine` (which applies a function to each group
        - and combines the result into a data frame).
        -
        - `GroupedDataFrame` also supports the dictionary interface. The keys are
        - [`GroupKey`](@ref) objects returned by [`keys(::GroupedDataFrame)`](@ref),
        - which can also be used to get the values of the grouping columns for each group.
        - `Tuple`s and `NamedTuple`s containing the values of the grouping columns (in the
        - same order as the `cols` argument) are also accepted as indices. Finally,
        - an `AbstractDict` can be used to index into a grouped data frame where
        - the keys are column names of the data frame. The order of the keys does
        - not matter in this case.
        -
        - # See also
        -
        - [`combine`](@ref), [`select`](@ref), [`select!`](@ref), [`transform`](@ref), [`transform!`](@ref)
        -
        - # Examples
        - ```julia
        - julia> df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
        -                       b = repeat([2, 1], outer=[4]),
        -                       c = 1:8);
        -
        - julia> gd = groupby(df, :a)
        - GroupedDataFrame with 4 groups based on key: a
        - First Group (2 rows): a = 1
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 2     │ 1     │
        - │ 2   │ 1     │ 2     │ 5     │
        - ⋮
        - Last Group (2 rows): a = 4
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 4     │ 1     │ 4     │
        - │ 2   │ 4     │ 1     │ 8     │
        -
        - julia> gd[1]
        - 2×3 SubDataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 2     │ 1     │
        - │ 2   │ 1     │ 2     │ 5     │
        -
        - julia> last(gd)
        - 2×3 SubDataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 4     │ 1     │ 4     │
        - │ 2   │ 4     │ 1     │ 8     │
        -
        - julia> gd[(a=3,)]
        - 2×3 SubDataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 3     │ 2     │ 3     │
        - │ 2   │ 3     │ 2     │ 7     │
        -
        - julia> gd[Dict("a" => 3)]
        - 2×3 SubDataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 3     │ 2     │ 3     │
        - │ 2   │ 3     │ 2     │ 7     │
        -
        - julia> gd[(3,)]
        - 2×3 SubDataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 3     │ 2     │ 3     │
        - │ 2   │ 3     │ 2     │ 7     │
        -
        - julia> k = keys(gd)[3]
        - GroupKey: (a = 3)
        -
        - julia> gd[k]
        - 2×3 SubDataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 3     │ 2     │ 3     │
        - │ 2   │ 3     │ 2     │ 7     │
        -
        - julia> for g in gd
        -            println(g)
        -        end
        - 2×3 SubDataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 2     │ 1     │
        - │ 2   │ 1     │ 2     │ 5     │
        - 2×3 SubDataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 2     │ 1     │ 2     │
        - │ 2   │ 2     │ 1     │ 6     │
        - 2×3 SubDataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 3     │ 2     │ 3     │
        - │ 2   │ 3     │ 2     │ 7     │
        - 2×3 SubDataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 4     │ 1     │ 4     │
        - │ 2   │ 4     │ 1     │ 8     │
        - ```
        - """
        - function groupby(df::AbstractDataFrame, cols;
        -                  sort::Bool=false, skipmissing::Bool=false)
        0     _check_consistency(df)
        0     idxcols = index(df)[cols]
        -     if isempty(idxcols)
        -         return GroupedDataFrame(df, Symbol[], ones(Int, nrow(df)),
        -                                 nothing, nothing, nothing, nrow(df) == 0 ? 0 : 1,
        -                                 nothing, Threads.ReentrantLock())
        -     end
       96     sdf = select(df, idxcols, copycols=false)
        -
 80000080     groups = Vector{Int}(undef, nrow(df))
       48     ngroups, rhashes, gslots, sorted =
        -         row_group_slots(ntuple(i -> sdf[!, i], ncol(sdf)), Val(false), groups, skipmissing)
        -
      288     gd = GroupedDataFrame(df, copy(_names(sdf)), groups, nothing, nothing, nothing, ngroups, nothing,
        -                           Threads.ReentrantLock())
        -
        -     # sort groups if row_group_slots hasn't already done that
        0     if sort && !sorted
        -         # Find index of representative row for each group
        0         idx = Vector{Int}(undef, length(gd))
        0         fillfirst!(nothing, idx, 1:nrow(parent(gd)), gd)
        0         group_invperm = invperm(sortperm(view(parent(gd)[!, gd.cols], idx, :)))
        0         groups = gd.groups
        0         @inbounds for i in eachindex(groups)
        0             gix = groups[i]
        0             groups[i] = gix == 0 ? 0 : group_invperm[gix]
        -         end
        -     end
        -
        0     return gd
        - end
        -
        - const F_TYPE_RULES =
        -     """
        -     `fun` can return a single value, a row, a vector, or multiple rows.
        -     The type of the returned value determines the shape of the resulting `DataFrame`.
        -     There are four kinds of return values allowed:
        -     - A single value gives a `DataFrame` with a single additional column and one row
        -       per group.
        -     - A named tuple of single values or a [`DataFrameRow`](@ref) gives a `DataFrame`
        -       with one additional column for each field and one row per group (returning a
        -       named tuple will be faster). It is not allowed to mix single values and vectors
        -       if a named tuple is returned.
        -     - A vector gives a `DataFrame` with a single additional column and as many rows
        -       for each group as the length of the returned vector for that group.
        -     - A data frame, a named tuple of vectors or a matrix gives a `DataFrame` with
        -       the same additional columns and as many rows for each group as the rows
        -       returned for that group (returning a named tuple is the fastest option).
        -       Returning a table with zero columns is allowed, whatever the number of columns
        -       returned for other groups.
        -
        -     `fun` must always return the same kind of object (out of four
        -     kinds defined above) for all groups, and with the same column names.
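        -
        -     As a sketch, assuming a small hypothetical data frame grouped on column `:a`
        -     with a `Float64` column `:b` (the output names `lo`, `hi` and `b2` are
        -     arbitrary), the four kinds of return values can be produced as follows with
        -     `combine`:
        -
        -     ```julia
        -     using DataFrames
        -
        -     # hypothetical data frame grouped on column :a
        -     df = DataFrame(a = [1, 1, 2], b = [1.0, 2.0, 3.0])
        -     gd = groupby(df, :a)
        -
        -     # single value: one row per group, stored in column :x1
        -     combine(sdf -> sum(sdf.b), gd)
        -
        -     # named tuple of single values: one row per group, columns :lo and :hi
        -     combine(sdf -> (lo = minimum(sdf.b), hi = maximum(sdf.b)), gd)
        -
        -     # vector: one row per element, stored in column :x1
        -     combine(sdf -> sdf.b .* 2, gd)
        -
        -     # named tuple of vectors: several rows and columns per group
        -     combine(sdf -> (b = sdf.b, b2 = sdf.b .^ 2), gd)
        -     ```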
        -
        -     Optimized methods are used when standard summary functions (`sum`, `prod`,
        -     `minimum`, `maximum`, `mean`, `var`, `std`, `first`, `last` and `length`)
        -     are specified using the `Pair` syntax (e.g. `:col => sum`).
        -     When computing the `sum` or `mean` over floating-point columns, results will be
        -     less accurate than those of the standard `sum` function (which uses pairwise
        -     summation). Use `col => x -> sum(x)` to avoid the optimized method and use the
        -     slower, more accurate one.
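        -
        -     A minimal sketch contrasting the two forms, assuming a hypothetical data frame
        -     grouped on column `:a`; the results agree up to floating-point rounding:
        -
        -     ```julia
        -     using DataFrames
        -
        -     # hypothetical data frame grouped on column :a
        -     df = DataFrame(a = [1, 1, 2], b = [0.1, 0.2, 0.3])
        -     gd = groupby(df, :a)
        -
        -     # optimized method; the output column is named :b_sum
        -     fast = combine(gd, :b => sum)
        -
        -     # an anonymous wrapper bypasses the optimized method;
        -     # the target column name is given explicitly
        -     slow = combine(gd, :b => (x -> sum(x)) => :b_sum)
        -     ```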
        -
        -     Column names are automatically generated when necessary using the rules defined
        -     in [`select`](@ref) if the `Pair` syntax is used and `fun` returns a single
        -     value or a vector (e.g. for `:col => sum` the column name is `col_sum`); otherwise
        -     (if `fun` is a function or a return value is an `AbstractMatrix`) columns are
        -     named `x1`, `x2` and so on.
        -     """
        -
        - const F_ARGUMENT_RULES =
        -     """
        -
        -     Arguments passed as `args...` can be:
        -
        -     * Any index that is allowed for column indexing ($COLUMNINDEX_STR, $MULTICOLUMNINDEX_STR).
        -     * Column transformation operations using the `Pair` notation that is described below
        -       and vectors of such pairs.
        -
        -     Transformations allowed using `Pair`s follow the rules specified for
        -     [`select`](@ref) and have the form `source_cols => fun`, `source_cols => fun
        -     => target_col`, or `source_col => target_col`. Function `fun` is passed
        -     `SubArray` views as positional arguments for each column specified to be
        -     selected, or a `NamedTuple` containing these `SubArray`s if `source_cols` is
        -     an `AsTable` selector. It can return a vector or a single value (defined
        -     precisely below). If automatic generation of target column
        -     name is required it respects the `renamecols` keyword argument following the
        -     rules described in [`select`](@ref).
        -
        -     As a special case `nrow` or `nrow => target_col` can be passed without specifying
        -     input columns to efficiently calculate the number of rows in each group.
        -     If `nrow` is passed the resulting column name is `:nrow`.
        -
        -     If multiple `args` are passed then return values of different `fun`s are allowed
        -     to mix single values and vectors. In this case single values will be
        -     broadcasted to match the length of columns specified by returned vectors.
        -     As a particular rule, values wrapped in a `Ref` or a `0`-dimensional `AbstractArray`
        -     are unwrapped and then broadcasted.
        -
        -     If the first or last argument is `pair` then it must be a `Pair` following the
        -     rules for pairs described above, except that in this case the function `fun`
        -     can return any of the return values defined below.
        -
        -     If the first or last argument is a function `fun`, it is passed a [`SubDataFrame`](@ref)
        -     view for each group and can return any of the return values defined below.
        -     Note that this form is slower than `pair` or `args` due to type instability.
        -
        -     If `gd` has zero groups then no transformations are applied.
        -     """
        -
        - const KWARG_PROCESSING_RULES =
        -     """
        -     If `keepkeys=true`, the resulting `DataFrame` contains all the grouping columns
        -     in addition to those generated. In this case if the returned
        -     value contains columns with the same names as the grouping columns, they are
        -     required to be equal.
        -     If `keepkeys=false` and some generated columns have the same name as grouping columns,
        -     they are kept and are not required to be equal to grouping columns.
        -
        -     If `ungroup=true` (the default) a `DataFrame` is returned.
        -     If `ungroup=false` a `GroupedDataFrame` grouped using `groupcols(gd)` is returned.
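        -
        -     A minimal sketch of the two `ungroup` settings, assuming a hypothetical data
        -     frame grouped on column `:a`:
        -
        -     ```julia
        -     using DataFrames
        -
        -     # hypothetical data frame grouped on column :a
        -     df = DataFrame(a = [1, 1, 2], b = [1, 2, 3])
        -     gd = groupby(df, :a)
        -
        -     # ungroup=true (the default): a DataFrame with one row per group
        -     res = combine(gd, :b => sum)
        -
        -     # ungroup=false: the same rows, regrouped on the grouping columns of gd
        -     gres = combine(gd, :b => sum, ungroup=false)
        -     groupcols(gres)
        -     ```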
        -
        -     If `gd` has zero groups then transformations are applied to vectors of zero length.
        -     """
        -
        - """
        -     combine(gd::GroupedDataFrame, args...; keepkeys::Bool=true, ungroup::Bool=true,
        -             renamecols::Bool=true)
        -     combine(fun::Union{Function, Type}, gd::GroupedDataFrame;
        -             keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
        -     combine(pair::Pair, gd::GroupedDataFrame; keepkeys::Bool=true, ungroup::Bool=true,
        -             renamecols::Bool=true)
        -
        - Apply operations to each group in a [`GroupedDataFrame`](@ref) and return the combined
        - result as a `DataFrame` if `ungroup=true` or `GroupedDataFrame` if `ungroup=false`.
        -
        - If an `AbstractDataFrame` is passed, operations are applied to the data frame as a
        - whole and a `DataFrame` is always returned.
        -
        - $F_ARGUMENT_RULES
        -
        - $F_TYPE_RULES
        -
        - $KWARG_PROCESSING_RULES
        -
        - Ordering of rows follows the order of groups in `gd`.
        -
        - # See also
        -
        - [`groupby`](@ref), [`select`](@ref), [`select!`](@ref), [`transform`](@ref), [`transform!`](@ref)
        -
        - # Examples
        - ```jldoctest
        - julia> df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
        -                       b = repeat([2, 1], outer=[4]),
        -                       c = 1:8);
        -
        - julia> gd = groupby(df, :a);
        -
        - julia> combine(gd, :c => sum, nrow)
        - 4×3 DataFrame
        - │ Row │ a     │ c_sum │ nrow  │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 6     │ 2     │
        - │ 2   │ 2     │ 8     │ 2     │
        - │ 3   │ 3     │ 10    │ 2     │
        - │ 4   │ 4     │ 12    │ 2     │
        -
        - julia> combine(gd, :c => sum, nrow, ungroup=false)
        - GroupedDataFrame with 4 groups based on key: a
        - First Group (1 row): a = 1
        - │ Row │ a     │ c_sum │ nrow  │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 6     │ 2     │
        - ⋮
        - Last Group (1 row): a = 4
        - │ Row │ a     │ c_sum │ nrow  │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 4     │ 12    │ 2     │
        -
        - julia> combine(sdf -> sum(sdf.c), gd) # Slower variant
        - 4×2 DataFrame
        - │ Row │ a     │ x1    │
        - │     │ Int64 │ Int64 │
        - ├─────┼───────┼───────┤
        - │ 1   │ 1     │ 6     │
        - │ 2   │ 2     │ 8     │
        - │ 3   │ 3     │ 10    │
        - │ 4   │ 4     │ 12    │
        -
        - julia> combine(gd) do d # do syntax for the slower variant
        -            sum(d.c)
        -        end
        - 4×2 DataFrame
        - │ Row │ a     │ x1    │
        - │     │ Int64 │ Int64 │
        - ├─────┼───────┼───────┤
        - │ 1   │ 1     │ 6     │
        - │ 2   │ 2     │ 8     │
        - │ 3   │ 3     │ 10    │
        - │ 4   │ 4     │ 12    │
        -
        - julia> combine(gd, :c => (x -> sum(log, x)) => :sum_log_c) # specifying a name for target column
        - 4×2 DataFrame
        - │ Row │ a     │ sum_log_c │
        - │     │ Int64 │ Float64   │
        - ├─────┼───────┼───────────┤
        - │ 1   │ 1     │ 1.60944   │
        - │ 2   │ 2     │ 2.48491   │
        - │ 3   │ 3     │ 3.04452   │
        - │ 4   │ 4     │ 3.46574   │
        -
        - julia> combine(gd, [:b, :c] .=> sum) # passing a vector of pairs
        - 4×3 DataFrame
        - │ Row │ a     │ b_sum │ c_sum │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 4     │ 6     │
        - │ 2   │ 2     │ 2     │ 8     │
        - │ 3   │ 3     │ 4     │ 10    │
        - │ 4   │ 4     │ 2     │ 12    │
        -
        - julia> combine(gd) do sdf # dropping group when DataFrame() is returned
        -           sdf.c[1] != 1 ? sdf : DataFrame()
        -        end
        - 6×3 DataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 2     │ 1     │ 2     │
        - │ 2   │ 2     │ 1     │ 6     │
        - │ 3   │ 3     │ 2     │ 3     │
        - │ 4   │ 3     │ 2     │ 7     │
        - │ 5   │ 4     │ 1     │ 4     │
        - │ 6   │ 4     │ 1     │ 8     │
        -
        - julia> combine(gd, :b => :b1, :c => :c1,
        -                [:b, :c] => +, keepkeys=false) # auto-splatting, renaming and keepkeys
        - 8×3 DataFrame
        - │ Row │ b1    │ c1    │ b_c_+ │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 2     │ 1     │ 3     │
        - │ 2   │ 2     │ 5     │ 7     │
        - │ 3   │ 1     │ 2     │ 3     │
        - │ 4   │ 1     │ 6     │ 7     │
        - │ 5   │ 2     │ 3     │ 5     │
        - │ 6   │ 2     │ 7     │ 9     │
        - │ 7   │ 1     │ 4     │ 5     │
        - │ 8   │ 1     │ 8     │ 9     │
        -
        - julia> combine(gd, :b, :c => sum) # passing columns and broadcasting
        - 8×3 DataFrame
        - │ Row │ a     │ b     │ c_sum │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 2     │ 6     │
        - │ 2   │ 1     │ 2     │ 6     │
        - │ 3   │ 2     │ 1     │ 8     │
        - │ 4   │ 2     │ 1     │ 8     │
        - │ 5   │ 3     │ 2     │ 10    │
        - │ 6   │ 3     │ 2     │ 10    │
        - │ 7   │ 4     │ 1     │ 12    │
        - │ 8   │ 4     │ 1     │ 12    │
        -
        - julia> combine(gd, [:b, :c] .=> Ref)
        - 4×3 DataFrame
        - │ Row │ a     │ b_Ref    │ c_Ref    │
        - │     │ Int64 │ SubArra… │ SubArra… │
        - ├─────┼───────┼──────────┼──────────┤
        - │ 1   │ 1     │ [2, 2]   │ [1, 5]   │
        - │ 2   │ 2     │ [1, 1]   │ [2, 6]   │
        - │ 3   │ 3     │ [2, 2]   │ [3, 7]   │
        - │ 4   │ 4     │ [1, 1]   │ [4, 8]   │
        -
        - julia> combine(gd, AsTable(:) => Ref)
        - 4×2 DataFrame
        - │ Row │ a     │ a_b_c_Ref                            │
        - │     │ Int64 │ NamedTuple…                          │
        - ├─────┼───────┼──────────────────────────────────────┤
        - │ 1   │ 1     │ (a = [1, 1], b = [2, 2], c = [1, 5]) │
        - │ 2   │ 2     │ (a = [2, 2], b = [1, 1], c = [2, 6]) │
        - │ 3   │ 3     │ (a = [3, 3], b = [2, 2], c = [3, 7]) │
        - │ 4   │ 4     │ (a = [4, 4], b = [1, 1], c = [4, 8]) │
        -
        - julia> combine(gd, :, AsTable(Not(:a)) => sum, renamecols=false)
        - 8×4 DataFrame
        - │ Row │ a     │ b     │ c     │ b_c   │
        - │     │ Int64 │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 2     │ 1     │ 3     │
        - │ 2   │ 1     │ 2     │ 5     │ 7     │
        - │ 3   │ 2     │ 1     │ 2     │ 3     │
        - │ 4   │ 2     │ 1     │ 6     │ 7     │
        - │ 5   │ 3     │ 2     │ 3     │ 5     │
        - │ 6   │ 3     │ 2     │ 7     │ 9     │
        - │ 7   │ 4     │ 1     │ 4     │ 5     │
        - │ 8   │ 4     │ 1     │ 8     │ 9     │
        - ```
        - """
        - function combine(f::Base.Callable, gd::GroupedDataFrame;
        -                  keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
        -     return combine_helper(f, gd, keepkeys=keepkeys, ungroup=ungroup,
        -                           copycols=true, keeprows=false, renamecols=renamecols)
        - end
        -
        - combine(f::typeof(nrow), gd::GroupedDataFrame;
        -         keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true) =
        -     combine(gd, [nrow => :nrow], keepkeys=keepkeys, ungroup=ungroup,
        -             renamecols=renamecols)
        -
        - function combine(p::Pair, gd::GroupedDataFrame;
        -                  keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
        -     # move handling of aggregate to specialized combine
        -     p_from, p_to = p
        -
        -     # check whether a fast path can be used, which we achieve by
        -     # delegating to the combine(::GroupedDataFrame, ::AbstractVector) method;
        -     # note that this step is valid even if length(gd) == 0
        -     if isagg(p_from => (p_to isa Pair ? first(p_to) : p_to), gd) || p_from === nrow
        -         return combine(gd, [p], keepkeys=keepkeys, ungroup=ungroup, renamecols=renamecols)
        -     end
        -
        -     if p_from isa Tuple
        -         cs = collect(p_from)
        -         # an explicit error is thrown as this was allowed in the past
        -         throw(ArgumentError("passing a Tuple $p_from as column selector is not supported" *
        -                             ", use a vector $cs instead"))
        -     else
        -         cs = p_from
        -     end
        -     return combine_helper(cs => p_to, gd, keepkeys=keepkeys, ungroup=ungroup,
        -                           copycols=true, keeprows=false, renamecols=renamecols)
        - end
        -
        - combine(gd::GroupedDataFrame,
        -         cs::Union{Pair, typeof(nrow), ColumnIndex, MultiColumnIndex}...;
        -         keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true) =
        -     _combine_prepare(gd, cs..., keepkeys=keepkeys, ungroup=ungroup,
        -                      copycols=true, keeprows=false, renamecols=renamecols)
        -
        - function _combine_prepare(gd::GroupedDataFrame,
        -                           @nospecialize(cs::Union{Pair, typeof(nrow),
        -                                         ColumnIndex, MultiColumnIndex}...);
        -                           keepkeys::Bool, ungroup::Bool, copycols::Bool,
        -                           keeprows::Bool, renamecols::Bool)
        -     cs_vec = []
        -     for p in cs
        -         if p === nrow
        -             push!(cs_vec, nrow => :nrow)
        -         elseif p isa AbstractVector{<:Pair}
        -             append!(cs_vec, p)
        -         else
        -             push!(cs_vec, p)
        -         end
        -     end
        -     if any(x -> x isa Pair && first(x) isa Tuple, cs_vec)
        -         x = cs_vec[findfirst(x -> x isa Pair && first(x) isa Tuple, cs_vec)]
        -         # an explicit error is thrown as this was allowed in the past
        -         throw(ArgumentError("passing a Tuple $(first(x)) as column selector is not supported" *
        -                             ", use a vector $(collect(first(x))) instead"))
        -     end
        -     cs_norm_pre = [normalize_selection(index(parent(gd)), c, renamecols) for c in cs_vec]
        -     seen_cols = Set{Symbol}()
        -     process_vectors = false
        -     for v in cs_norm_pre
        -         if v isa Pair
        -             out_col = last(last(v))
        -             if out_col in seen_cols
        -                 throw(ArgumentError("Duplicate output column name $out_col requested"))
        -             end
        -             push!(seen_cols, out_col)
        -         else
        -             @assert v isa AbstractVector{Int}
        -             process_vectors = true
        -         end
        -     end
        -     processed_cols = Set{Symbol}()
        -     if process_vectors
        -         cs_norm = Pair[]
        -         for (i, v) in enumerate(cs_norm_pre)
        -             if v isa Pair
        -                 push!(cs_norm, v)
        -                 push!(processed_cols, last(last(v)))
        -             else
        -                 @assert v isa AbstractVector{Int}
        -                 for col_idx in v
        -                     col_name = _names(gd)[col_idx]
        -                     if !(col_name in processed_cols)
        -                         push!(processed_cols, col_name)
        -                         if col_name in seen_cols
        -                             trans_idx = findfirst(cs_norm_pre) do p
        -                                 p isa Pair || return false
        -                                 last(last(p)) == col_name
        -                             end
        -                             @assert !isnothing(trans_idx) && trans_idx > i
        -                             push!(cs_norm, cs_norm_pre[trans_idx])
        -                             # it is safe to delete from cs_norm_pre
        -                             # as we have not reached trans_idx index yet
        -                             deleteat!(cs_norm_pre, trans_idx)
        -                         else
        -                             push!(cs_norm, col_idx => identity => col_name)
        -                         end
        -                     end
        -                 end
        -             end
        -         end
        -     else
        -         cs_norm = collect(Pair, cs_norm_pre)
        -     end
        -     f = Pair[first(x) => first(last(x)) for x in cs_norm]
        -     nms = Symbol[last(last(x)) for x in cs_norm]
        -     return combine_helper(f, gd, nms, keepkeys=keepkeys, ungroup=ungroup,
        -                           copycols=copycols, keeprows=keeprows, renamecols=renamecols)
        - end
        -
        - function gen_groups(idx::Vector{Int})
        0     groups = zeros(Int, length(idx))
        0     groups[1] = 1
        -     j = 1
        0     last_idx = idx[1]
        0     @inbounds for i in 2:length(idx)
        0         cur_idx = idx[i]
        0         j += cur_idx != last_idx
        -         last_idx = cur_idx
        0         groups[i] = j
        -     end
        0     return groups
        - end
        -
        - function combine_helper(f, gd::GroupedDataFrame,
        -                         nms::Union{AbstractVector{Symbol},Nothing}=nothing;
        -                         keepkeys::Bool, ungroup::Bool,
        -                         copycols::Bool, keeprows::Bool, renamecols::Bool)
       16     if !ungroup && !keepkeys
        0         throw(ArgumentError("keepkeys=false when ungroup=false is not allowed"))
        -     end
       32     idx, valscat = _combine(f, gd, nms, copycols, keeprows, renamecols)
        0     !keepkeys && ungroup && return valscat
        0     keys = groupcols(gd)
        0     for key in keys
        0         if hasproperty(valscat, key)
        0             if (keeprows && !isequal(valscat[!, key], parent(gd)[!, key])) ||
        -                 (!keeprows && !isequal(valscat[!, key], view(parent(gd)[!, key], idx)))
        0                 throw(ArgumentError("column :$key in returned data frame " *
        -                                     "is not equal to grouping key :$key"))
        -             end
        -         end
        -     end
        0     if keeprows
        0         newparent = select(parent(gd), gd.cols, copycols=copycols)
        -     else
      224         newparent = length(gd) > 0 ? parent(gd)[idx, gd.cols] : parent(gd)[1:0, gd.cols]
        -     end
       16     added_cols = select(valscat, Not(intersect(keys, _names(valscat))), copycols=false)
      384     hcat!(newparent, length(gd) > 0 ? added_cols : similar(added_cols, 0), copycols=false)
        0     ungroup && return newparent
        -
        0     if length(idx) == 0 && !(keeprows && length(keys) > 0)
        0         @assert nrow(newparent) == 0
        0         return GroupedDataFrame(newparent, copy(gd.cols), Int[],
        -                                 Int[], Int[], Int[], 0, Dict{Any,Int}(),
        -                                 Threads.ReentrantLock())
        0     elseif keeprows
        0         @assert length(keys) > 0 || idx == gd.idx
        -         # in this case we are sure that the result GroupedDataFrame has the
        -         # same structure as the source except that grouping columns are at the start
        0         return Threads.lock(gd.lazy_lock) do
        0             return GroupedDataFrame(newparent, copy(gd.cols), gd.groups,
        -                                     getfield(gd, :idx), getfield(gd, :starts),
        -                                     getfield(gd, :ends), gd.ngroups,
        -                                     getfield(gd, :keymap), Threads.ReentrantLock())
        -         end
        -     else
        0         groups = gen_groups(idx)
        0         @assert groups[end] <= length(gd)
        0         return GroupedDataFrame(newparent, copy(gd.cols), groups,
        -                                 nothing, nothing, nothing, groups[end], nothing,
        -                                 Threads.ReentrantLock())
        -     end
        - end
        -
        - # Wrapping automatically adds column names when the value returned
        - # by the user-provided function lacks them
        - wrap(x::Union{AbstractDataFrame, NamedTuple, DataFrameRow}) = x
        - wrap(x::AbstractMatrix) =
        -     NamedTuple{Tuple(gennames(size(x, 2)))}(Tuple(view(x, :, i) for i in 1:size(x, 2)))
        - wrap(x::Any) = (x1=x,)
        -
        - const ERROR_ROW_COUNT = "return value must not change its kind " *
        -                         "(single row or variable number of rows) across groups"
        -
        - const ERROR_COL_COUNT = "function must return only single-column values, " *
        -                         "or only multiple-column values"
        -
        - wrap_table(x::Any, ::Val) =
        -     throw(ArgumentError(ERROR_ROW_COUNT))
        - function wrap_table(x::Union{NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}},
        -                              AbstractDataFrame, AbstractMatrix},
        -                              ::Val{firstmulticol}) where firstmulticol
        -     if !firstmulticol
        -         throw(ArgumentError(ERROR_COL_COUNT))
        -     end
        -     return wrap(x)
        - end
        -
        - function wrap_table(x::AbstractVector, ::Val{firstmulticol}) where firstmulticol
        -     if firstmulticol
        -         throw(ArgumentError(ERROR_COL_COUNT))
        -     end
        -     return wrap(x)
        - end
        -
        - function wrap_row(x::Any, ::Val{firstmulticol}) where firstmulticol
        -     # NamedTuple is not possible in this branch
        -     if (x isa DataFrameRow) ⊻ firstmulticol
        -         throw(ArgumentError(ERROR_COL_COUNT))
        -     end
        0     return wrap(x)
        - end
        -
        - function wrap_row(x::Union{AbstractArray{<:Any, 0}, Ref},
        -                   ::Val{firstmulticol}) where firstmulticol
        -     if firstmulticol
        -         throw(ArgumentError(ERROR_COL_COUNT))
        -     end
        -     return (x1 = x[],)
        - end
        -
        - # note that also NamedTuple() is correctly captured by this definition
        - # as it is more specific than the one below
        - wrap_row(::Union{AbstractVecOrMat, AbstractDataFrame,
        -                  NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}}, ::Val) =
        -     throw(ArgumentError(ERROR_ROW_COUNT))
        -
        - function wrap_row(x::NamedTuple, ::Val{firstmulticol}) where firstmulticol
        -     if any(v -> v isa AbstractVector, x)
        -         throw(ArgumentError("mixing single values and vectors in a named tuple is not allowed"))
        -     end
        -     if !firstmulticol
        -         throw(ArgumentError(ERROR_COL_COUNT))
        -     end
        -     return x
        - end
        -
        - # idx, starts and ends are passed separately to avoid the cost of field access in a tight loop
        - # Manual unrolling of Tuple is used as it turned out to be more efficient than @generated
        - # for a small number of columns.
        - # For more than 4 columns `map` is slower than @generated,
        - # but this case is probably rare, and if a huge number of columns is passed @generated
        - # has a very high compilation cost.
        - function do_call(f::Any, idx::AbstractVector{<:Integer},
        -                  starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
        -                  gd::GroupedDataFrame, incols::Tuple{}, i::Integer)
        -     f()
        - end
        -
        - function do_call(f::Any, idx::AbstractVector{<:Integer},
        -                  starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
        -                  gd::GroupedDataFrame, incols::Tuple{AbstractVector}, i::Integer)
620373392     idx = idx[starts[i]:ends[i]]
        0     return f(view(incols[1], idx))
        - end
        -
        - function do_call(f::Any, idx::AbstractVector{<:Integer},
        -                  starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
        -                  gd::GroupedDataFrame, incols::NTuple{2, AbstractVector}, i::Integer)
        -     idx = idx[starts[i]:ends[i]]
        -     return f(view(incols[1], idx), view(incols[2], idx))
        - end
        -
        - function do_call(f::Any, idx::AbstractVector{<:Integer},
        -                  starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
        -                  gd::GroupedDataFrame, incols::NTuple{3, AbstractVector}, i::Integer)
        -     idx = idx[starts[i]:ends[i]]
        -     return f(view(incols[1], idx), view(incols[2], idx), view(incols[3], idx))
        - end
        -
        - function do_call(f::Any, idx::AbstractVector{<:Integer},
        -                  starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
        -                  gd::GroupedDataFrame, incols::NTuple{4, AbstractVector}, i::Integer)
        -     idx = idx[starts[i]:ends[i]]
        -     return f(view(incols[1], idx), view(incols[2], idx), view(incols[3], idx),
        -              view(incols[4], idx))
        - end
        -
        - function do_call(f::Any, idx::AbstractVector{<:Integer},
        -                  starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
        -                  gd::GroupedDataFrame, incols::Tuple, i::Integer)
        -     idx = idx[starts[i]:ends[i]]
        -     return f(map(c -> view(c, idx), incols)...)
        - end
        -
        - function do_call(f::Any, idx::AbstractVector{<:Integer},
        -                  starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
        -                  gd::GroupedDataFrame, incols::NamedTuple, i::Integer)
        -     idx = idx[starts[i]:ends[i]]
        -     return f(map(c -> view(c, idx), incols))
        - end
        -
        - function do_call(f::Any, idx::AbstractVector{<:Integer},
        -                  starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
        -                  gd::GroupedDataFrame, incols::Nothing, i::Integer)
        -     idx = idx[starts[i]:ends[i]]
        -     return f(view(parent(gd), idx, :))
        - end
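Each `do_call` method receives three pieces of index data: a permutation `idx` that lists rows group by group, and the `starts` and `ends` of each group inside `idx`. It then applies `f` to views of the selected rows. The sketch below builds such a layout with a counting sort and mirrors the single-column `do_call`. The counting sort is an assumption made for illustration; it is not necessarily how `GroupedDataFrame` computes its indices. Both function names are hypothetical.

```julia
# Hypothetical: build `idx` (rows listed group by group, original order kept
# within each group) plus `starts`/`ends`, the span of each group inside `idx`.
# `groups[i]` is the 1-based group number of row i; no zeros are assumed here.
function group_layout_sketch(groups::AbstractVector{<:Integer}, ngroups::Integer)
    counts = zeros(Int, ngroups)
    for g in groups
        counts[g] += 1
    end
    ends = cumsum(counts)
    starts = ends .- counts .+ 1
    slot = copy(starts)                  # next free position for each group
    idx = Vector{Int}(undef, length(groups))
    for (row, g) in enumerate(groups)    # counting sort, stable within groups
        idx[slot[g]] = row
        slot[g] += 1
    end
    return idx, starts, ends
end

# Hypothetical mirror of the single-column `do_call`: apply `f` to a view of
# the rows of group `i` only, without copying the column.
call_on_group_sketch(f, idx, starts, ends, col, i) =
    f(view(col, idx[starts[i]:ends[i]]))

idx, starts, ends = group_layout_sketch([2, 1, 2, 1, 1], 2)   # idx == [2, 4, 5, 1, 3]
call_on_group_sketch(sum, idx, starts, ends, [10, 20, 30, 40, 50], 1)   # 110
```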
        -
        - _nrow(df::AbstractDataFrame) = nrow(df)
        - _nrow(x::NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}) =
        -     isempty(x) ? 0 : length(x[1])
        - _ncol(df::AbstractDataFrame) = ncol(df)
        - _ncol(x::Union{NamedTuple, DataFrameRow}) = length(x)
        -
        - abstract type AbstractAggregate end
        -
        - struct Reduce{O, C, A} <: AbstractAggregate
        -     op::O
        -     condf::C
        -     adjust::A
        -     checkempty::Bool
        - end
        - Reduce(f, condf=nothing, adjust=nothing) = Reduce(f, condf, adjust, false)
        -
        - check_aggregate(f::Any, ::AbstractVector) = f
        - check_aggregate(f::typeof(sum), ::AbstractVector{<:Union{Missing, Number}}) =
        -     Reduce(Base.add_sum)
        - check_aggregate(f::typeof(sum∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
        -     Reduce(Base.add_sum, !ismissing)
        - check_aggregate(f::typeof(prod), ::AbstractVector{<:Union{Missing, Number}}) =
        -     Reduce(Base.mul_prod)
        - check_aggregate(f::typeof(prod∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
        -     Reduce(Base.mul_prod, !ismissing)
        - check_aggregate(f::typeof(maximum),
        -                 ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
        - check_aggregate(f::typeof(maximum), v::AbstractVector{<:Union{Missing, Real}}) =
        -     eltype(v) === Any ? f : Reduce(max)
        - check_aggregate(f::typeof(maximum∘skipmissing),
        -                 ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
        - check_aggregate(f::typeof(maximum∘skipmissing), v::AbstractVector{<:Union{Missing, Real}}) =
        -     eltype(v) === Any ? f : Reduce(max, !ismissing, nothing, true)
        - check_aggregate(f::typeof(minimum),
        -                 ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
        - check_aggregate(f::typeof(minimum), v::AbstractVector{<:Union{Missing, Real}}) =
        -     eltype(v) === Any ? f : Reduce(min)
        - check_aggregate(f::typeof(minimum∘skipmissing),
        -                 ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
        - check_aggregate(f::typeof(minimum∘skipmissing), v::AbstractVector{<:Union{Missing, Real}}) =
        -     eltype(v) === Any ? f : Reduce(min, !ismissing, nothing, true)
        - check_aggregate(f::typeof(mean), ::AbstractVector{<:Union{Missing, Number}}) =
        -     Reduce(Base.add_sum, nothing, /)
        - check_aggregate(f::typeof(mean∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
        -     Reduce(Base.add_sum, !ismissing, /)
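The `check_aggregate` methods select a fast path by dispatching on the type of the function itself, including compositions such as `sum∘skipmissing`, and on the column's element type. Anything unrecognized falls back to returning `f` unchanged. Below is a stripped-down sketch of that pattern. `ReduceSketch` and `pick_sketch` are hypothetical stand-ins for `Reduce` and `check_aggregate`.

```julia
# Hypothetical stand-in for `Reduce`: the binary operator plus an optional row filter
struct ReduceSketch{O, C}
    op::O
    condf::C
end

# Generic fallback: keep the user's function as-is
pick_sketch(f, ::AbstractVector) = f
# Fast paths chosen purely by dispatch on the function's type and the column's eltype
pick_sketch(::typeof(sum), ::AbstractVector{<:Union{Missing, Number}}) =
    ReduceSketch(+, nothing)
pick_sketch(::typeof(sum∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
    ReduceSketch(+, !ismissing)
```

Every `sum∘skipmissing` expression has the same type, so a user-written `sum∘skipmissing` hits the specialized method. A lambda such as `x -> sum(skipmissing(x))` does not, and takes the generic path.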
        -
        - # Other aggregate functions which are not strictly reductions
        - struct Aggregate{F, C} <: AbstractAggregate
        -     f::F
        -     condf::C
        - end
        - Aggregate(f) = Aggregate(f, nothing)
        -
        - check_aggregate(f::typeof(var), ::AbstractVector{<:Union{Missing, Number}}) =
        -     Aggregate(var)
        - check_aggregate(f::typeof(var∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
        -     Aggregate(var, !ismissing)
        - check_aggregate(f::typeof(std), ::AbstractVector{<:Union{Missing, Number}}) =
        -     Aggregate(std)
        - check_aggregate(f::typeof(std∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
        -     Aggregate(std, !ismissing)
        - check_aggregate(f::typeof(first), v::AbstractVector) =
        -     eltype(v) === Any ? f : Aggregate(first)
        - check_aggregate(f::typeof(first),
        -                 ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
        - check_aggregate(f::typeof(first∘skipmissing), v::AbstractVector) =
        -     eltype(v) === Any ? f : Aggregate(first, !ismissing)
        - check_aggregate(f::typeof(first∘skipmissing),
        -                 ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
        - check_aggregate(f::typeof(last), v::AbstractVector) =
        -     eltype(v) === Any ? f : Aggregate(last)
        - check_aggregate(f::typeof(last),
        -                 ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
        - check_aggregate(f::typeof(last∘skipmissing), v::AbstractVector) =
        -     eltype(v) === Any ? f : Aggregate(last, !ismissing)
        - check_aggregate(f::typeof(last∘skipmissing),
        -                 ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
        - check_aggregate(f::typeof(length), ::AbstractVector) = Aggregate(length)
        -
        - # SkipMissing does not support length
        -
        - # Find first value matching condition for each group
        - # Optimized for situations where a matching value is typically encountered
        - # among the first rows for each group
        - function fillfirst!(condf, outcol::AbstractVector, incol::AbstractVector,
        -                     gd::GroupedDataFrame; rev::Bool=false)
        0     ngroups = gd.ngroups
        -     # Use group indices if they have already been computed
        0     idx = getfield(gd, :idx)
        0     if idx !== nothing && condf === nothing
        0         v = rev ? gd.ends : gd.starts
        0         @inbounds for i in 1:ngroups
        0             outcol[i] = incol[idx[v[i]]]
        -         end
        0     elseif idx !== nothing
        -         nfilled = 0
        0         starts = gd.starts
        0         @inbounds for i in eachindex(outcol)
        0             s = starts[i]
        0             offsets = rev ? (nrow(gd[i])-1:-1:0) : (0:nrow(gd[i])-1)
        0             for j in offsets
        0                 x = incol[idx[s+j]]
        0                 if condf === nothing || condf(x)
        -                     outcol[i] = x
        -                     nfilled += 1
        0                     break
        -                 end
        -             end
        -         end
        0         if nfilled < length(outcol)
        0             throw(ArgumentError("some groups contain only missing values"))
        -         end
        -     else # Finding first row is faster than computing all group indices
        0         groups = gd.groups
        0         if rev
        0             r = length(groups):-1:1
        -         else
        0             r = 1:length(groups)
        -         end
        0         filled = fill(false, ngroups)
        -         nfilled = 0
        0         @inbounds for i in r
        0             gix = groups[i]
        0             x = incol[i]
        0             if gix > 0 && (condf === nothing || condf(x)) && !filled[gix]
        0                 filled[gix] = true
        0                 outcol[gix] = x
        0                 nfilled += 1
        0                 nfilled == ngroups && break
        -             end
        -         end
        0         if nfilled < length(outcol)
        0             throw(ArgumentError("some groups contain only missing values"))
        -         end
        -     end
        0     outcol
        - end
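When group indices have not been computed, `fillfirst!` scans `gd.groups` once. It keeps the first acceptable value for each group and stops as soon as every group is filled; `rev=true` scans backwards, as `last` does. Below is a Base-only sketch of that branch. The hypothetical `fillfirst_sketch` takes the `groups` vector directly instead of a `GroupedDataFrame`, with 0 marking rows of dropped groups.

```julia
# Hypothetical: first value per group satisfying `condf` (any value if
# `condf === nothing`), found by one scan over the rows in order, or in
# reverse when `rev=true`. Group number 0 means the row belongs to a dropped
# group. The scan stops as soon as all groups are filled.
function fillfirst_sketch(condf, incol::AbstractVector, groups::AbstractVector{<:Integer},
                          ngroups::Integer; rev::Bool=false)
    outcol = similar(incol, ngroups)
    filled = falses(ngroups)
    nfilled = 0
    rows = rev ? reverse(eachindex(groups)) : eachindex(groups)
    for i in rows
        gix = groups[i]
        x = incol[i]
        if gix > 0 && !filled[gix] && (condf === nothing || condf(x))
            filled[gix] = true
            outcol[gix] = x
            nfilled += 1
            nfilled == ngroups && break
        end
    end
    nfilled < ngroups && throw(ArgumentError("some groups contain only missing values"))
    return outcol
end
```

For example, `fillfirst_sketch(!ismissing, [missing, 1, 2, 4, 3], [1, 1, 2, 2, 0], 2)` gives `[1, 2]`, and the same call with `rev=true` gives `[1, 4]`.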
        -
        - # Use a strategy similar to reducedim_init from Base to get the vector of the right type
        - function groupreduce_init(op, condf, adjust,
        -                           incol::AbstractVector{U}, gd::GroupedDataFrame) where U
        -     T = Base.promote_union(U)
        -
        -     if op === Base.add_sum
        -         initf = zero
        -     elseif op === Base.mul_prod
        -         initf = one
        -     else
        -         throw(ErrorException("Unrecognized op $op"))
        -     end
        -
        -     Tnm = nonmissingtype(T)
        -     if isconcretetype(Tnm) && applicable(initf, Tnm)
        -         tmpv = initf(Tnm)
        -         initv = op(tmpv, tmpv)
        -         if adjust isa Nothing
        -             x = Tnm <: AbstractIrrational ? float(initv) : initv
        -         else
        -             x = adjust(initv, 1)
        -         end
        -         if condf === !ismissing
        -             V = typeof(x)
        -         else
        -             V = U >: Missing ? Union{typeof(x), Missing} : typeof(x)
        -         end
        -         v = similar(incol, V, length(gd))
        -         fill!(v, x)
        -         return v
        -     else
        -         # do not try to determine the narrowest possible type nor starting value
        -         # as this is not possible to do correctly in general without processing
        -         # groups; it will get fixed later in groupreduce!; later we
        -         # will make use of the fact that this vector is filled with #undef
        -         # while above the vector is filled with a concrete value
        -         return Vector{Any}(undef, length(gd))
        -     end
        - end
        -
        - for (op, initf) in ((:max, :typemin), (:min, :typemax))
        -     @eval begin
        -         function groupreduce_init(::typeof($op), condf, adjust,
        -                                   incol::AbstractVector{T}, gd::GroupedDataFrame) where T
        -             @assert isnothing(adjust)
        -             S = nonmissingtype(T)
        -             # !ismissing check is purely an optimization to avoid a copy later
        -             outcol = similar(incol, condf === !ismissing ? S : T, length(gd))
        -             # Comparison is possible only between CatValues from the same pool
        -             if incol isa CategoricalVector
        -                 U = Union{CategoricalArrays.leveltype(outcol),
        -                           eltype(outcol) >: Missing ? Missing : Union{}}
        -                 outcol = CategoricalArray{U, 1}(outcol.refs, incol.pool)
        -             end
        -             # It is safe to use a non-missing init value
        -             # since missing will poison the result if present
        -             # we assume here that groups are non-empty (current design assures this)
        -             # + workaround for https://github.com/JuliaLang/julia/issues/36978
        -             if isconcretetype(S) && hasmethod($initf, Tuple{S}) && !(S <: Irrational)
        -                 fill!(outcol, $initf(S))
        -             else
        -                 fillfirst!(condf, outcol, incol, gd)
        -             end
        -             return outcol
        -         end
        -     end
        - end
        -
        - function copyto_widen!(res::AbstractVector{T}, x::AbstractVector) where T
        -     @inbounds for i in eachindex(res, x)
        -         val = x[i]
        -         S = typeof(val)
        -         if S <: T || promote_type(S, T) <: T
        -             res[i] = val
        -         else
        -             newres = Tables.allocatecolumn(promote_type(S, T), length(x))
        -             return copyto_widen!(newres, x)
        -         end
        -     end
        -     return res
        - end
        -
        - function groupreduce!(res::AbstractVector, f, op, condf, adjust, checkempty::Bool,
        -                       incol::AbstractVector, gd::GroupedDataFrame)
        -     n = length(gd)
        -     if adjust !== nothing || checkempty
        -         counts = zeros(Int, n)
        -     end
        -     groups = gd.groups
        -     @inbounds for i in eachindex(incol, groups)
        -         gix = groups[i]
        -         x = incol[i]
        -         if gix > 0 && (condf === nothing || condf(x))
        -             # this check should be optimized out if eltype(res) is not Any
        -             if eltype(res) === Any && !isassigned(res, gix)
        -                 res[gix] = f(x, gix)
        -             else
        -                 res[gix] = op(res[gix], f(x, gix))
        -             end
        -             if adjust !== nothing || checkempty
        -                 counts[gix] += 1
        -             end
        -         end
        -     end
        -     # handle the case of an uninitialized reduction
        -     if eltype(res) === Any
        -         if op === Base.add_sum
        -             initf = zero
        -         elseif op === Base.mul_prod
        -             initf = one
        -         else
        -             initf = x -> throw(ErrorException("Unrecognized op $op"))
        -         end
        -         @inbounds for gix in eachindex(res)
        -             if !isassigned(res, gix)
        -                 res[gix] = initf(nonmissingtype(eltype(incol)))
        -             end
        -         end
        -     end
        -     if adjust !== nothing
        -         res .= adjust.(res, counts)
        -     end
        -     if checkempty && any(iszero, counts)
        -         throw(ArgumentError("some groups contain only missing values"))
        -     end
        -     # Undo pool sharing done by groupreduce_init
        -     if res isa CategoricalVector && res.pool === incol.pool
        -         V = Union{CategoricalArrays.leveltype(res),
        -                   eltype(res) >: Missing ? Missing : Union{}}
        -         res = CategoricalArray{V, 1}(res.refs, copy(res.pool))
        -     end
        -     if isconcretetype(eltype(res))
        -         return res
        -     else
        -         return copyto_widen!(Tables.allocatecolumn(typeof(first(res)), n), res)
        -     end
        - end
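`groupreduce!` folds every row into its group's accumulator in a single pass. It also keeps per-group counts so that `adjust` can post-process the result; the `mean` path passes `/`. The hypothetical `groupreduce_sketch` below shows the same single-pass pattern with two simplifications. First, unlike `groupreduce_init`, it takes the initial value `init` from the caller, so the accumulator type is simply that of `init`. Second, it uses the operator passed in (for example `+`) rather than `Base.add_sum`.

```julia
# Hypothetical single-pass grouped reduction: each accepted value is folded
# into its group's accumulator with `op`; per-group counts let `adjust`
# post-process the result (e.g. `/` turns sums into means).
# The accumulator vector takes the type of `init`, so pass `condf = !ismissing`
# when `incol` may contain missing values.
function groupreduce_sketch(op, init, condf, adjust, incol::AbstractVector,
                            groups::AbstractVector{<:Integer}, ngroups::Integer)
    res = fill(init, ngroups)
    counts = zeros(Int, ngroups)
    for i in eachindex(incol, groups)
        gix = groups[i]
        x = incol[i]
        if gix > 0 && (condf === nothing || condf(x))
            res[gix] = op(res[gix], x)
            counts[gix] += 1
        end
    end
    return adjust === nothing ? res : adjust.(res, counts)
end

incol = [1, missing, 3, 4, 6]
groups = [1, 1, 1, 2, 2]
groupreduce_sketch(+, 0, !ismissing, nothing, incol, groups, 2)   # [4, 10]
groupreduce_sketch(+, 0.0, !ismissing, /, incol, groups, 2)       # [2.0, 5.0]
groupreduce_sketch(max, typemin(Int), !ismissing, nothing, incol, groups, 2)   # [3, 6]
```

The last call shows why the `max`/`min` methods of `groupreduce_init` fill the output with `typemin`/`typemax`: those values are neutral starting points for the fold.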
        -
        - # function barrier works around type instability of groupreduce_init due to applicable
        - groupreduce(f, op, condf, adjust, checkempty::Bool,
        -             incol::AbstractVector, gd::GroupedDataFrame) =
        -     groupreduce!(groupreduce_init(op, condf, adjust, incol, gd),
        -                  f, op, condf, adjust, checkempty, incol, gd)
        - # Avoids the overhead due to Missing when computing the reduction
        - groupreduce(f, op, condf::typeof(!ismissing), adjust, checkempty::Bool,
        -             incol::AbstractVector, gd::GroupedDataFrame) =
        -     groupreduce!(disallowmissing(groupreduce_init(op, condf, adjust, incol, gd)),
        -                  f, op, condf, adjust, checkempty, incol, gd)
        -
        - (r::Reduce)(incol::AbstractVector, gd::GroupedDataFrame) =
        -     groupreduce((x, i) -> x, r.op, r.condf, r.adjust, r.checkempty, incol, gd)
        -
        - # this definition is missing in Julia 1.0 LTS and is required by aggregation for var
        - # TODO: remove this when we drop 1.0 support
        - if VERSION < v"1.1"
        -     Base.zero(::Type{Missing}) = missing
        - end
        -
        - function (agg::Aggregate{typeof(var)})(incol::AbstractVector, gd::GroupedDataFrame)
        -     means = groupreduce((x, i) -> x, Base.add_sum, agg.condf, /, false, incol, gd)
        -     # !ismissing check is purely an optimization to avoid a copy later
        -     if eltype(means) >: Missing && agg.condf !== !ismissing
        -         T = Union{Missing, real(eltype(means))}
        -     else
        -         T = real(eltype(means))
        -     end
        -     res = zeros(T, length(gd))
        -     return groupreduce!(res, (x, i) -> @inbounds(abs2(x - means[i])), +, agg.condf,
        -                         (x, l) -> l <= 1 ? oftype(x / (l-1), NaN) : x / (l-1),
        -                         false, incol, gd)
        - end
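`Aggregate{typeof(var)}` makes two grouped passes. The first computes per-group means. The second sums squared deviations from those means and divides by `n - 1`, returning `NaN` for groups with at most one value. The hypothetical `groupvar_sketch` reproduces that structure for plain `Real` columns without missing values. The two-pass form avoids the cancellation that the one-pass `E[x²] - E[x]²` formula can suffer from.

```julia
# Hypothetical two-pass grouped sample variance.
# Pass 1: per-group sums and counts give the means.
# Pass 2: per-group sums of squared deviations from those means, divided by n - 1.
# Group number 0 marks rows that belong to no group.
function groupvar_sketch(incol::AbstractVector{<:Real}, groups::AbstractVector{<:Integer},
                         ngroups::Integer)
    sums = zeros(Float64, ngroups)
    counts = zeros(Int, ngroups)
    for (x, g) in zip(incol, groups)
        g > 0 || continue
        sums[g] += x
        counts[g] += 1
    end
    means = sums ./ counts
    ssq = zeros(Float64, ngroups)
    for (x, g) in zip(incol, groups)
        g > 0 || continue
        ssq[g] += abs2(x - means[g])
    end
    # same convention as the `adjust` function above: NaN when n <= 1
    return [n <= 1 ? NaN : s / (n - 1) for (s, n) in zip(ssq, counts)]
end

groupvar_sketch([1.0, 2.0, 3.0, 10.0, 5.0], [1, 1, 1, 2, 0], 2)   # [1.0, NaN]
```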
        -
        - function (agg::Aggregate{typeof(std)})(incol::AbstractVector, gd::GroupedDataFrame)
        -     outcol = Aggregate(var, agg.condf)(incol, gd)
        -     if eltype(outcol) <: Union{Missing, Rational}
        -         return sqrt.(outcol)
        -     else
        -         return map!(sqrt, outcol, outcol)
        -     end
        - end
        -
        - for f in (first, last)
        -     function (agg::Aggregate{typeof(f)})(incol::AbstractVector, gd::GroupedDataFrame)
        -         n = length(gd)
        -         outcol = similar(incol, n)
        -         fillfirst!(agg.condf, outcol, incol, gd, rev=agg.f === last)
        -         if isconcretetype(eltype(outcol))
        -             return outcol
        -         else
        -             return copyto_widen!(Tables.allocatecolumn(typeof(first(outcol)), n), outcol)
        -         end
        -     end
        - end
        -
        - function (agg::Aggregate{typeof(length)})(incol::AbstractVector, gd::GroupedDataFrame)
        -     if getfield(gd, :idx) === nothing
        -         lens = zeros(Int, length(gd))
        -         @inbounds for gix in gd.groups
        -             gix > 0 && (lens[gix] += 1)
        -         end
        -         return lens
        -     else
        -         return gd.ends .- gd.starts .+ 1
        -     end
        - end
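`Aggregate{typeof(length)}` gets group sizes in one of two ways. If indices have not been computed, it counts the entries of `gd.groups`, where 0 marks rows that belong to no group. If they have, it takes `ends .- starts .+ 1`. Both are sketched below as methods of the hypothetical `grouplengths_sketch`:

```julia
# Hypothetical: group sizes by counting group numbers (0 = dropped row)
function grouplengths_sketch(groups::AbstractVector{<:Integer}, ngroups::Integer)
    lens = zeros(Int, ngroups)
    for g in groups
        g > 0 && (lens[g] += 1)
    end
    return lens
end

# Hypothetical: group sizes from precomputed group spans inside `idx`
grouplengths_sketch(starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer}) =
    ends .- starts .+ 1

grouplengths_sketch([2, 1, 0, 2, 2], 2)   # [1, 3]
grouplengths_sketch([1, 2], [1, 4])       # [1, 3]
```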
        -
        - isagg((col, fun)::Pair, gdf::GroupedDataFrame) =
        -     col isa ColumnIndex && check_aggregate(fun, parent(gdf)[!, col]) isa AbstractAggregate
        -
        - function _agg2idx_map_helper(idx, idx_agg)
        -     agg2idx_map = fill(-1, length(idx))
        -     aggj = 1
        -     @inbounds for (j, idxj) in enumerate(idx)
        -         while idx_agg[aggj] != idxj
        -             aggj += 1
        -             @assert aggj <= length(idx_agg)
        -         end
        -         agg2idx_map[j] = aggj
        -     end
        -     return agg2idx_map
        - end
        -
        - function prepare_idx_keeprows(idx::AbstractVector{<:Integer},
        -                               starts::AbstractVector{<:Integer},
        -                               ends::AbstractVector{<:Integer},
        -                               nrowparent::Integer)
        -     idx_keeprows = Vector{Int}(undef, nrowparent)
        -     i = 0
        -     for (s, e) in zip(starts, ends)
        -         v = idx[s]
        -         for k in s:e
        -             i += 1
        -             idx_keeprows[i] = v
        -         end
        -     end
        -     @assert i == nrowparent
        -     return idx_keeprows
        - end
        -
        - function _combine(f::AbstractVector{<:Pair},
        -                   gd::GroupedDataFrame, nms::AbstractVector{Symbol},
        -                   copycols::Bool, keeprows::Bool, renamecols::Bool)
        -     # here f should be normalized and in a form of source_cols => fun
        -     @assert all(x -> first(x) isa Union{Int, AbstractVector{Int}, AsTable}, f)
        -     @assert all(x -> last(x) isa Base.Callable, f)
        -
        -     if isempty(f)
        -         if keeprows && nrow(parent(gd)) > 0 && minimum(gd.groups) == 0
        -             throw(ArgumentError("select and transform do not support " *
        -                                 "`GroupedDataFrame`s from which some groups have "*
        -                                 "been dropped (including skipmissing=true)"))
        -         end
        -         return Int[], DataFrame()
        -     end
        -
        -     if keeprows
        -         if nrow(parent(gd)) > 0 && minimum(gd.groups) == 0
        -             throw(ArgumentError("select and transform do not support " *
        -                                 "`GroupedDataFrame`s from which some groups have "*
        -                                 "been dropped (including skipmissing=true)"))
        -         end
        -         idx_keeprows = prepare_idx_keeprows(gd.idx, gd.starts, gd.ends, nrow(parent(gd)))
        -     else
        -         idx_keeprows = nothing
        -     end
        -
        -     idx_agg = nothing
        -     if length(gd) > 0 && any(x -> isagg(x, gd), f)
        -         # Compute indices of representative rows only once for all AbstractAggregates
        -         idx_agg = Vector{Int}(undef, length(gd))
        -         fillfirst!(nothing, idx_agg, 1:length(gd.groups), gd)
        -     elseif length(gd) == 0 || !all(x -> isagg(x, gd), f)
        -         # Trigger computation of indices
        -         # This can speed up some aggregates that would not trigger this on their own
        -         @assert gd.idx !== nothing
        -     end
        -     res = Vector{Any}(undef, length(f))
        -     parentdf = parent(gd)
        -     for (i, p) in enumerate(f)
        -         source_cols, fun = p
        -         if length(gd) > 0 && isagg(p, gd)
        -             incol = parentdf[!, source_cols]
        -             agg = check_aggregate(last(p), incol)
        -             outcol = agg(incol, gd)
        -             res[i] = idx_agg, outcol
        -         elseif keeprows && fun === identity && !(source_cols isa AsTable)
        -             @assert source_cols isa Union{Int, AbstractVector{Int}}
        -             @assert length(source_cols) == 1
        -             outcol = parentdf[!, first(source_cols)]
        -             res[i] = idx_keeprows, copycols ? copy(outcol) : outcol
        -         else
        -             if source_cols isa Int
        -                 incols = (parentdf[!, source_cols],)
        -             elseif source_cols isa AsTable
        -                 incols = Tables.columntable(select(parentdf,
        -                                                    source_cols.cols,
        -                                                    copycols=false))
        -             else
        -                 @assert source_cols isa AbstractVector{Int}
        -                 incols = ntuple(i -> parentdf[!, source_cols[i]], length(source_cols))
        -             end
        -             firstres = length(gd) > 0 ?
        -                        do_call(fun, gd.idx, gd.starts, gd.ends, gd, incols, 1) :
        -                        do_call(fun, Int[], 1:1, 0:0, gd, incols, 1)
        -             firstmulticol = firstres isa MULTI_COLS_TYPE
        -             if firstmulticol
        -                 throw(ArgumentError("a single value or vector result is required when " *
        -                                     "passing multiple functions (got $(typeof(firstres)))"))
        -             end
        -             # if idx_agg has not been computed yet it is `nothing`;
        -             # in that case compute it, unless fun returned a vector.
        -             if !(firstres isa AbstractVector) && isnothing(idx_agg)
        -                 idx_agg = Vector{Int}(undef, length(gd))
        -                 fillfirst!(nothing, idx_agg, 1:length(gd.groups), gd)
        -             end
        -             # TODO: if firstres is a vector we recompute idx for every function
        -             # this could be avoided - it could be computed only the first time
        -             # and later we could just check if lengths of groups match this first idx
        -
        -             # the last argument passed to _combine_with_first informs it about a precomputed
        -             # idx. Currently we do it only for single-row return values; otherwise we pass
        -             # nothing to signal that idx has to be computed in _combine_with_first.
        -             idx, outcols, _ = _combine_with_first(wrap(firstres), fun, gd, incols,
        -                                                   Val(firstmulticol),
        -                                                   firstres isa AbstractVector ? nothing : idx_agg)
        -             @assert length(outcols) == 1
        -             res[i] = idx, outcols[1]
        -         end
        -     end
        -     # if idx_agg === nothing then we have only functions that
        -     # returned multiple rows and idx_loc = 1
        -     idx_loc = findfirst(x -> x[1] !== idx_agg, res)
        -     if !keeprows && isnothing(idx_loc)
        -         @assert !isnothing(idx_agg)
        -         idx = idx_agg
        -     else
        -         idx = keeprows ? idx_keeprows : res[idx_loc][1]
        -         agg2idx_map = nothing
        -         for i in 1:length(res)
        -             if res[i][1] !== idx && res[i][1] != idx
        -                 if res[i][1] === idx_agg
        -                     # we perform pseudo broadcasting here
        -                     # keep -1 as a sentinel for errors
        -                     if isnothing(agg2idx_map)
        -                         agg2idx_map = _agg2idx_map_helper(idx, idx_agg)
        -                     end
        -                     res[i] = idx_agg, res[i][2][agg2idx_map]
        -                 elseif idx != res[i][1]
        -                     if keeprows
        -                         throw(ArgumentError("all functions must return vectors with " *
        -                                             "as many values as rows in each group"))
        -                     else
        -                         throw(ArgumentError("all functions must return vectors of the same length"))
        -                     end
        -                 end
        -             end
        -         end
        -     end
        -
        -     # here the first field in res[i] is used to keep track of how the column was generated;
        -     # the correct index is stored in the idx variable
        -
        -     for (i, (col_idx, col)) in enumerate(res)
        -         if keeprows && res[i][1] !== idx_keeprows # we need to reorder the column
        -             newcol = similar(col)
        -             # we can probably make it more efficient, but I leave it as an optimization for the future
        -             gd_idx = gd.idx
        -             for j in eachindex(gd.idx, col)
        -                 newcol[gd_idx[j]] = col[j]
        -             end
        -             res[i] = (col_idx, newcol)
        -         end
        -     end
        -     outcols = map(x -> x[2], res)
        -     # this check is redundant given we check idx above
        -     # but it is safer to double check and it is cheap
        -     @assert all(x -> length(x) == length(outcols[1]), outcols)
        -     return idx, DataFrame(collect(AbstractVector, outcols), nms, copycols=false)
        - end
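With `keeprows=true` (the `select`/`transform` path), columns are assembled in group order and then scattered back to the parent's row order by the loop `newcol[gd_idx[j]] = col[j]`. The hypothetical `broadcast_to_rows_sketch` shows this for a single value per group. It assumes that no group has been dropped, so every parent row appears exactly once in `idx`; `_combine` enforces the same condition by throwing an error otherwise.

```julia
# Hypothetical sketch of the keeprows path for one value per group:
# 1. repeat each group's value for every row of the group, in `idx` order;
# 2. scatter from group order back to the parent's row order.
function broadcast_to_rows_sketch(vals::AbstractVector, idx::AbstractVector{<:Integer},
                                  starts::AbstractVector{<:Integer},
                                  ends::AbstractVector{<:Integer})
    col = similar(vals, length(idx))
    for (g, (s, e)) in enumerate(zip(starts, ends))
        col[s:e] .= vals[g]
    end
    newcol = similar(col)
    for j in eachindex(idx, col)
        newcol[idx[j]] = col[j]
    end
    return newcol
end

# rows 1..5 belong to groups [2, 1, 2, 1, 1]
broadcast_to_rows_sketch([110, 40], [2, 4, 5, 1, 3], [1, 4], [3, 5])   # [40, 110, 40, 110, 110]
```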
        -
        - function _combine(fun::Base.Callable, gd::GroupedDataFrame, ::Nothing,
        -                   copycols::Bool, keeprows::Bool, renamecols::Bool)
        -     @assert copycols && !keeprows
        -     # use `similar` as `gd` might have been subsetted
        -     firstres = length(gd) > 0 ? fun(gd[1]) : fun(similar(parent(gd), 0))
        -     idx, outcols, nms = _combine_multicol(firstres, fun, gd, nothing)
        -     valscat = DataFrame(collect(AbstractVector, outcols), nms)
        -     return idx, valscat
        - end
        -
        - function _combine(p::Pair, gd::GroupedDataFrame, ::Nothing,
        -                   copycols::Bool, keeprows::Bool, renamecols::Bool)
        -     # p is not normalized here because fun is allowed to return a tabular value;
        -     # map and combine should not dispatch here if p is an aggregation recognized by isagg
        0     @assert copycols && !keeprows
        0     source_cols, (fun, out_col) = normalize_selection(index(parent(gd)), p, renamecols)
        -     parentdf = parent(gd)
        -     if source_cols isa Int
        -         incols = (parent(gd)[!, source_cols],)
        -     elseif source_cols isa AsTable
        -         incols = Tables.columntable(select(parentdf,
        -                                            source_cols.cols,
        -                                            copycols=false))
        -     else
        -         @assert source_cols isa AbstractVector{Int}
        0         incols = ntuple(i -> parent(gd)[!, source_cols[i]], length(source_cols))
        -     end
       16     firstres = length(gd) > 0 ?
        -                do_call(fun, gd.idx, gd.starts, gd.ends, gd, incols, 1) :
        -                do_call(fun, Int[], 1:1, 0:0, gd, incols, 1)
       16     idx, outcols, nms = _combine_multicol(firstres, fun, gd, incols)
        -     # disallow passing target column name to genuine tables
        0     if firstres isa MULTI_COLS_TYPE
        0         if p isa Pair{<:Any, <:Pair{<:Any, <:SymbolOrString}}
        -             throw(ArgumentError("setting column name for tabular return value is disallowed"))
        -         end
        -     else
        -         # store the auto-generated or user-supplied target column name in nms,
        -         # overwriting the names produced by _combine_with_first
       96         nms = [out_col]
        -     end
       96     valscat = DataFrame(collect(AbstractVector, outcols), nms)
      208     return idx, valscat
        - end
        -
        - function _combine_multicol(firstres, fun::Any, gd::GroupedDataFrame,
        -                            incols::Union{Nothing, AbstractVector, Tuple, NamedTuple})
        -     firstmulticol = firstres isa MULTI_COLS_TYPE
      192     if !(firstres isa Union{AbstractVecOrMat, AbstractDataFrame,
        -                             NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}})
 50577616         idx_agg = Vector{Int}(undef, length(gd))
        0         fillfirst!(nothing, idx_agg, 1:length(gd.groups), gd)
        -     else
        -         idx_agg = nothing
        -     end
        0     return _combine_with_first(wrap(firstres), fun, gd, incols,
        -                                Val(firstmulticol), idx_agg)
        - end
        -
        - function _combine_with_first(first::Union{NamedTuple, DataFrameRow, AbstractDataFrame},
        -                              f::Any, gd::GroupedDataFrame,
        -                              incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
        -                              firstmulticol::Val, idx_agg::Union{Nothing, AbstractVector{<:Integer}})
       32     extrude = false
        -
        0     if first isa AbstractDataFrame
        -         n = 0
        0         eltys = eltype.(eachcol(first))
      256     elseif first isa NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}
        -         n = 0
        0         eltys = map(eltype, first)
        0     elseif first isa DataFrameRow
        0         n = length(gd)
        0         eltys = [eltype(parent(first)[!, i]) for i in parentcols(index(first))]
      240     elseif firstmulticol == Val(false) && first[1] isa Union{AbstractArray{<:Any, 0}, Ref}
        -         extrude = true
        0         first = wrap_row(first[1], firstmulticol)
        0         n = length(gd)
        0         eltys = (typeof(first[1]),)
        -     else # other NamedTuple giving a single row
        0         n = length(gd)
        0         eltys = map(typeof, first)
        0         if any(x -> x <: AbstractVector, eltys)
        0             throw(ArgumentError("mixing single values and vectors in a named tuple is not allowed"))
        -         end
        -     end
        0     idx = isnothing(idx_agg) ? Vector{Int}(undef, n) : idx_agg
        -     local initialcols
        -     let eltys=eltys, n=n # Workaround for julia#15276
      480         initialcols = ntuple(i -> Tables.allocatecolumn(eltys[i], n), _ncol(first))
        -     end
       16     targetcolnames = tuple(propertynames(first)...)
      288     if !extrude && first isa Union{AbstractDataFrame,
        -                                    NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}}
        0         outcols, finalcolnames = _combine_tables_with_first!(first, initialcols, idx, 1, 1,
        -                                                              f, gd, incols, targetcolnames,
        -                                                              firstmulticol)
        -     else
       48         outcols, finalcolnames = _combine_rows_with_first!(first, initialcols, 1, 1,
        -                                                            f, gd, incols, targetcolnames,
        -                                                            firstmulticol)
        -     end
      272     return idx, outcols, collect(Symbol, finalcolnames)
        - end
        -
        - function fill_row!(row, outcols::NTuple{N, AbstractVector},
        -                    i::Integer, colstart::Integer,
        -                    colnames::NTuple{N, Symbol}) where N
        -     if _ncol(row) != N
        -         throw(ArgumentError("return value must have the same number of columns " *
        -                             "for all groups (got $N and $(length(row)))"))
        -     end
        0     @inbounds for j in colstart:length(outcols)
        0         col = outcols[j]
        0         cn = colnames[j]
        -         local val
        -         try
202306672             val = row[cn]
        -         catch
        0             throw(ArgumentError("return value must have the same column names " *
        -                                 "for all groups (got $colnames and $(propertynames(row)))"))
        -         end
        -         S = typeof(val)
        -         T = eltype(col)
        -         if S <: T || promote_type(S, T) <: T
        0             col[i] = val
        -         else
        0             return j
        -         end
        -     end
        0     return nothing
        - end
        -
        - function _combine_rows_with_first!(first::Union{NamedTuple, DataFrameRow},
        -                                    outcols::NTuple{N, AbstractVector},
        -                                    rowstart::Integer, colstart::Integer,
        -                                    f::Any, gd::GroupedDataFrame,
        -                                    incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
        -                                    colnames::NTuple{N, Symbol},
        -                                    firstmulticol::Val) where N
        0     len = length(gd)
        0     gdidx = gd.idx
        0     starts = gd.starts
        0     ends = gd.ends
        -
        -     # handle empty GroupedDataFrame
        0     len == 0 && return outcols, colnames
        -
        -     # Handle first group
        0     j = fill_row!(first, outcols, rowstart, colstart, colnames)
        -     @assert j === nothing # eltype is guaranteed to match
        -     # Handle remaining groups
        0     @inbounds for i in rowstart+1:len
404608368         row = wrap_row(do_call(f, gdidx, starts, ends, gd, incols, i), firstmulticol)
303456672         j = fill_row!(row, outcols, i, 1, colnames)
        0         if j !== nothing # Need to widen column type
        -             local newcols
        0             let i = i, j = j, outcols=outcols, row=row # Workaround for julia#15276
        0                 newcols = ntuple(length(outcols)) do k
        -                     S = typeof(row[k])
        -                     T = eltype(outcols[k])
        -                     U = promote_type(S, T)
        -                     if S <: T || U <: T
        -                         outcols[k]
        -                     else
        -                         copyto!(Tables.allocatecolumn(U, length(outcols[k])),
        -                                 1, outcols[k], 1, k >= j ? i-1 : i)
        -                     end
        -                 end
        -             end
        0             return _combine_rows_with_first!(row, newcols, i, j,
        -                                              f, gd, incols, colnames, firstmulticol)
        -         end
        -     end
       32     return outcols, colnames
        - end
        -
        - # This needs to be in a separate function
        - # to work around a crash due to JuliaLang/julia#29430
        - if VERSION >= v"1.1.0-DEV.723"
        -     @inline function do_append!(do_it, col, vals)
        -         do_it && append!(col, vals)
        -         return do_it
        -     end
        - else
        -     @noinline function do_append!(do_it, col, vals)
        -         do_it && append!(col, vals)
        -         return do_it
        -     end
        - end
        -
        - function append_rows!(rows, outcols::NTuple{N, AbstractVector},
        -                       colstart::Integer, colnames::NTuple{N, Symbol}) where N
        -     if !isa(rows, Union{AbstractDataFrame, NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}})
        -         throw(ArgumentError(ERROR_ROW_COUNT))
        -     elseif _ncol(rows) != N
        -         throw(ArgumentError("return value must have the same number of columns " *
        -                             "for all groups (got $N and $(_ncol(rows)))"))
        -     end
        -     @inbounds for j in colstart:length(outcols)
        -         col = outcols[j]
        -         cn = colnames[j]
        -         local vals
        -         try
        -             vals = getproperty(rows, cn)
        -         catch
        -             throw(ArgumentError("return value must have the same column names " *
        -                                 "for all groups (got $colnames and $(propertynames(rows)))"))
        -         end
        -         S = eltype(vals)
        -         T = eltype(col)
        -         if !do_append!(S <: T || promote_type(S, T) <: T, col, vals)
        -             return j
        -         end
        -     end
        -     return nothing
        - end
        -
        - function _combine_tables_with_first!(first::Union{AbstractDataFrame,
        -                                      NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}},
        -                                      outcols::NTuple{N, AbstractVector},
        -                                      idx::Vector{Int}, rowstart::Integer, colstart::Integer,
        -                                      f::Any, gd::GroupedDataFrame,
        -                                      incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
        -                                      colnames::NTuple{N, Symbol},
        -                                      firstmulticol::Val) where N
        -     len = length(gd)
        -     gdidx = gd.idx
        -     starts = gd.starts
        -     ends = gd.ends
        -     # Handle first group
        -
        -     @assert _ncol(first) == N
        -     if !isempty(colnames) && length(gd) > 0
        -         j = append_rows!(first, outcols, colstart, colnames)
        -         @assert j === nothing # eltype is guaranteed to match
        -         append!(idx, Iterators.repeated(gdidx[starts[rowstart]], _nrow(first)))
        -     end
        -     # Handle remaining groups
        -     @inbounds for i in rowstart+1:len
        -         rows = wrap_table(do_call(f, gdidx, starts, ends, gd, incols, i), firstmulticol)
        -         _ncol(rows) == 0 && continue
        -         if isempty(colnames)
        -             newcolnames = tuple(propertynames(rows)...)
        -             if rows isa AbstractDataFrame
        -                 eltys = eltype.(eachcol(rows))
        -             else
        -                 eltys = map(eltype, rows)
        -             end
        -             initialcols = ntuple(i -> Tables.allocatecolumn(eltys[i], 0), _ncol(rows))
        -             return _combine_tables_with_first!(rows, initialcols, idx, i, 1,
        -                                                f, gd, incols, newcolnames, firstmulticol)
        -         end
        -         j = append_rows!(rows, outcols, 1, colnames)
        -         if j !== nothing # Need to widen column type
        -             local newcols
        -             let i = i, j = j, outcols=outcols, rows=rows # Workaround for julia#15276
        -                 newcols = ntuple(length(outcols)) do k
        -                     S = eltype(rows isa AbstractDataFrame ? rows[!, k] : rows[k])
        -                     T = eltype(outcols[k])
        -                     U = promote_type(S, T)
        -                     if S <: T || U <: T
        -                         outcols[k]
        -                     else
        -                         copyto!(Tables.allocatecolumn(U, length(outcols[k])), outcols[k])
        -                     end
        -                 end
        -             end
        -             return _combine_tables_with_first!(rows, newcols, idx, i, j,
        -                                                f, gd, incols, colnames, firstmulticol)
        -         end
        -         append!(idx, Iterators.repeated(gdidx[starts[i]], _nrow(rows)))
        -     end
        -     return outcols, colnames
        - end
        -
        - """
        -     select(gd::GroupedDataFrame, args...; copycols::Bool=true, keepkeys::Bool=true,
        -            ungroup::Bool=true, renamecols::Bool=true)
        -
        - Apply `args` to `gd` following the rules described in [`combine`](@ref).
        -
        - If `ungroup=true`, the result is a `DataFrame`.
        - If `ungroup=false`, the result is a `GroupedDataFrame`
        - (in this case the returned value retains the order of groups of `gd`).
        -
        - The `parent` of the returned value has as many rows as `parent(gd)` and
        - in the same order, except when the returned value has no columns
        - (in which case it has zero rows). If an operation in `args` returns
        - a single value, it is always broadcast to this number of rows.
        -
        - If `copycols=false`, columns that are not transformed are not copied.
        -
        - $KWARG_PROCESSING_RULES
        -
        - # See also
        -
        - [`groupby`](@ref), [`combine`](@ref), [`select!`](@ref), [`transform`](@ref), [`transform!`](@ref)
        -
        - # Examples
        - ```jldoctest
        - julia> df = DataFrame(a = [1, 1, 1, 2, 2, 1, 1, 2],
        -                       b = repeat([2, 1], outer=[4]),
        -                       c = 1:8)
        - 8×3 DataFrame
        - │ Row │ a     │ b     │ c     │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 2     │ 1     │
        - │ 2   │ 1     │ 1     │ 2     │
        - │ 3   │ 1     │ 2     │ 3     │
        - │ 4   │ 2     │ 1     │ 4     │
        - │ 5   │ 2     │ 2     │ 5     │
        - │ 6   │ 1     │ 1     │ 6     │
        - │ 7   │ 1     │ 2     │ 7     │
        - │ 8   │ 2     │ 1     │ 8     │
        -
        - julia> gd = groupby(df, :a);
        -
        - julia> select(gd, :c => sum, nrow)
        - 8×3 DataFrame
        - │ Row │ a     │ c_sum │ nrow  │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 19    │ 5     │
        - │ 2   │ 1     │ 19    │ 5     │
        - │ 3   │ 1     │ 19    │ 5     │
        - │ 4   │ 2     │ 17    │ 3     │
        - │ 5   │ 2     │ 17    │ 3     │
        - │ 6   │ 1     │ 19    │ 5     │
        - │ 7   │ 1     │ 19    │ 5     │
        - │ 8   │ 2     │ 17    │ 3     │
        -
        - julia> select(gd, :c => sum, nrow, ungroup=false)
        - GroupedDataFrame with 2 groups based on key: a
        - First Group (5 rows): a = 1
        - │ Row │ a     │ c_sum │ nrow  │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 19    │ 5     │
        - │ 2   │ 1     │ 19    │ 5     │
        - │ 3   │ 1     │ 19    │ 5     │
        - │ 4   │ 1     │ 19    │ 5     │
        - │ 5   │ 1     │ 19    │ 5     │
        - ⋮
        - Last Group (3 rows): a = 2
        - │ Row │ a     │ c_sum │ nrow  │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 2     │ 17    │ 3     │
        - │ 2   │ 2     │ 17    │ 3     │
        - │ 3   │ 2     │ 17    │ 3     │
        -
        - julia> select(gd, :c => (x -> sum(log, x)) => :sum_log_c) # specifying a name for target column
        - 8×2 DataFrame
        - │ Row │ a     │ sum_log_c │
        - │     │ Int64 │ Float64   │
        - ├─────┼───────┼───────────┤
        - │ 1   │ 1     │ 5.52943   │
        - │ 2   │ 1     │ 5.52943   │
        - │ 3   │ 1     │ 5.52943   │
        - │ 4   │ 2     │ 5.07517   │
        - │ 5   │ 2     │ 5.07517   │
        - │ 6   │ 1     │ 5.52943   │
        - │ 7   │ 1     │ 5.52943   │
        - │ 8   │ 2     │ 5.07517   │
        -
        - julia> select(gd, [:b, :c] .=> sum) # passing a vector of pairs
        - 8×3 DataFrame
        - │ Row │ a     │ b_sum │ c_sum │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 8     │ 19    │
        - │ 2   │ 1     │ 8     │ 19    │
        - │ 3   │ 1     │ 8     │ 19    │
        - │ 4   │ 2     │ 4     │ 17    │
        - │ 5   │ 2     │ 4     │ 17    │
        - │ 6   │ 1     │ 8     │ 19    │
        - │ 7   │ 1     │ 8     │ 19    │
        - │ 8   │ 2     │ 4     │ 17    │
        -
        - julia> select(gd, :b => :b1, :c => :c1,
        -               [:b, :c] => +, keepkeys=false) # multiple arguments, renaming and keepkeys
        - 8×3 DataFrame
        - │ Row │ b1    │ c1    │ b_c_+ │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 2     │ 1     │ 3     │
        - │ 2   │ 1     │ 2     │ 3     │
        - │ 3   │ 2     │ 3     │ 5     │
        - │ 4   │ 1     │ 4     │ 5     │
        - │ 5   │ 2     │ 5     │ 7     │
        - │ 6   │ 1     │ 6     │ 7     │
        - │ 7   │ 2     │ 7     │ 9     │
        - │ 8   │ 1     │ 8     │ 9     │
        -
        - julia> select(gd, :b, :c => sum) # passing columns and broadcasting
        - 8×3 DataFrame
        - │ Row │ a     │ b     │ c_sum │
        - │     │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 2     │ 19    │
        - │ 2   │ 1     │ 1     │ 19    │
        - │ 3   │ 1     │ 2     │ 19    │
        - │ 4   │ 2     │ 1     │ 17    │
        - │ 5   │ 2     │ 2     │ 17    │
        - │ 6   │ 1     │ 1     │ 19    │
        - │ 7   │ 1     │ 2     │ 19    │
        - │ 8   │ 2     │ 1     │ 17    │
        -
        - julia> select(gd, :, AsTable(Not(:a)) => sum, renamecols=false)
        - 8×4 DataFrame
        - │ Row │ a     │ b     │ c     │ b_c   │
        - │     │ Int64 │ Int64 │ Int64 │ Int64 │
        - ├─────┼───────┼───────┼───────┼───────┤
        - │ 1   │ 1     │ 2     │ 1     │ 3     │
        - │ 2   │ 1     │ 1     │ 2     │ 3     │
        - │ 3   │ 1     │ 2     │ 3     │ 5     │
        - │ 4   │ 2     │ 1     │ 4     │ 5     │
        - │ 5   │ 2     │ 2     │ 5     │ 7     │
        - │ 6   │ 1     │ 1     │ 6     │ 7     │
        - │ 7   │ 1     │ 2     │ 7     │ 9     │
        - │ 8   │ 2     │ 1     │ 8     │ 9     │
        - ```
        - """
        - select(gd::GroupedDataFrame, args...; copycols::Bool=true, keepkeys::Bool=true,
        -        ungroup::Bool=true, renamecols::Bool=true) =
        -     _combine_prepare(gd, args..., copycols=copycols, keepkeys=keepkeys,
        -                      ungroup=ungroup, keeprows=true, renamecols=renamecols)
        -
        - """
        -     transform(gd::GroupedDataFrame, args...;
        -               copycols::Bool=true, keepkeys::Bool=true, ungroup::Bool=true,
        -               renamecols::Bool=true)
        -
        - An equivalent of
        - `select(gd, :, args..., copycols=copycols, keepkeys=keepkeys, ungroup=ungroup, renamecols=renamecols)`
        - but keeps the columns of `parent(gd)` in their original order.
        -
        - # See also
        -
        - [`groupby`](@ref), [`combine`](@ref), [`select`](@ref), [`select!`](@ref), [`transform!`](@ref)
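        -
        - # Examples
        - A minimal usage sketch, assuming a small example data frame `df`:
        - ```julia
        - using DataFrames
        -
        - df = DataFrame(a = [1, 1, 2], b = [1, 2, 10])
        - gd = groupby(df, :a)
        -
        - # adds a `b_sum` column holding the per-group sum of `b`;
        - # the columns `a` and `b` of `parent(gd)` stay first, in their original order
        - res = transform(gd, :b => sum)
        - # res.b_sum == [3, 3, 10]
        - ```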
        - """
        - function transform(gd::GroupedDataFrame, args...; copycols::Bool=true,
        -                    keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
        -     res = select(gd, :, args..., copycols=copycols, keepkeys=keepkeys,
        -                  ungroup=ungroup, renamecols=renamecols)
        -     # res can be a GroupedDataFrame based on DataFrame or a DataFrame,
        -     # so parent always gives a data frame
        -     select!(parent(res), propertynames(parent(gd)), :)
        -     return res
        - end
        -
        - """
        -     select!(gd::GroupedDataFrame{DataFrame}, args...; ungroup::Bool=true, renamecols::Bool=true)
        -
        - An equivalent of
        - `select(gd, args..., copycols=false, keepkeys=true, ungroup=ungroup, renamecols=renamecols)`
        - but updates `parent(gd)` in place.
        -
        - `gd` is updated to reflect the new rows of its updated parent.
        - Any other `GroupedDataFrame` objects constructed from the same
        - parent data frame might become corrupted.
        -
        - # See also
        -
        - [`groupby`](@ref), [`combine`](@ref), [`select`](@ref), [`transform`](@ref), [`transform!`](@ref)
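        -
        - # Examples
        - A minimal usage sketch, assuming a small example data frame `df`:
        - ```julia
        - using DataFrames
        -
        - df = DataFrame(a = [1, 1, 2], b = [1, 2, 10])
        - gd = groupby(df, :a)
        -
        - # replaces the columns of `df` in place: the grouping column `a` is kept,
        - # `b` is dropped because it is not selected, and `b_maximum` holds
        - # the per-group maximum of `b`
        - select!(gd, :b => maximum)
        - # propertynames(df) == [:a, :b_maximum] and df.b_maximum == [2, 2, 10]
        - ```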
        - """
        - function select!(gd::GroupedDataFrame{DataFrame}, args...;
        -                  ungroup::Bool=true, renamecols::Bool=true)
        -     newdf = select(gd, args..., copycols=false, renamecols=renamecols)
        -     df = parent(gd)
        -     _replace_columns!(df, newdf)
        -     return ungroup ? df : gd
        - end
        -
        - """
        -     transform!(gd::GroupedDataFrame{DataFrame}, args...; ungroup::Bool=true, renamecols::Bool=true)
        -
        - An equivalent of
        - `transform(gd, args..., copycols=false, keepkeys=true, ungroup=ungroup, renamecols=renamecols)`
        - but updates `parent(gd)` in place
        - and keeps the columns of `parent(gd)` in their original order.
        -
        - # See also
        -
        - [`groupby`](@ref), [`combine`](@ref), [`select`](@ref), [`select!`](@ref), [`transform`](@ref)
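        -
        - # Examples
        - A minimal usage sketch, assuming a small example data frame `df`:
        - ```julia
        - using DataFrames
        -
        - df = DataFrame(a = [1, 1, 2], b = [1, 2, 10])
        - gd = groupby(df, :a)
        -
        - # adds a `b_share` column to `df` in place, holding each row's share
        - # of its group's total `b`; the existing columns `a` and `b` stay first
        - transform!(gd, :b => (x -> x ./ sum(x)) => :b_share)
        - # df.b_share ≈ [1/3, 2/3, 1.0]
        - ```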
        - """
        - function transform!(gd::GroupedDataFrame{DataFrame}, args...;
        -                     ungroup::Bool=true, renamecols::Bool=true)
        -     newdf = select(gd, :, args..., copycols=false, renamecols=renamecols)
        -     df = parent(gd)
        -     select!(newdf, propertynames(df), :)
        -     _replace_columns!(df, newdf)
        -     return ungroup ? df : gd
        - end