Moelf/index.md

This approach in principle lets one read, from Julia, anything that uproot/awkward can read and represent (as long as to_arrow works).
We use the following packages to demonstrate the round trip:
julia> using PythonCall, Arrow, DataFrames

julia> const ak = pyimport("awkward")

julia> ak.__version__
Python str: '1.9.0rc10'

julia> const pa = pyimport("pyarrow");
First, let's make some non-trivial data to represent:
julia> arr = ak._v2.from_iter([pydict(("one"=>1, "two"=>[2.0])), pydict(("one"=>2, "two"=>[1.0, 2.0]))])
Python Array: <Array [{one: 1, two: [2]}, {...}] type='2 * {one: int64, two: var * float64}'>

julia> arr.one
Python Array: <Array [1, 2] type='2 * int64'>
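For orientation, the awkward type `2 * {one: int64, two: var * float64}` reads as "two records, each with an integer field and a variable-length list of floats". A minimal plain-Julia sketch of roughly the same shape is shown below; the names `rows`, `ones_column`, and `twos_lengths` are made up for illustration and are not part of awkward or PythonCall.

```julia
# Plain-Julia stand-in for the two records built above (field names chosen to match).
rows = [(one = 1, two = [2.0]), (one = 2, two = [1.0, 2.0])]

# Column-style access, comparable in spirit to `arr.one` on the awkward side.
ones_column = [r.one for r in rows]

# The `var *` part of the awkward type means each `two` entry may have a different length.
twos_lengths = [length(r.two) for r in rows]

@show eltype(rows)
@show ones_column
@show twos_lengths
```

This is only an analogy for the data layout; the awkward array itself stays on the Python side until it is converted to Arrow.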
One can almost always get a pyarrow table out of an awkward array:
julia> pa_table = ak._v2.to_arrow_table(arr)
Python Table:
pyarrow.Table
one: extension<awkward<AwkwardArrowType>> not null
two: extension<awkward<AwkwardArrowType>> not null
----
one: [[1,2]]
two: [[[2],[1,2]]]

julia> pa_batches = pa_table.to_batches()
Python list:
[pyarrow.RecordBatch
one: extension<awkward<AwkwardArrowType>> not null
two: extension<awkward<AwkwardArrowType>> not null]
There is always exactly one batch, because of how awkward performs the conversion:
https://github.com/scikit-hep/awkward/blob/dd2a3f400e29fc9ea908fc7d8267f592091457bb/src/awkward/operations/convert.py#L2590
julia> batch = only(pa_batches)
Python RecordBatch:
pyarrow.RecordBatch
one: extension<awkward<AwkwardArrowType>> not null
two: extension<awkward<AwkwardArrowType>> not null

julia> batch.num_rows
Python int: 2

julia> batch.num_columns
Python int: 2
Here's the important bit: we can write the whole block of IPC stream bytes into a Julia buffer, and Arrow.jl can reuse that memory and turn it into a table:
julia> jl_sink = IOBuffer()

julia> pywith(pa.ipc.new_stream(jl_sink, batch.schema)) do writer
               writer.write_batch(batch)
           end;

julia> DataFrame(Arrow.Table(take!(jl_sink)))
2×2 DataFrame
 Row │ one    two        
     │ Int64  Array…     
─────┼───────────────────
   1 │     1  [2.0]
   2 │     2  [1.0, 2.0]
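The Julia half of the round trip can be tried on its own, without Python. The sketch below assumes only Arrow.jl is installed; it uses stand-in data of the same shape (the variable names are hypothetical) and relies on `Arrow.write` emitting the IPC stream format when given an `IO`, which is the same kind of byte stream that `pa.ipc.new_stream` produces above.

```julia
using Arrow

# Stand-in data with the same shape as the awkward array above (an assumption for this sketch).
tbl = (one = [1, 2], two = [[2.0], [1.0, 2.0]])

# Writing to an IO (rather than a file path) produces the Arrow IPC stream format.
io = IOBuffer()
Arrow.write(io, tbl)
bytes = take!(io)   # Vector{UInt8} holding the whole stream

# Arrow.Table accepts the byte vector directly, as in the last step above.
roundtrip = Arrow.Table(bytes)

@show roundtrip.one
@show collect(roundtrip.two[2])
```

Because `Arrow.Table` works from the byte vector itself, the bytes should be kept alive and unmodified for as long as the table is in use.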