@aminnj
Created July 28, 2021 01:33
ROOT tree from Julia to Python

Since UnROOT cannot currently write ROOT files, one can instead write Arrow files directly from an UnROOT.LazyTree object; these can be read back in Julia. If the table is written out in chunks, as done below, this does not use much memory.

using UnROOT
using Arrow
using Tables

treename = "Events"
filename = "18BCCE71-15B8-194B-8738-EC993C8DD3BD.root"
branches = [r"^MET_(pt|phi)$","Jet_pt","Jet_eta","Muon_pt"]

const f = ROOTFile(filename)
const t = LazyTree(f, treename, branches)

# `Arrow.write` takes its record batches from `Tables.partitions()`.
# By default, this is
#   Tables.partitions(t::LazyTree) = (t,)
# which writes out the whole table as a single batch.
# We often cannot hold large materialized tables in memory, so we
# override it to yield one chunk per cluster range instead.
# For NanoAOD, the tree has fClusterRangeEnd defined, which is
# essentially the aligned basket entry ranges. For other kinds of
# trees it may be necessary to change the chunking logic here.
function Tables.partitions(t::LazyTree)
    tree = f[treename]
    edges = [0, (tree.fClusterRangeEnd .+ 1)..., tree.fEntries]
    ranges = [(edges[i]+1):edges[i+1] for i in 1:(length(edges)-1)]
    return (t[r] for r in ranges)
end

Arrow.write("out.arrow", t, compress=:lz4, ntasks=1)
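The edge arithmetic inside Tables.partitions is easy to get off by one, so here is a minimal Python sketch of the same computation for checking it by hand. It assumes, as the `.+ 1` in the Julia code implies, that fClusterRangeEnd holds 0-based inclusive last-entry indices and that the resulting ranges are 1-based and inclusive. The function name `cluster_ranges` and the example numbers are hypothetical and not part of UnROOT.

```python
def cluster_ranges(cluster_range_end, n_entries):
    """Convert 0-based inclusive cluster ends into 1-based inclusive (start, stop) pairs.

    Mirrors the Julia expressions
        edges  = [0, (fClusterRangeEnd .+ 1)..., fEntries]
        ranges = [(edges[i]+1):edges[i+1] for i in 1:(length(edges)-1)]
    """
    edges = [0] + [end + 1 for end in cluster_range_end] + [n_entries]
    return [(edges[i] + 1, edges[i + 1]) for i in range(len(edges) - 1)]


# Hypothetical tree: 10 entries, with clusters ending at 0-based entries 2 and 6.
ranges = cluster_ranges([2, 6], 10)
print(ranges)  # [(1, 3), (4, 7), (8, 10)]
```

Each entry lands in exactly one range, so the batches written by Arrow.write should together cover the whole tree without overlap.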

The file can then be read back in Python, into the Awkward Array ecosystem, with the pyarrow package. Each chunk was written as a separate record batch, so remember to iterate over all f.num_record_batches batches.

>>> import awkward as ak
>>> import pyarrow
>>> f = pyarrow.ipc.open_file("out.arrow")
>>> f.num_record_batches
100
>>> batch = f.get_batch(0) # first batch
>>> ak.from_arrow(batch)[0] # first event
<Record ... 0.708, -2.77], Muon_pt: [64.7]} type='{"MET_phi": float32, "MET_pt":...'>
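To process every batch rather than only the first, the loop can be wrapped in a small generator. This is a sketch, not part of pyarrow or awkward: the helper names `iter_batches` and `count_rows` are made up for illustration, and they rely only on the `num_record_batches` and `get_batch` members used in the session above plus the `num_rows` attribute of a record batch.

```python
def iter_batches(reader):
    """Yield the record batches of an Arrow file reader one at a time."""
    for i in range(reader.num_record_batches):
        yield reader.get_batch(i)


def count_rows(reader):
    """Total number of events in the file, visiting one batch at a time."""
    return sum(batch.num_rows for batch in iter_batches(reader))


# Usage with the file written above (assumes pyarrow and awkward are installed):
#   import pyarrow
#   import awkward as ak
#   reader = pyarrow.ipc.open_file("out.arrow")
#   for batch in iter_batches(reader):
#       events = ak.from_arrow(batch)  # one chunk of events as an awkward array
#       ...                            # analyze this chunk, then move on
```

Working batch by batch keeps the reading side chunked in the same way as the writing side.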