

# in a fresh conda environment, >= py3.8
conda install xrootd -c conda-forge
# install dask-awkward@main, install dask-histogram from https://github.com/lgray/dask-histogram/tree/map_reduce_agg_hist_adds
pip install coffea xgboost mt2 distributed==2024.2.0 dask==2024.2.0
git clone https://github.com/TopEFT/topcoffea.git -b coffea2023
pushd topcoffea
pip install -e .
popd
lgray / ewkcoffea_setup.txt
Last active February 12, 2024 22:04
instructions for slow analysis example
# in a fresh conda environment, >= py3.8
conda install xrootd -c conda-forge
pip install coffea xgboost mt2
git clone https://github.com/TopEFT/topcoffea.git -b coffea2023
pushd topcoffea
pip install -e .
popd
git clone https://github.com/cmstas/ewkcoffea.git -b coffea2023
lgray / profile.txt
Created January 10, 2024 13:43
profile of running dak.necessary_columns on a wwz analysis
Running necessary_columns...
pyinstrument v4.6.1
Recorded: 07:32:07    Samples:  154011
Duration: 164.460     CPU time: 164.503
Program: run_wwz4l.py ../../input_samples/sample_jsons/test_samples/UL17_WWZJetsTo4L2Nu_forCI.json,../../input_samples/sample_jsons/test_samples/UL17_WWZJetsTo4L2Nu_forCI_extra.json -x iterative
164.462 <module> run_wwz4l.py:1
└─ 164.462 report_necessary_columns dask_awkward/lib/inspect.py:118
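The profile is dominated by `report_necessary_columns`, which inspects the task graph to find which input columns a computation actually reads, so only those branches need to be fetched. A toy illustration of the idea in pure Python (not the dask-awkward implementation): run the analysis function against a tracking proxy and record which keys it touches.

```python
class ColumnTracker:
    """Dict-like proxy that records which columns a function reads."""

    def __init__(self, data):
        self._data = data
        self.touched = set()

    def __getitem__(self, key):
        self.touched.add(key)
        return self._data[key]


def necessary_columns(func, data):
    """Run func on a tracking proxy and report the columns it touched."""
    tracker = ColumnTracker(data)
    func(tracker)
    return tracker.touched


# An "analysis" that only uses two of the three available columns.
event = {"Muon_pt": [40.0, 25.0], "Muon_eta": [0.1, -1.2], "Muon_phi": [0.5, 2.0]}
cols = necessary_columns(lambda ev: sum(ev["Muon_pt"]) + max(ev["Muon_eta"]), event)
print(sorted(cols))  # ['Muon_eta', 'Muon_pt'] -- Muon_phi was never read
```

dask-awkward does this by graph analysis rather than execution, so no data is actually loaded, but the reported result is the same kind of column set.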
lgray / example.py
Created January 8, 2024 20:53
client.compute() example for coffea
with Client() as client:  # distributed Client scheduler
    # Run preprocess
    print("\nRunning preprocess...")
    dataset_runnable, dataset_updated = preprocess(
        fileset,
        maybe_step_size=50_000,
        align_clusters=False,
        files_per_batch=1,
        # skip_bad_files=True,
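The preview cuts off mid-call, but the pattern it demonstrates is submitting work through a scheduler client and blocking on the results. A framework-free stand-in using the standard library's `concurrent.futures`, whose executors share the submit/result interface that `distributed.Client` extends (the file names and `process_file` helper are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor


def process_file(name):
    """Stand-in for per-file preprocessing work."""
    return {"file": name, "n_steps": 3}


# ThreadPoolExecutor plays the role of distributed.Client here.
with ThreadPoolExecutor(max_workers=2) as client:
    futures = [client.submit(process_file, f) for f in ["a.root", "b.root"]]
    results = [f.result() for f in futures]  # like computing and gathering futures

print(results)
```

With a real `distributed.Client`, the same shape of code runs the work on cluster workers instead of local threads.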
lgray / muon_beam_decay_sim.py
Last active November 13, 2023 17:23
decay of 1 TeV muon beam, resulting neutrino and electron kinematics
import numpy as np
from scipy.stats import uniform
import vector
import hist
from math import pi
def make_vector(rawvec):
    return vector.arr({"px": rawvec[:, 0], "py": rawvec[:, 1], "pz": rawvec[:, 2], "M": rawvec[:, 3]})
P_beam = 1000 # GeV
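The kinematics underlying the simulation can be sketched without the `vector` and `hist` dependencies. A simplified two-body toy in pure numpy (real muon decay is three-body, with the electron energy following the Michel spectrum): sample isotropic daughter directions in the muon rest frame, then boost along the beam axis.

```python
import numpy as np

M_MU = 0.1057    # GeV, muon mass
P_BEAM = 1000.0  # GeV, beam momentum

rng = np.random.default_rng(42)
n = 10_000

# Isotropic decay directions in the muon rest frame.
cos_theta = rng.uniform(-1.0, 1.0, n)
p_star = M_MU / 2.0        # two-body momentum for (nearly) massless daughters
pz_star = p_star * cos_theta
e_star = p_star            # massless daughter: E* = |p*|

# Boost along z into the lab frame.
e_mu = np.hypot(P_BEAM, M_MU)
gamma = e_mu / M_MU
beta = P_BEAM / e_mu
e_lab = gamma * (e_star + beta * pz_star)

print(f"mean lab energy: {e_lab.mean():.1f} GeV")  # ~E_beam/2 on average
```

Because the rest-frame distribution is isotropic, the lab-frame energy is spread almost uniformly between roughly 0 and the full beam energy, with mean E_beam/2.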
lgray / batcher_class.py
Last active July 12, 2023 21:17
data batcher for smartpixels samples
class CustomDataGenerator(tf.keras.utils.Sequence):
    def __init__(self,
                 data_directory_path: str = "./",
                 labels_directory_path: str = "./",
                 is_directory_recursive: bool = False,
                 file_type: str = "csv",
                 data_format: str = "2D",
                 batch_size: int = 32,
                 file_count = None,
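The generator above subclasses `tf.keras.utils.Sequence`, whose core contract is just `__len__` (number of batches per epoch) and `__getitem__` (one batch by index). A framework-free sketch of that contract with numpy, no TensorFlow dependency (class and variable names are illustrative, not from the gist):

```python
import numpy as np


class MiniBatcher:
    """Minimal Sequence-style batcher: __len__ gives the batch count,
    __getitem__ returns one (data, labels) batch."""

    def __init__(self, data, labels, batch_size=32):
        assert len(data) == len(labels)
        self.data = data
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, including a possibly short final batch.
        return int(np.ceil(len(self.data) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.data[sl], self.labels[sl]


x = np.arange(100, dtype=np.float32).reshape(50, 2)
y = np.arange(50)
batches = MiniBatcher(x, y, batch_size=16)
print(len(batches))  # 4 batches: 16 + 16 + 16 + 2
```

Keras calls `__getitem__` for each index in `range(len(generator))` during an epoch, so anything satisfying this contract (here, slicing in-memory arrays; in the gist, reading csv/parquet files) can feed `model.fit`.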
from coffea.nanoevents import NanoEventsFactory, NanoAODSchema
from coffea.processor import accumulate
from distributed import Client
import dask
import dask_awkward as dak
import dask.array
from dask.diagnostics import ProgressBar
import awkward
# import warnings
# warnings.filterwarnings("error")
import logging
import os
import time
from coffea import processor
from coffea.nanoevents import NanoAODSchema
lgray / jec_dropped_column.py
Created March 3, 2023 17:44
fails with a dropped column in column optimization (particularly the broadcasted rho column!)
import time
import awkward as ak
import dask_awkward as dak
import numpy as np
import os
from coffea.lookup_tools import extractor
from coffea.jetmet_tools import FactorizedJetCorrector
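The failing script builds a `FactorizedJetCorrector`, which applies several correction levels multiplicatively as functions of jet pt, eta, and the event pileup density rho, the rho column being the one the optimizer drops after broadcasting. A toy numpy version of the factorized-product idea, with made-up correction functions (real levels are read from JEC text files via the extractor):

```python
import numpy as np


# Hypothetical correction levels, for illustration only.
def l1_offset(pt, eta, rho):
    # Pileup subtraction depends on rho, so rho must survive column pruning.
    return 1.0 - 0.02 * rho / np.maximum(pt, 1.0)


def l2_relative(pt, eta, rho):
    return 1.0 + 0.01 * np.abs(eta)


def factorized_correction(pt, eta, rho, levels=(l1_offset, l2_relative)):
    """Product of all correction levels, evaluated per jet."""
    corr = np.ones_like(pt)
    for level in levels:
        corr *= level(pt, eta, rho)
    return corr


pt = np.array([30.0, 100.0])
eta = np.array([0.0, 2.0])
rho = np.full_like(pt, 20.0)  # event-level rho broadcast to each jet
print(factorized_correction(pt, eta, rho))
```

The broadcast in the last lines is the crux of the bug report: rho is stored once per event, gets broadcast to per-jet shape, and the column optimizer must still recognize it as a necessary input.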
lgray / spark_works_again.txt
Created December 9, 2022 22:47
nanoaod in spark table
(coffea-dev) lgray@dhcp-131-225-97-134 coffea % python -i spark_work.py
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/12/09 16:42:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
40 * {Muon_pt: var * float32, Muon_eta: var * float32, Muon_phi: var * float32, Muon_mass: var * float32, Muon_charge: var * int32, nMuon: int64}
[pyarrow.RecordBatch
Muon_pt: list<item: float not null> not null
  child 0, item: float not null
Muon_eta: list<item: float not null> not null
  child 0, item: float not null