Skip to content

Instantly share code, notes, and snippets.

@davidlange6
Last active July 17, 2020 13:08
Show Gist options
  • Save davidlange6/3b0cb365aac669a714d9f288b0bf420d to your computer and use it in GitHub Desktop.
Save davidlange6/3b0cb365aac669a714d9f288b0bf420d to your computer and use it in GitHub Desktop.
#!/usr/bin/env python
# usage
# edm_pset_tweak.py --input_pkl RunPromptRecoCfg.pkl --output_pkl ouput.pkl --json tweaks.json
#
# tweaks.json can be
# [{ "process.maxEvents.input" : 100}]
# or
# { "process.maxEvents.input" : 100}
# or
# [{ "maxEvents.input" : 100}]
# or
# { "maxEvents.input" : 100}
#
# properly handles multiple pkls and mulitple jsons (all jsons are applied to all pkls)
#
import pickle
import argparse
import sys
import json
def apply_tweak(process, key, value):
key_split = key.split('.')
param = process
if key_split[0] == "process":
key_split = key_split[1:]
for p in key_split:
param = getattr(param, p)
if param is None:
return 1
param = value
return 0
def init_argparse():
parser = argparse.ArgumentParser(
usage="%(prog)s [OPTION] [FILE]...",
description="Tweak pset(s) based on json list of tweaks to apply and write out resulting pset(s)"
)
parser.add_argument('--json', nargs='+', required=True)
parser.add_argument('--input_pkl', nargs='+', required=True)
parser.add_argument('--output_pkl', nargs='+', required=False)
parser.add_argument('--allow_failed_tweaks', action='store_true')
return parser
def main():
parser = init_argparse()
args = parser.parse_args()
if len(args.output_pkl) != 0:
output_file_names = args.output_pkl
if len(args.input_pkl) != len(args.output_pkl):
print(
"Incorrect arguments. --input_pkl and --output_pkl must be the same length")
sys.exit(1)
else:
output_file_names=args.input_pkl
tweaks = []
for json_file_name in args.json:
try:
with open(json_file_name) as json_file:
json_data = json.load(json_file)
except Exception as e:
print("Error opening file "+json_file_name)
sys.exit(1)
if not isinstance(json_data,(dict,list)):
print("Error loading json "+json_file_name)
sys.exit(1)
if isinstance(json_data, list):
for d in json_data:
for key in d:
tweaks.append((key, d[key]))
elif isinstance(json_data, dict):
for key in json_data:
tweaks.append((key, json_data[key]))
else:
print("Error loading json "+json_file_name)
sys.exit(1)
for i in range(len(args.input_pkl)):
with open(args.input_pkl[i], "rb") as pkl_file:
try:
process = pickle.load(pkl_file)
except (IOError, OSError, pickle.PickleError, pickle.UnpicklingError):
print("Not a valid pickle file " +
args.input_pkl[i]) + "- stopping"
sys.exit(1)
for tweak in tweaks:
err_val = apply_tweak(process, tweak[0], tweak[1])
if err_val != 0:
print("Tweak not applied "+tweak[0]+" "+str(tweak[1]))
if not args.allow_failed_tweaks:
sys.exit(1)
else:
print("Tweak applied "+tweak[0]+" "+str(tweak[1]))
with open(output_file_names[i], "wb") as output_file:
if output_file.closed:
print("Error loading pickle input "+args.output_pkl[i])
sys.exit(1)
pickle.dump(process, output_file, protocol=0)
main()
@justinasr
Copy link

try:
    process = pickle.load(in_f)
except (IOError, OSError, pickle.PickleError, pickle.UnpicklingError):
    print("Not a valid pickle file " + args.input_pkl[i])

This is a dangerous piece of code. If the exception is raised during the first iteration, then process will be undefined and code will crash in next few lines, however if exception is raised during not-first iteration, process will not be overwritten and code will go on with process from previous iteration. Either there is a sys.exit(1) or continue missing in except: block, or maybe removing try: except: and letting script crash is the safer option?


if type(data) == dict:
     for key in data:
         tweaks.append( ( key, d[key]) )

d is undefined here. Only time it would be defined is some previous JSONs were lists and for d in data: happened. I believe d has to be changed to data. I would also change if to elif here just because data will never be both list and dict.


in_f = open(args.input_pkl[i],"rb")
if in_f.closed:

How could a file be closed after opening it? open() documentation states that

If the file cannot be opened, an OSError is raised.

Or IOError in python 2. Maybe all these .closed checks are redundant?


parser.add_argument('--json', nargs='*')
parser.add_argument('--input_pkl', nargs='*')
parser.add_argument('--output_pkl', nargs='*')

I believe this can be updated to:

parser.add_argument('--json', nargs='+', required=True)
parser.add_argument('--input_pkl', nargs='+', required=True)
parser.add_argument('--output_pkl', nargs='+', required=True)

Where all three arguments are required AND required to have at least one value (nargs='+')
This then could be simplified:

if not args.input_pkl or not args.output_pkl or len(args.input_pkl) != len(args.output_pkl):
    print("incorrect arguments. Provide --input_pkl and --output_pkl")
    sys.exit(1)
if not args.json:
    print("incorrect arguments. Provide --json")
    sys.exit(1)

to:

if len(args.input_pkl) != len(args.output_pkl):
    print("Incorrect arguments. --input_pkl and --output_pkl must be the same length")
    sys.exit(1)

There are some other small bits and bobs such as using isinstance(...) instead of type(..), using with statement to open files (it calls close() automatically), consistent spacing (according to PEP8), proper variable names (according to PEP8), but this would be nitpicking.

@amaltaro
Copy link

David, I did not check the whole code yet, but I would suggest to open these pickle files with a context manager, to make sure that whatever is open, will certainly get closed. Something like:

with open("my pickle file", "r") as pObj:
  data = pickle.load(pObj)
  etc

in the apply_tweak function. I think it would be useful to record (log) every single parameter that is getting updated. Of course, eventually one can use the logging module and properly play with logging level.

@davidlange6
Copy link
Author

it sounds like that since I get technical comments that indeed this is the needed functionality to replace the pset tweaks in WMCore and that the interface is ok. We'll make a new version with bugs fixed and other improvements.

@amaltaro
Copy link

David, yes, the interface looks okay and hopefully that's all that we need for tweaking the pset configuration.
I have a couple more comments though:

  • in central production, we are supposed to invoke this tweak once per cmsRun process. For StepChain workflows, we would invoke the pset tweak multiple times during runtime, but still once per cmsRun step (cmsRun1, cmsRun2, etc).
  • I'd make the output pickle parameter optional, if not provided, overwrite the input
  • the json input tweak data probably needs to have a schema defined. Like, either we make it a list of smaller python dicts (meant to have a smaller memory footprint):
[{"param1": value1}, {"param2": value2}, etc etc

or we make it a flat dict:

{"param1": value1, "param2": value2, etc}

Just a few more ideas before I forget it.

@davidlange6
Copy link
Author

both

[{"param1": value1}, {"param2": value2}, etc etc

and

{"param1": value1, "param2": value2, etc}

are supported.

@davidlange6
Copy link
Author

'''
In central production, we are supposed to invoke this tweak once per cmsRun process. For StepChain workflows, we would invoke the pset tweak multiple times during runtime, but still once per cmsRun step (cmsRun1, cmsRun2, etc).
'''

is there some action needed by the script? It doesn't sound like it.

@davidlange6
Copy link
Author

I revised the tool given comments above. I didn't yet consider the case where --output_pkl is not specified. I think that is easy to do.

@davidlange6
Copy link
Author

updated for --output_pkl and isinstance
(still mostly untested)

@justinasr
Copy link

I believe there is a little mistake here:

if isinstance(json_data,list) and isinstance(json_data,dict):
    print("Error loading json "+json_file_name)
    sys.exit(1)

It should probably be (the negation):

if not isinstance(json_data, (list, dict)):
    print("Error loading json " + json_file_name)
    sys.exit(1)

Or even if you bring back elif isinstance(json_data, dict), error can be moved to else:

if isinstance(json_data, list):
    for d in json_data:
        for key in d:
            tweaks.append((key, d[key]))
elif isinstance(json_data, dict):
    for key in json_data:
        tweaks.append((key, json_data[key]))
else:
    print("Error loading json "+json_file_name)
    sys.exit(1)

But this is not necessary, just might save one isinstance.

@davidlange6
Copy link
Author

definitely a bug.. I'll take your version

@davidlange6
Copy link
Author

now to write some tests for all the different cases..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment