@jotelha
Last active August 30, 2019 17:41
Fireworks knowledge

FW id 17540

  - _fw_name: PyTask
    func:    glob.glob
    args:    ["frame*.lammps"]
    outputs: [ frame_file_list ]

  - _fw_name: ForeachTask
    split: frame_file_list
    task:
      _fw_name: CommandLineTask
      command_spec:
        command:         [ echo , -n , "Test for " ]
        frame_file_list: { source: frame_file_list }
        test_echo_list:  { source: { type: stdout }, target: test_echo_list }
      inputs:  [ frame_file_list ]
      outputs: [ test_echo_list ]

leads to

"update_spec": {
    "_files_prev": { },
    "frame_file_list": [
        "frame_00035000.lammps",
        "frame_00000000.lammps",
        "frame_00105000.lammps",
        "frame_00140000.lammps",
        "frame_00070000.lammps"
    ]
}

but detours (e.g. 17545) fail with

Traceback (most recent call last):
  File "/work/ws/nemo/fr_lp1029-IMTEK_SIMULATION-0/local_modules/fireworks/git-master/lib/python3.6/site-packages/fireworks/core/rocket.py", line 262, in run
    m_action = t.run_task(my_spec)
  File "/work/ws/nemo/fr_lp1029-IMTEK_SIMULATION-0/local_modules/fireworks/git-master/lib/python3.6/site-packages/fireworks/user_objects/firetasks/dataflow_tasks.py", line 106, in run_task
    inp[key] = fw_spec[item]
KeyError: 'test_echo_list'

Trial 2

name: mock trial frame extraction
spec:
  _category: nemo_noqueue
  _tasks:
  - _fw_name: CommandLineTask
    command_spec:
      command: [ touch, frame_01.lammps, frame_02.lammps, frame_03.lammps ]

  # create list of all output files
  - _fw_name: PyTask
    func:    glob.glob
    args:    ["frame*.lammps"]
    outputs: [ frame_file_list ]

  - _fw_name: ForeachTask
    split: frame_file_list
    task:
      _fw_name: CommandLineTask
      command_spec:
        command:         [ echo , -n ]
        frame_file_list: { source: frame_file_list }
        test_echo_list:  
          source: { type: stdout }
          target: { type: data }
      inputs:  [ frame_file_list ]
      outputs: [ test_echo_list ]

fails with

Traceback (most recent call last):
  File "/work/ws/nemo/fr_lp1029-IMTEK_SIMULATION-0/local_modules/fireworks/git-master/lib/python3.6/site-packages/fireworks/core/rocket.py", line 262, in run
    m_action = t.run_task(my_spec)
  File "/work/ws/nemo/fr_lp1029-IMTEK_SIMULATION-0/local_modules/fireworks/git-master/lib/python3.6/site-packages/fireworks/user_objects/firetasks/dataflow_tasks.py", line 114, in run_task
    outlist = self.command_line_tool(command, inputs, outputs)
  File "/work/ws/nemo/fr_lp1029-IMTEK_SIMULATION-0/local_modules/fireworks/git-master/lib/python3.6/site-packages/fireworks/user_objects/firetasks/dataflow_tasks.py", line 184, in command_line_tool
    assert (arg['source']['type'] is not None
TypeError: list indices must be integers or slices, not str

Trial 3

name: mock trial frame extraction
spec:
  _category: nemo_noqueue
  _tasks:
  - _fw_name: CommandLineTask
    command_spec:
      command: [ touch, frame_01.lammps, frame_02.lammps, frame_03.lammps ]

  # create list of all output files
  - _fw_name: PyTask
    func:    glob.glob
    args:    ["frame*.lammps"]
    outputs: [ frame_file_list ]

  - _fw_name: ForeachTask
    split: frame_file_list
    task:
      _fw_name: CommandLineTask
      command_spec:
        command:         [ echo , -n ]
        frame_file_list: { source: frame_file_list }
        test_echo_list:  
          source: { type: stdout }
          target: { type: data, value: test_echo_list }
      inputs:  [ frame_file_list ]
      outputs: [ test_echo_list ]

fails with

Traceback (most recent call last):
  File "/work/ws/nemo/fr_lp1029-IMTEK_SIMULATION-0/local_modules/fireworks/git-master/lib/python3.6/site-packages/fireworks/core/rocket.py", line 262, in run
    m_action = t.run_task(my_spec)
  File "/work/ws/nemo/fr_lp1029-IMTEK_SIMULATION-0/local_modules/fireworks/git-master/lib/python3.6/site-packages/fireworks/user_objects/firetasks/dataflow_tasks.py", line 114, in run_task
    outlist = self.command_line_tool(command, inputs, outputs)
  File "/work/ws/nemo/fr_lp1029-IMTEK_SIMULATION-0/local_modules/fireworks/git-master/lib/python3.6/site-packages/fireworks/user_objects/firetasks/dataflow_tasks.py", line 184, in command_line_tool
    assert (arg['source']['type'] is not None
TypeError: list indices must be integers or slices, not str
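
The TypeError in Trials 2 and 3 is consistent with `arg['source']` being a list rather than a dict when the assertion runs. Below is a minimal reproduction, assuming (not confirmed from the Fireworks source) that `source` ends up holding the split chunk of file names instead of a `{type: ..., value: ...}` mapping. The names `bad_arg`, `good_arg` and `source_type` are made up for illustration.

```python
# Hypothetical stand-ins for what command_line_tool might receive as `arg`.
bad_arg = {"source": ["frame_01.lammps"]}                             # source is a list
good_arg = {"source": {"type": "data", "value": "frame_01.lammps"}}  # source is a dict


def source_type(arg):
    """Mimic the lookup shown in the traceback: arg['source']['type']."""
    return arg["source"]["type"]


try:
    source_type(bad_arg)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(source_type(good_arg))
```

If this reading is right, it would explain why the working example wraps every file name in its own `{ source: { type: data, value: ... } }` dict before handing the list to ForeachTask.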

Working example

name: Another dataflow trial
fws:
- fw_id: -10
  name: first
  spec:
    _category: nemo_noqueue
    _tasks:
    - _fw_name: CommandLineTask
      command_spec:
        command: [ touch, frame_01.lammps, frame_02.lammps, frame_03.lammps ]
 
    # create list of all output files [ frame_0.lammps ... frame_n.lammps ]
    - _fw_name: PyTask
      func:    glob.glob
      args:    [ "frame*.lammps" ]
      outputs: [ "frame_file_list" ]

    # nest list of all output files into {"frame_file_list": [ frame_0.lammps ... frame_n.lammps ] }
    - _fw_name: JoinDictTask
      inputs: [ frame_file_list ]
      output: nested_frame_file_list

    # create list of nested dicts of all output files
    # [ { source: { type: data, value: frame_0.lammps } } ... { source: { type: data, value: frame_n.lammps } } ]
    # Ugly use of eval: eval(expression, globals, locals) receives an empty globals dict {}
    # and the content of "nested_frame_file_list", i.e. {"frame_file_list": [ frame_0.lammps ... frame_n.lammps ] },
    # as 2nd and 3rd positional arguments. This relies on knowing how PyTask calls the function internally.
    - _fw_name: PyTask
      func:    eval
      args:    [ '[ { "source": {"type":"data","value":f} } for f in frame_file_list ]', {} ]
      inputs:  [ nested_frame_file_list ]
      outputs: [ frame_file_dict_list ]

- fw_id: -60
  name: mock trial frame processing
  spec:
    _category: nemo_noqueue
    _tasks:
    - _fw_name: ForeachTask
      split: frame_file_dict_list
      task:
        _fw_name: CommandLineTask
        command_spec:
          command:         [ echo , -n ]
          frame_file_dict_list: frame_file_dict_list
          test_echo_list:  
            source: { type: stdout }
            target: { type: data }
        inputs:  [ frame_file_dict_list ]
        outputs: [ test_echo_list ]

links:
  '-10':
  - -60

metadata: {}
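
The eval step in the first Firework can be tried outside Fireworks. The sketch below assumes PyTask ends up calling `eval(expression, {}, nested_frame_file_list)`, as the comments in the workflow describe. The file names are the ones created by the `touch` task, and their order is fixed here only for illustration.

```python
# Expression copied from the PyTask in the workflow.
expression = '[ { "source": {"type":"data","value":f} } for f in frame_file_list ]'

# Assumed content of "nested_frame_file_list" as produced by JoinDictTask.
nested_frame_file_list = {
    "frame_file_list": ["frame_01.lammps", "frame_02.lammps", "frame_03.lammps"]
}

# eval(expression, globals, locals): empty globals, nested dict as locals,
# so the name frame_file_list inside the expression resolves to the list.
frame_file_dict_list = eval(expression, {}, nested_frame_file_list)

for entry in frame_file_dict_list:
    print(entry)
```

Each entry is then a dict with a `source` key, which is the shape the second Firework's ForeachTask splits over.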

file system general

Find certain files older than a certain date and sort by size:

touch --date "2019-08-25" /tmp/end
find . -type f -name '*default.nc' -not -newer /tmp/end -exec du -hs {} + | sort -r -h
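
A rough Python counterpart of the find/du/sort pipeline, as a sketch only. The function name `files_not_newer` is hypothetical, and it reports apparent file size (`st_size`) rather than the disk usage that `du` prints. The demo runs on a throwaway directory with made-up file names and dates.

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path


def files_not_newer(root, pattern, cutoff):
    """Return (size, path) pairs for files under root that match pattern and
    were modified no later than cutoff, largest first."""
    cutoff_ts = cutoff.timestamp()
    matches = [
        (p.stat().st_size, p)
        for p in Path(root).rglob(pattern)
        if p.is_file() and p.stat().st_mtime <= cutoff_ts
    ]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    old_ts = datetime(2019, 8, 1).timestamp()
    new_ts = datetime(2019, 8, 29).timestamp()
    samples = [("a_default.nc", 10, old_ts), ("b_default.nc", 30, old_ts), ("c_default.nc", 20, new_ts)]
    for name, size, ts in samples:
        path = Path(tmp, name)
        path.write_bytes(b"x" * size)
        os.utime(path, (ts, ts))  # set access and modification time
    result = [(size, p.name) for size, p in files_not_newer(tmp, "*default.nc", datetime(2019, 8, 25))]

print(result)
```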

Find everything newer than some date and compute size:

{ find . -newer /tmp/end -printf "%s+"; echo 0; } | bc
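
The same sum can be sketched in Python. The name `total_size_newer` is hypothetical, and unlike the shell version, which also counts directory entries, this sketch only adds up regular files. The demo uses a temporary directory with made-up files and timestamps.

```python
import os
import tempfile
from pathlib import Path


def total_size_newer(root, reference):
    """Sum the sizes in bytes of regular files under root that were
    modified after the reference file."""
    ref_mtime = Path(reference).stat().st_mtime
    return sum(
        p.stat().st_size
        for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime > ref_mtime
    )


# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    reference = Path(tmp, "end")
    reference.touch()
    os.utime(reference, (1000, 1000))  # reference timestamp
    for name, size, ts in [("old.nc", 5, 500), ("new.nc", 7, 2000)]:
        path = Path(tmp, name)
        path.write_bytes(b"x" * size)
        os.utime(path, (ts, ts))
    total = total_size_newer(tmp, reference)

print(total)
```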

lpad

Make a comma-separated list of all defused fw ids in a certain workflow:

lpad get_fws -q '{"fw_id":{"$in":'"$(lpad_get_fw_ids_in_wflow 17408)"'}, "state":"DEFUSED"}' -d ids | xargs | tr -d '[ ]'
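
The `xargs | tr -d '[ ]'` tail of this pipeline flattens a printed id list into `id,id,id`. A small Python sketch of that step, assuming the `-d ids` output is a JSON list (or possibly a bare integer for a single match). `ids_to_csv` is a hypothetical helper and the sample ids are made up.

```python
import json


def ids_to_csv(lpad_output):
    """Turn the JSON text printed by `lpad get_fws ... -d ids` into 'id,id,id'."""
    ids = json.loads(lpad_output)
    if isinstance(ids, int):  # assumption: a single match may print as a bare integer
        ids = [ids]
    return ",".join(str(i) for i in ids)


sample = "[\n    17410,\n    17412,\n    17415\n]"  # made-up ids
print(ids_to_csv(sample))
```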

Rerun all fizzled fws in a certain workflow:

lpad rerun_fws -i $(lpad get_fws -q '{"fw_id":{"$in":'"$(lpad_get_fw_ids_in_wflow 19456 | xargs)"'}, "state": "FIZZLED"}' -d ids | xargs | tr -d '[ ]')

Cancel all reserved fws in a certain workflow:

export -f lpad_cancel
lpad get_fws -q '{"fw_id":{"$in":'"$(lpad_get_fw_ids_in_wflow 19746)"'}, "state":"RESERVED"}' -d ids | xargs | tr -d '[ ]' | tr ',' '\n ' | xargs -n 1 -I '{}' bash -c 'lpad_cancel "{}"'

Get a list of launchdirs for queried Fireworks:

lpad get_fws -q '{"fw_id":{"$in":'"$(lpad_get_fw_ids_in_wflow 20480)"'}, "state":"FIZZLED", "name":{"$regex":"restarted production run"}, "updated_on": {"$lt": "2019-08-27"}}' -d ids | tr -d '[ ,]' | xargs -I {} -n1 lpad get_launchdir {}

Get completed launchdirs in a certain workflow, sorted by size:

lpad get_fws -q '{"fw_id":{"$in":'"$(lpad_get_fw_ids_in_wflow 20480)"'}, "state":"COMPLETED"}' -d ids | tr -d '[ ,]' | xargs -I {} -n1 lpad get_launchdir {} | xargs -I {} -n1 du -hs {} | sort -hr

jotelha commented Aug 23, 2019

lpad rerun_fws -i $(lpad get_fws -q '{"fw_id":{"$in":'"$(lpad_get_fw_ids_in_wflow 19456 | xargs)"'}, "state": "FIZZLED"}' -d ids | xargs | tr -d '[ ]')