Recently, I have been playing around a bit with onnx-models. One thing I wanted to do was to piece together multiple small models into one large model. I didn't find any utility to do this (though it might just be that I didn't look hard enough), so I decided to figure out how hard it would be to do by hand. This little post details that little adventure. Please be warned that I am definitely no expert on onnx-models, so don't take anything I write here as absolute truth. :-)
I use the python packages onnx and onnxruntime. The package onnx contains code to create and read onnx files, as well as helper factory functions. The package onnxruntime contains code to evaluate an onnx-model, and has the package onnx as a dependency. All the code snippets assume the imports
import onnx
import onnxruntime
import numpy as np
I had some issues as I had installed onnx before onnxruntime. Reverting to a clean environment with python version 3.7.7 and installing onnxruntime, which pulls in onnx as a dependency, worked for me.
From the specification, we see that an onnx-model is a cycle-free network of a set of inputs and a set of outputs connected via a set of nodes, where each node is an operator. Each internal edge gets a label that is used to describe connections. The graph is encoded in static single assignment (SSA) form; a label is only assigned a value once (hence all inputs, outputs, and temporary labels must be unique), and a label cannot be used as input before it has been assigned as output, except for the input nodes.
Thus, the graph is already topologically sorted so that an onnx runtime can directly evaluate all the nodes in order of declaration.
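As a small illustration (a sketch, assuming a model file such as the mad.onnx we create below), we can print the node list of a loaded model in its declaration order, which is therefore also a valid evaluation order:
m = onnx.load('mad.onnx')
for node in m.graph.node:
    print(node.op_type, list(node.input), '->', list(node.output))
# Mul ['b', 'c'] -> ['t']
# Add ['a', 't'] -> ['r']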
For this experiment, I needed a simple building block, and I wanted at least one internal edge, to make sure that handling of internal edges worked out. I chose a multiply-add operation:
r = a + b * c
built from a multiply and an add, with a temporary value which I named t:
t = b * c
r = a + t
Drawing this as a nice ascii-art graph we get
a      b   c            2      3   5
|      |   |            |      |   |
|      V   V            |      V   V
|     +-----+           |     +-----+
|     | Mul |           |     | Mul |
V     +--+--+           V     +--+--+
+-----+  |              +-----+  |
| Add |<-+ t            | Add |<-+ 3*5=15
+--+--+                 +--+--+
   |                       |
   V                       V
   r                       2+15 = 17
where the left side has inputs, outputs, and temporaries labelled, and the right side has some numbers with the associated calculations. I will use those calculations to test my network.
The onnx format is built on protocol buffers. The onnx package contains the corresponding serialization code as well as some helper factory functions. Using the onnx package, our multiply-add network can be realized with:
inputs = [
    onnx.helper.make_tensor_value_info('a', onnx.TensorProto.FLOAT, [2,3,5]),
    onnx.helper.make_tensor_value_info('b', onnx.TensorProto.FLOAT, [2,3,5]),
    onnx.helper.make_tensor_value_info('c', onnx.TensorProto.FLOAT, [2,3,5])
]
outputs = [
    onnx.helper.make_tensor_value_info('r', onnx.TensorProto.FLOAT, [2,3,5])
]
nodes = [
    onnx.helper.make_node('Mul', ['b', 'c'], ['t']),
    onnx.helper.make_node('Add', ['a', 't'], ['r'])
]
madd_graph = onnx.helper.make_graph(nodes, "multiply-add", inputs, outputs)
madd_model = onnx.helper.make_model(madd_graph)
onnx.save(madd_model, 'mad.onnx')
Here I have specified that the three inputs are rank 3 tensors with dimensions [2,3,5] and of float type, as I wanted to use something more interesting than plain scalars. The type and shape of the temporary t are implicit.
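If we want to make the implicit explicit, the onnx package can do so for us. As an optional sanity check (a small sketch, not part of the construction itself), check_model validates the graph, and infer_shapes derives the type and shape of t:
# optional sanity checks: validate the model, then let onnx infer the
# type and shape of the temporary 't'
onnx.checker.check_model(madd_model)
inferred = onnx.shape_inference.infer_shapes(madd_model)
print(inferred.graph.value_info)  # now contains an entry for 't' with shape [2,3,5]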
Now, it would be nice to try this model to see if it works. Using a tool like netron, we can directly view the network in the onnx file and verify that it makes sense.
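If you prefer to stay in python, the onnx package can also render a textual summary of the graph:
# prints the graph in a readable, pseudo-code-like form
print(onnx.helper.printable_graph(madd_graph))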
To actually run the model, we use the package onnxruntime. The idea behind onnx is to specify the model as a rather abstract graph rather than code, and use a runtime to instantiate the model by either evaluating the graph or generating code.
First we load the model and create an inference session. We then create the inputs (getting the appropriate shape of the input tensors from the session), and run it:
madd_session = onnxruntime.InferenceSession('mad.onnx')
a = np.full(madd_session.get_inputs()[0].shape, 2.0).astype(np.float32)
b = np.full(madd_session.get_inputs()[1].shape, 3.0).astype(np.float32)
c = np.full(madd_session.get_inputs()[2].shape, 5.0).astype(np.float32)
r = madd_session.run(['r'], {'a': a, 'b': b, 'c': c})
This gives the output
print(r[0])
# outputs: [[[17. 17. 17. 17. 17.]
# [17. 17. 17. 17. 17.]
# [17. 17. 17. 17. 17.]]
#
# [[17. 17. 17. 17. 17.]
# [17. 17. 17. 17. 17.]
# [17. 17. 17. 17. 17.]]]
print(r[0].shape)
# outputs (2, 3, 5)
Thus the output is of the correct size with the right values.
This model has only one output, but a model can have multiple outputs, and you specify which of them you are interested in (and thus the runtime can prune the graph to avoid performing calculations whose results are discarded).
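For example, passing None instead of an explicit list asks onnxruntime for all declared outputs; for our model that is just r:
# requesting all declared outputs; for this model that is equivalent to ['r']
r = madd_session.run(None, {'a': a, 'b': b, 'c': c})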
Now over to the meat of the matter. I wanted to duct-tape together four basic multiply-add blocks in the following way:
i0  i1  i2    i3  i4  i5    i6  i7  i8      2   3   5     7   11  13    17  19  23
|   |   |     |   |   |     |   |   |       |   |   |     |   |   |     |   |   |
|   |   |     V   V   V     V   V   V       |   |   |     V   V   V     V   V   V
|   |   |     a---b---c     a---b---c       |   |   |     a---b---c     a---b---c
|   |   |     | Mad 1 |     | Mad 2 |       |   |   |     | Mad 1 |     | Mad 2 |
|   |   |     +---r---+     +---r---+       |   |   |     +---r---+     +---r---+
|   |   |         |             |           |   |   |         |             |
|   |   |   +-----+ t0          |           |   |   |   +-----+ 7+11*13=150 |
|   |   |   |                   |           |   |   |   |                   |
|   |   V   V                   |           |   |   V   V                   |
|   |   a---b---c <-------------+ t1        |   |   a---b---c <-------------+ 17+19*23=454
|   |   | Mad 3 |                           |   |   | Mad 3 |
|   |   +---r---+                           |   |   +---r---+
|   |       |                               |   |       |
|   |   +---+ t2                            |   |   +---+ 5+150*454=68105
|   |   |                                   |   |   |
V   V   V                                   V   V   V
a---b---c                                   a---b---c
| Mad 4 |                                   | Mad 4 |
+---r---+                                   +---r---+
    |                                           |
    o                                       2+3*68105=204317
This doesn't compute anything interesting, but the structure has some internal nodes, some parallelism, and some sequential dependencies. On the left I have used symbols (denoting inputs i0 through i8 and the single output o), and on the right I have the calculation of some numbers that I will use to test the network.
The idea is to topologically sort this graph and just concatenate the nodes of each subgraph, building the node order for the full graph. Also, I have to relabel inputs and outputs.
I start by loading four instances of the same model. This way, I can just move nodes from the building-block instances into the large graph, oblivious to any node properties beyond input and output labels.
model_paths = {
    'mad1': 'mad.onnx',
    'mad2': 'mad.onnx',
    'mad3': 'mad.onnx',
    'mad4': 'mad.onnx'
}
models = {}
for key, model_path in model_paths.items():
    models[key] = onnx.load(model_path)
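As a quick read-only check (it does not disturb the graphs we are about to modify), we can confirm that each instance carries its own copy of the two nodes:
for key in models:
    print(key, [node.op_type for node in models[key].graph.node])
# mad1 ['Mul', 'Add']
# ... and likewise for mad2, mad3, and mad4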
We define the inputs and outputs for the full model by specifying which input/output of which sub-model plays that role:
input_spec = {
    'i0': ('mad4', 'a'),
    'i1': ('mad4', 'b'),
    'i2': ('mad3', 'a'),
    'i3': ('mad1', 'a'),
    'i4': ('mad1', 'b'),
    'i5': ('mad1', 'c'),
    'i6': ('mad2', 'a'),
    'i7': ('mad2', 'b'),
    'i8': ('mad2', 'c')
}
output_spec = {
    'o': ('mad4', 'r')
}
Here, the first line of input_spec says that the full model input i0 is actually input a of sub-model mad4.
First, we need a small utility function that finds an instance with a given name in a sequence of repeated ValueInfoProtos:
def get_by_name(repfield, name):
    for item in repfield:
        if item.name == name:
            return item
    raise RuntimeError(f'Failed to find item named {name}')
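For example, looking up input a of mad1 returns the corresponding ValueInfoProto. Note that it is a reference into the model, not a copy, which is exactly what we want since we are going to rename it in place:
a_info = get_by_name(models['mad1'].graph.input, 'a')
print(a_info.name)  # prints 'a'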
Then, the following snippets run through the specifications, find the right ValueInfoProto (the structure that describes an input/output) on the sub-model, correct its name, and return a list with the full model's inputs/outputs.
def generate_inputs(input_spec, models):
    inputs = []
    for global_name, (model, model_name) in input_spec.items():
        input = get_by_name(models[model].graph.input, model_name)
        input.name = global_name
        inputs.append(input)
    return inputs

def generate_outputs(output_spec, models):
    outputs = []
    for global_name, (model, model_name) in output_spec.items():
        output = get_by_name(models[model].graph.output, model_name)
        output.name = global_name
        outputs.append(output)
    return outputs
We continue by specifying an ordering of the sub-models such that no value is used as input before it has been computed, along with the mapping of sub-model inputs and outputs to full model inputs, outputs, and temporaries:
model_spec = [
    ('mad1', {'a': 'i3', 'b': 'i4', 'c': 'i5'}, {'r': 't0'}),
    ('mad2', {'a': 'i6', 'b': 'i7', 'c': 'i8'}, {'r': 't1'}),
    ('mad3', {'a': 'i2', 'b': 't0', 'c': 't1'}, {'r': 't2'}),
    ('mad4', {'a': 'i0', 'b': 'i1', 'c': 't2'}, {'r': 'o'})
]
The first line specifies that we start with sub-model mad1, and we relabel its inputs a, b, and c as i3, i4, and i5, and its output r as temporary t0.
A sub-model can only be referenced once here, since we transplant the models directly instead of cloning different pieces (and that is why we loaded the same model four times earlier).
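If you would rather load the file only once, I believe a deep copy per instance would work equally well, since python protobuf messages support copy.deepcopy. An untested alternative sketch:
import copy
# alternative sketch (untested): load once, deep-copy per instance
base = onnx.load('mad.onnx')
models = {key: copy.deepcopy(base) for key in ('mad1', 'mad2', 'mad3', 'mad4')}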
A final issue is that sub-models can have internal temporaries (and indeed our building block model has exactly that --- on purpose), and we have to make sure that these have unique names. We solve this by simply adding the sub-model name as a prefix.
The following code takes a sequence of labels and creates a new set of labels, either from the map if the name is present, or if not, it prefixes the label:
def relabel(old_labels, rename_map, prefix):
    new_labels = []
    for old_label in old_labels:
        if old_label in rename_map:
            new_labels.append(rename_map[old_label])
        else:
            new_labels.append(prefix + "_" + old_label)
    return new_labels
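For example, relabelling the inputs of mad3's Mul node maps b and c to the temporaries t0 and t1, while its internal temporary t (absent from the maps) gets prefixed:
print(relabel(['b', 'c'], {'a': 'i2', 'b': 't0', 'c': 't1'}, 'mad3'))
# ['t0', 't1']
print(relabel(['t'], {'r': 't2'}, 'mad3'))
# ['mad3_t']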
Then we are ready to transplant the nodes of the sub-models into the full model, while updating input and output names of each node:
def generate_nodes(model_spec, models):
    nodes = []
    for model_name, in_spec, out_spec in model_spec:
        graph = models[model_name].graph
        for node in graph.node:
            node.input[:] = relabel(node.input, in_spec, model_name)
            node.output[:] = relabel(node.output, out_spec, model_name)
            nodes.append(node)
    return nodes
With the bits and pieces in place, building the model and storing it in a file is pretty straightforward:
comp_graph = onnx.helper.make_graph(generate_nodes(model_spec, models),
"mad-composition",
generate_inputs(input_spec, models),
generate_outputs(output_spec, models))
comp_model = onnx.helper.make_model(comp_graph)
onnx.save(comp_model, 'comp.onnx')
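As before, we can optionally let onnx validate the composite model before running it:
onnx.checker.check_model(comp_model)  # raises an exception if the composed graph is malformed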
To run the code, we do basically the same as we did when running the building block model. The only difference is that the inputs are built using a loop, since there are so many of them:
s = onnxruntime.InferenceSession('comp.onnx')
vals = [2, 3, 5, 7, 11, 13, 17, 19, 23]
inputs = {}
for i, val in enumerate(vals):
    inputs['i'+str(i)] = np.full(s.get_inputs()[i].shape, val).astype(np.float32)
r = s.run(['o'], inputs)
which produces the output:
print(r[0])
# outputs: [[[204317. 204317. 204317. 204317. 204317.]
# [204317. 204317. 204317. 204317. 204317.]
# [204317. 204317. 204317. 204317. 204317.]]
#
# [[204317. 204317. 204317. 204317. 204317.]
# [204317. 204317. 204317. 204317. 204317.]
# [204317. 204317. 204317. 204317. 204317.]]]
print(r[0].shape)
# (2, 3, 5)
Thus the output is again of the correct size with the right values.
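As an extra check, we can replay the same arithmetic directly in python and compare it against the model output:
# cross-check: replay the calculation from the ascii diagram
t0 = vals[3] + vals[4]*vals[5]  # mad1: 7 + 11*13 = 150
t1 = vals[6] + vals[7]*vals[8]  # mad2: 17 + 19*23 = 454
t2 = vals[2] + t0*t1            # mad3: 5 + 150*454 = 68105
o = vals[0] + vals[1]*t2        # mad4: 2 + 3*68105 = 204317
assert np.allclose(r[0], o)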
In this little note I have detailed how I duct-taped multiple onnx models into a composite onnx-model. I didn't find much info about this scenario, so I wrote up my experiments in this post, and I hope someone besides me finds it interesting.