@speth
Last active December 5, 2019 15:21
Cantera input file format sandbox

Issues with CTI Input Files

  • Lost opportunity for interoperability with other software, which would be easier with a standard format
  • Run-time Python dependency is a source of never-ending difficulty for users
  • Extra work required to implement new input file features, since they need to be implemented in both CTI and XML. This leads to incompleteness in the CTI interface and requires users to use the XML interface in certain cases.

Issues with XML input files

  • The format itself is needlessly verbose. Consider the definition of a single 'falloff' reaction in XML:

    <reaction reversible="yes" type="falloff" id="0095">
      <equation>OH + CH3 (+ M) [=] CH3OH (+ M)</equation>
      <rateCoeff>
        <Arrhenius>
           <A>2.790000E+15</A>
           <b>-1.4299999999999999</b>
           <E units="cal/mol">1330.000000</E>
        </Arrhenius>
        <Arrhenius name="k0">
           <A>4.000000E+30</A>
           <b>-5.9199999999999999</b>
           <E units="cal/mol">3140.000000</E>
        </Arrhenius>
        <efficiencies default="1.0">C2H6:3  CH4:2  CO:1.5  CO2:2  H2:2  H2O:6 </efficiencies>
        <falloff type="Troe">0.412 195 5900 6394 </falloff>
      </rateCoeff>
      <reactants>CH3:1 OH:1.0</reactants>
      <products>CH3OH:1.0</products>
    </reaction>

    compared to the equivalent CTI (Python):

    falloff_reaction("OH + CH3 (+ M) <=> CH3OH (+ M)",
             kf=[2.79000E+18, -1.43, 1330],
             kf0=[4.00000E+36, -5.92, 3140],
             falloff=Troe(A=0.412, T3=195, T1=5900, T2=6394),
             efficiencies="C2H6:3 CH4:2 CO:1.5 CO2:2 H2:2 H2O:6")

    Even after removing the extra whitespace, it's still twice as long.

  • It requires extra processing to extract useful information from (a) the mappings of species names to quantities contained in the <efficiencies>, <reactants> and <products> tags and (b) array data such as that in the <falloff> tag. All of the alternatives (Python, JSON, YAML) have intrinsic support for mapping and array data types.

  • It contains redundant information, which leads to confusion and errors. The reaction stoichiometry is encoded both in the <equation> tag as well as in the <reactants> and <products> tags.

  • The method for encoding arrays is inconsistent. In some places, we have a space-delimited string, e.g. the <falloff> tag here. In others (e.g. the floatArray associated with species thermo data), we have comma-delimited lists. Which of these formats is allowed in any given context is a mystery.

  • Cantera misses one of the key benefits of using a standard format such as XML: there are existing XML parsing libraries that work just fine, and there's no reason for Cantera to have its own XML parser.

  • Extracting data from the XML tree requires writing a lot of code. For example, here's a snippet of XML code from the definition of a HMWSoln object:

    <thetaAnion anion1="Cl-" anion2="OH-">
      <Theta> -0.05,  0.0, 0.0, 0.0, 0.0 </Theta>
    </thetaAnion>

    The function to read and validate the data from this node is 80 lines long (see https://github.com/Cantera/cantera/blob/master/src/thermo/HMWSoln_input.cpp#L235).
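
A rough illustration of the extra processing described in the list above, written in Python with only the standard library. The helper names `parse_composition` and `parse_array` are hypothetical, and the XML fragment is an abbreviated copy of the falloff example; this is a sketch of the post-processing burden, not Cantera's actual parser. The last lines show the same data in JSON, where the mapping and the array arrive as native types.

```python
import json
import re
import xml.etree.ElementTree as ET

# Abbreviated copy of the <rateCoeff> content from the falloff example.
XML_SNIPPET = """
<rateCoeff>
  <efficiencies default="1.0">C2H6:3  CH4:2  CO:1.5  CO2:2  H2:2  H2O:6 </efficiencies>
  <falloff type="Troe">0.412 195 5900 6394 </falloff>
</rateCoeff>
""".strip()


def parse_composition(text):
    """Turn 'A:1 B:2.5' into {'A': 1.0, 'B': 2.5} (hypothetical helper)."""
    result = {}
    for token in text.split():
        name, _, value = token.rpartition(":")
        result[name] = float(value)
    return result


def parse_array(text):
    """Accept space- or comma-delimited numbers (hypothetical helper)."""
    return [float(tok) for tok in re.split(r"[,\s]+", text.strip()) if tok]


root = ET.fromstring(XML_SNIPPET)
efficiencies = parse_composition(root.find("efficiencies").text)
falloff = parse_array(root.find("falloff").text)

# The same data in JSON needs no string post-processing at all.
JSON_SNIPPET = (
    '{"efficiencies": {"C2H6": 3, "CH4": 2, "CO": 1.5, "CO2": 2, "H2": 2, "H2O": 6},'
    ' "falloff": [0.412, 195, 5900, 6394]}'
)
doc = json.loads(JSON_SNIPPET)
```

Both routes end with the same mapping and array, but only the XML route needs format-specific string handling that has to be written, documented and tested separately.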

Implementation Considerations

  • Need to decide between JSON, YAML, and other alternatives
  • Want to separate input file parsing from actual application logic (compare the tight coupling of ThermoPhase::initThermoXML to the setupFooReaction functions which are called by newReaction(XML_Node&)).
  • Should be able to create objects without any explicit input file
    • Already possible for ideal gases through Reaction and Species objects
  • Should be able to serialize objects created in this way and generate new input files
  • Old input files can be supported by writing translators
    • Translator from CTI is just a modified version of ctml_writer.py
  • Successful implementation is made difficult by the large number of classes missing test coverage (Cantera/cantera#267)
  • Also need to replace XML as the input/output file format for the 1D solver
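
The separation of parsing from application logic, and the goal of serializing objects that were created without an input file, might look roughly like this in Python. `SpeciesData`, `from_dict` and `to_dict` are hypothetical names chosen for illustration and are not part of Cantera; JSON is used only because it ships with the standard library.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SpeciesData:
    """Plain data holder with no knowledge of any file format (hypothetical)."""
    name: str
    atoms: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, node):
        # The only place where input-file structure is interpreted.
        return cls(name=node["name"], atoms=dict(node["atoms"]))

    def to_dict(self):
        # The only place where output structure is produced.
        return asdict(self)


# Created directly in code, with no input file involved.
ch4 = SpeciesData("CH4", {"C": 1, "H": 4})

# Serialized to a new input-file fragment, then read back.
text = json.dumps({"species": [ch4.to_dict()]})
restored = SpeciesData.from_dict(json.loads(text)["species"][0])
```

With this split, supporting an old format only requires a translator that produces the same dictionary structure; the class itself does not change.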

Concerns with YAML/JSON

  • With YAML, significance of whitespace may confuse some users
  • With both YAML and JSON, order of keys in mappings is not specified, so serialization can result in keys ending up in any order
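
The key-order concern can be seen, and partly mitigated, with Python's standard `json` module. This is a sketch that assumes Python 3.7 or later, where dictionaries keep insertion order; the reaction entry is copied from the sample data and the variable names are illustrative.

```python
import json

# Two mappings with the same content but different insertion order.
a = {"type": "three_body", "equation": "2 O + M <=> O2 + M", "rate": [1.2e17, -1, 0]}
b = {"rate": [1.2e17, -1, 0], "equation": "2 O + M <=> O2 + M", "type": "three_body"}

# Equal as mappings, because key order carries no meaning.
same_content = (a == b)

# Default serialization follows insertion order, so the text differs.
differs_by_default = json.dumps(a) != json.dumps(b)

# Sorting keys makes the output reproducible regardless of insertion order.
stable_a = json.dumps(a, sort_keys=True)
stable_b = json.dumps(b, sort_keys=True)
```

Sorted output is reproducible but not necessarily the order a person would choose (for example `equation` before `rate`), so a writer that cares about readability would need its own explicit key ordering.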
---
units: {length: cm, time: s, quantity: mol, act_energy: cal/mol}
phases:
- name: gri30
  elements: [O, H, C, N, Ar]
  species: [H2, H, O, O2, OH, H2O, HO2, H2O2]
  thermo: IdealGas
  reactions: all
  kinetics: gaskinetics
  initial_state: {temperature: 300.0, pressure: [1.0, atm],
                  mole_fractions: {CH4: 0.2, H2O: 0.8}}
species:
- name: CH4
  atoms: {C: 1, H: 4}
  thermo:
  - type: NASA
    Tmin: 200.0
    Tmax: 1000.0
    data: [5.149876130E+00, -1.367097880E-02, 4.918005990E-05, -4.847430260E-08,
           1.666939560E-11, -1.024664760E+04, -4.641303760E+00]
  - type: NASA
    Tmin: 1000.0
    Tmax: 3500.0
    data: [7.485149500E-02, 1.339094670E-02, -5.732858090E-06, 1.222925350E-09,
           -1.018152300E-13, -9.468344590E+03, 1.843731800E+01]
  transport: {type: gas, geometry: nonlinear, diameter: 3.75,
              well_depth: 141.40, polar: 2.60, rot_relax: 13.00}
- name: H2O
  atoms: {H: 2, O: 1}
  thermo:
  - type: NASA
    Tmin: 200.0
    Tmax: 1000.0
    data: [4.198640560E+00, -2.036434100E-03, 6.520402110E-06, -5.487970620E-09,
           1.771978170E-12, -3.029372670E+04, -8.490322080E-01]
  - type: NASA
    Tmin: 1000.0
    Tmax: 3500.0
    data: [3.033992490E+00, 2.176918040E-03, -1.640725180E-07, -9.704198700E-11,
           1.682009920E-14, -3.000429710E+04, 4.966770100E+00]
  transport: {type: gas, geometry: nonlinear, diameter: 2.60,
              well_depth: 572.40, dipole: 1.84, rot_relax: 4.00}
reactions:
- type: three_body
  equation: "2 O + M <=> O2 + M"
  rate: [1.2000e17, -1, 0]
  efficiencies: {AR: 0.83, C2H6: 3, CH4: 2, CO: 1.75, CO2: 3.6, H2: 2.4, H2O: 15.4}
- type: troe
  equation: "H + CH2 (+ M) <=> CH3 (+ M)"
  kf: [6.00000E+14, 0, 0]
  kf0: [1.04000E+26, -2.76, 1600]
  falloff: [0.562, 91, 5836, 8552]
  efficiencies: " AR:0.7 C2H6:3 CH4:2 CO:1.5 CO2:2 H2:2 H2O:6 "
...
{
  "units": {
    "length": "cm",
    "time": "s",
    "quantity": "mol",
    "act_energy": "cal/mol"
  },
  "phases": [
    {
      "name": "gri30",
      "elements": ["O", "H", "C", "N", "Ar"],
      "species": ["H2", "H", "O", "O2", "OH", "H2O", "HO2", "H2O2"],
      "thermo": "IdealGas",
      "reactions": "all",
      "kinetics": "gaskinetics",
      "initial_state": {"temperature": 300, "pressure": [1, "atm"],
                        "mole_fractions": {"CH4": 0.2, "H2O": 0.8}}
    }
  ],
  "species": [
    {
      "name": "CH4",
      "atoms": {"C": 1, "H": 4},
      "thermo": [
        {
          "type": "NASA",
          "Tmin": 200,
          "Tmax": 1000,
          "data": [5.149876130E+00, -1.367097880E-02, 4.918005990E-05, -4.847430260E-08,
                   1.666939560E-11, -1.024664760E+04, -4.641303760E+00]
        },
        {
          "type": "NASA",
          "Tmin": 1000,
          "Tmax": 3500,
          "data": [7.485149500E-02, 1.339094670E-02, -5.732858090E-06, 1.222925350E-09,
                   -1.018152300E-13, -9.468344590E+03, 1.843731800E+01]
        }
      ],
      "transport": {"type": "gas", "geometry": "nonlinear", "diameter": 3.75,
                    "well_depth": 141.4, "polar": 2.6, "rot_relax": 13}
    },
    {
      "name": "H2O",
      "atoms": {"H": 2, "O": 1},
      "thermo": [
        {
          "type": "NASA",
          "Tmin": 200,
          "Tmax": 1000,
          "data": [4.198640560E+00, -2.036434100E-03, 6.520402110E-06, -5.487970620E-09,
                   1.771978170E-12, -3.029372670E+04, -8.490322080E-01]
        },
        {
          "type": "NASA",
          "Tmin": 1000,
          "Tmax": 3500,
          "data": [3.033992490E+00, 2.176918040E-03, -1.640725180E-07, -9.704198700E-11,
                   1.682009920E-14, -3.000429710E+04, 4.966770100E+00]
        }
      ],
      "transport": {"type": "gas", "geometry": "nonlinear", "diameter": 2.6,
                    "well_depth": 572.4, "dipole": 1.84, "rot_relax": 4}
    }
  ],
  "reactions": [
    {
      "type": "three_body",
      "equation": "2 O + M <=> O2 + M",
      "rate": [1.2000e17, -1, 0],
      "efficiencies": {"AR": 0.83, "C2H6": 3, "CH4": 2, "CO": 1.75,
                       "CO2": 3.6, "H2": 2.4, "H2O": 15.4}
    },
    {
      "type": "troe",
      "equation": "H + CH2 (+ M) <=> CH3 (+ M)",
      "kf": [6.00000E+14, 0, 0],
      "kf0": [1.04000E+26, -2.76, 1600],
      "falloff": [0.562, 91, 5836, 8552],
      "efficiencies": " AR:0.7 C2H6:3 CH4:2 CO:1.5 CO2:2 H2:2 H2O:6 "
    }
  ]
}
#include <cmath> // fabs
#include "cantera/thermo.h"
#include "yaml-cpp/yaml.h" // tested with yaml-cpp 0.5.3

using namespace Cantera;
using namespace std;

// Build a two-range NASA polynomial from a YAML sequence of two "thermo" entries
SpeciesThermoInterpType* newNasaPoly2(const YAML::Node& yaml)
{
    // Work out which of the two entries covers the low-temperature range
    int ilow = (yaml[1]["Tmin"].as<double>() > yaml[0]["Tmin"].as<double>()) ? 0 : 1;
    int ihigh = 1 - ilow;
    double tlow = yaml[ilow]["Tmin"].as<double>();
    double thigh = yaml[ihigh]["Tmax"].as<double>();
    double tmid = yaml[ilow]["Tmax"].as<double>();

    // The two ranges must meet at a common midpoint temperature
    if (fabs(tmid - yaml[ihigh]["Tmin"].as<double>()) > 0.01) {
        throw CanteraError("newNasaPoly2", "non-continuous temperature ranges"
            " {} != {}", tmid, yaml[ihigh]["Tmin"].as<double>());
    }

    // Coefficient layout: Tmid, then the high-T range, then the low-T range
    vector_fp coeffs(1, tmid);
    coeffs.reserve(15);
    for (const auto& coeff : yaml[ihigh]["data"]) {
        coeffs.push_back(coeff.as<double>());
    }
    for (const auto& coeff : yaml[ilow]["data"]) {
        coeffs.push_back(coeff.as<double>());
    }
    double pref = OneAtm;
    return newSpeciesThermoInterpType("nasa", tlow, thigh, pref, coeffs.data());
}

// Fill a Species object from one entry of the top-level "species" list
void parseSpecies(Species& S, const YAML::Node& yaml)
{
    S.name = yaml["name"].as<string>();
    S.composition = yaml["atoms"].as<map<string, double>>();
    S.thermo.reset(newNasaPoly2(yaml["thermo"]));
}

void yaml_demo()
{
    YAML::Node data = YAML::LoadFile("sample.yml");
    const YAML::Node& phase_data = data["phases"][0]; // Take the first phase node
    unique_ptr<ThermoPhase> gas(newThermoPhase(phase_data["thermo"].as<string>()));

    for (const auto& elem : phase_data["elements"]) {
        gas->addElement(elem.as<string>());
    }

    // Adds every species defined in the file, not only those listed in the phase
    for (const auto& spnode : data["species"]) {
        shared_ptr<Species> S(new Species());
        parseSpecies(*S, spnode);
        gas->addSpecies(S);
    }

    // Pressure is either a bare number (SI) or a [value, unit] pair
    const YAML::Node& state = phase_data["initial_state"];
    const YAML::Node& pNode = state["pressure"];
    double p;
    if (pNode.IsScalar()) {
        p = pNode.as<double>();
    } else {
        p = pNode[0].as<double>() * toSI(pNode[1].as<string>());
    }
    gas->setState_TPX(state["temperature"].as<double>(), p,
                      state["mole_fractions"].as<map<string, double>>());
    writelog("{}\n", gas->report());
}

int main()
{
    try {
        yaml_demo();
    } catch (exception& err) {
        writelog("{}\n", err.what());
    }
}
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <vector>
#include "jsoncpp/json/json.h"

struct NasaThermo
{
    double Tmin;
    double Tmax;
    std::vector<double> coeffs;
};

struct Species
{
    std::string name;
    std::map<std::string, double> composition; // element name -> atom count
    std::vector<NasaThermo> thermo;
};

// Read one entry of a species "thermo" list
void operator >>(const Json::Value& node, NasaThermo& t)
{
    t.Tmin = node["Tmin"].asDouble();
    t.Tmax = node["Tmax"].asDouble();
    const Json::Value& dataNode = node["data"];
    t.coeffs.resize(dataNode.size());
    for (unsigned int i = 0; i < dataNode.size(); i++) {
        t.coeffs[i] = dataNode[i].asDouble();
    }
}

// Read one entry of the top-level "species" list
void operator >>(const Json::Value& node, Species& s)
{
    s.name = node["name"].asString();
    const Json::Value& compNode = node["atoms"];
    // compNode is const, so a const iterator is required
    for (Json::ValueConstIterator it = compNode.begin(); it != compNode.end(); ++it) {
        s.composition[it.key().asString()] = (*it).asDouble();
    }
    const Json::Value& thermoNode = node["thermo"];
    s.thermo.resize(thermoNode.size());
    for (unsigned int i = 0; i < thermoNode.size(); i++) {
        thermoNode[i] >> s.thermo[i];
    }
}

int main(int argc, char** argv)
{
    std::ifstream fin("sample.json");
    Json::Value doc;
    fin >> doc;

    std::vector<Species> species;
    const Json::Value& spec = doc["species"];
    for (unsigned int i = 0; i != spec.size(); i++) {
        Species s;
        spec[i] >> s;
        species.push_back(s);
    }
    return 0;
}
@wandadars

YAML certainly is the most visually appealing format. The whitespace constraint would serve to explicitly reinforce readable files.

@wandadars

Would something like this be a good topic for GSOC 2019? Or is it too complex for a single person to undertake?

@ischoegl

ischoegl commented Mar 28, 2019

I have a comment regarding YAML: while the default loaders will obviously use C++ code (where the above format works), some Cantera users may end up loading YAML from Python.

PyYAML is somewhat buggy: for example, it won't parse numbers such as 3.2e5 as expected. yaml.load('a: 3.2e5') returns a string (in both PyYAML 3.12 and 5.1, the latter with the additional Loader option), whereas json.loads('{"a":3.2e5}') is more specific and returns the expected output.

E.g. the above example (01-sample.yaml) will not load correctly using PyYAML 5.1:

import yaml
with open('01-sample.yaml','r') as yml:
    out = yaml.load(yml, Loader=yaml.FullLoader)

Issuing type(out['reactions'][0]['rate'][0]) returns str whereas it should be float. (PyYAML 3.12 has the same behavior.)

There are workarounds, but I see this as a potential source for significant frustration ...

PS: there is an open issue on PyYAML, see link
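
One possible workaround of the kind alluded to above, sketched with the standard library only: post-process whatever the loader returns and convert strings that look like exponent-notation numbers. The function name `coerce_exponent_floats` and the regular expression are assumptions made for illustration, not part of PyYAML or Cantera, and `loaded` is a hand-written stand-in for the structure the comment reports getting back from the loader.

```python
import re

# Strings such as '1.2000e17' or '3.2e5': digits, optional fraction, exponent.
_EXP_FLOAT = re.compile(r"^[-+]?(?:\d+\.?\d*|\.\d+)[eE][-+]?\d+$")


def coerce_exponent_floats(node):
    """Recursively convert exponent-notation strings to float (hypothetical)."""
    if isinstance(node, dict):
        return {key: coerce_exponent_floats(value) for key, value in node.items()}
    if isinstance(node, list):
        return [coerce_exponent_floats(item) for item in node]
    if isinstance(node, str) and _EXP_FLOAT.match(node):
        return float(node)
    return node


# Stand-in for loader output in which the rate constant came back as a string.
loaded = {
    "reactions": [
        {"equation": "2 O + M <=> O2 + M", "rate": ["1.2000e17", -1, 0]}
    ]
}
fixed = coerce_exponent_floats(loaded)
```

The pattern is deliberately narrow so that ordinary strings such as species names and reaction equations pass through unchanged; mapping keys are not touched.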

@AdityaSavara
Copy link

AdityaSavara commented Dec 4, 2019

I am not sure if I should call this a 'concern' about YAML; it is more of a suggestion. In your example, you don't have quote characters around the species. For example: mole_fractions: {CH4: 0.2, H2O: 0.8}. For YAML, if somebody tries to put a dash in a species name, I think it would cause a problem. So while quotation marks are not necessary, if Cantera dumps things to YAML, I think Cantera should add the single quotation marks: mole_fractions: {'CH4': 0.2, 'H2O': 0.8}. I think I saw you mentioning somewhere a suggestion to make function names have dashes as an option. For YAML purposes, all lower case with underscores may be safer. In fact, here is an example where somebody put a dash in a species name: https://groups.google.com/forum/?fromgroups#!topic/cantera-users/vvaikJ1IGxY In surface science studies (not to mention conventional chemistry nomenclature) dashes are a normal thing to use.

Edit: after speth's comment below, I played a bit and see that speth is correct. It seems that dashes in mapping keys force them to become strings. Interestingly, even values like 0.2 - 0.1 are cast to a string (last example below). I don't see this ever causing a problem for Cantera, since it would be a user error to put an expression where only a scalar is allowed.

Input:

- mole_fractions: {CH4: 0.2, H2O: 0.8}
- mole_fractions: {CH4-withdash: 0.2, H2O: 0.8}
- mole_fractions: {1.0: 0.2, H2O: 0.8}
- mole_fractions: {1.0-2.0: 0.2, H2O: 0.8}
- mole_fractions: {1.0-2.0: 0.2 - 0.1, H2O: 0.8}

Output:

 [{'mole_fractions': {'CH4': 0.2, 'H2O': 0.8}},
  {'mole_fractions': {'CH4-withdash': 0.2, 'H2O': 0.8}},
  {'mole_fractions': {1.0: 0.2, 'H2O': 0.8}},
  {'mole_fractions': {'1.0-2.0': 0.2, 'H2O': 0.8}},
  {'mole_fractions': {'1.0-2.0': '0.2 - 0.1', 'H2O': 0.8}}]

@speth
Author

speth commented Dec 4, 2019

Dashes are not a problem in YAML strings and do not require quotation marks. For cases where quotation marks or escaped values are necessary, both of the YAML libraries used by Cantera (yaml-cpp for C++, ruamel.yaml for Python) are smart enough to do so automatically.
