@etoyoda
Last active December 21, 2015 17:28

Review of example metadata records from GAWSIS received on 2013-08-08.

introduction

  • 8 out of 10 received records are not well-formed XML
  • none of the 10 received records validates against the XML Schema
  • style: keywords from unclear sources

well-formedness

  • the ill-formed records are caused by a raw ampersand '&' in the xlink:href attribute
  • please be careful not to create ill-formed XML, which is
    • not accepted by XML-based transfer protocols such as OAI-PMH
    • rejected by virtually all processing tools, such as XSLT filters and XML editors

No good:

<gmd:geographicElement xlink:href="https://maps.google.com/maps?hl=en&q=58.8057833,17.3883667+(Aspvreten, 20 m a.s.l.)" xlink:title="Aspvreten"/>

Must be:

<gmd:geographicElement xlink:href="https://maps.google.com/maps?hl=en&amp;q=58.8057833,17.3883667+(Aspvreten, 20 m a.s.l.)" xlink:title="Aspvreten"/>
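
One way to avoid the raw '&' is never to build the XML by string concatenation. The sketch below is an illustration, not part of the GAWSIS tooling: it assumes Python's standard xml.etree.ElementTree, and the helper names is_well_formed and geographic_element are hypothetical. The serializer escapes '&' in attribute values automatically, and the parser rejects the ill-formed form.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("gmd", GMD)
ET.register_namespace("xlink", XLINK)


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (well-formedness only, no schema check)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def geographic_element(href: str, title: str) -> str:
    """Serialize a gmd:geographicElement; ElementTree writes '&' in attributes as '&amp;'."""
    elem = ET.Element(f"{{{GMD}}}geographicElement")
    elem.set(f"{{{XLINK}}}href", href)
    elem.set(f"{{{XLINK}}}title", title)
    return ET.tostring(elem, encoding="unicode")
```

Passing the unescaped Google Maps URL to geographic_element yields an attribute containing "hl=en&amp;q=", which parses back to the original URL.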

required changes to validate against the XML Schema

An XSLT filter is available at http://toyoda-eizi.net/2013/0415gawsis/example1308/fix-xsd.xsl. It applies the following corrections, which make the records valid against the ISO 19139 XML Schema.

gmd:dateStamp must contain gco:Date

No good:

<gmd:dateStamp>2013-08-07</gmd:dateStamp>

Must be:

<gmd:dateStamp><gco:Date>2013-08-07</gco:Date></gmd:dateStamp>

It is also possible to use gco:DateTime with a value like 2013-08-07T12:34:56Z instead of gco:Date.
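
For readers who prefer a script to the XSLT filter, here is a rough Python equivalent of this one correction, assuming the standard ISO 19139 namespace URIs; the function name wrap_datestamps is hypothetical. It wraps bare text in gco:DateTime when the value contains a "T" time separator, otherwise in gco:Date.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


def wrap_datestamps(root: ET.Element) -> int:
    """Wrap bare text in gmd:dateStamp with gco:Date, or gco:DateTime if it has a time part."""
    fixed = 0
    # list() so the tree is not modified while it is being iterated
    for stamp in list(root.iter(f"{{{GMD}}}dateStamp")):
        value = (stamp.text or "").strip()
        if len(stamp) > 0 or not value:
            continue  # already has a child element, or nothing to wrap
        kind = "DateTime" if "T" in value else "Date"
        child = ET.SubElement(stamp, f"{{{GCO}}}{kind}")
        child.text = value
        stamp.text = None
        fixed += 1
    return fixed
```

Running it twice is harmless: the second pass finds child elements already present and changes nothing.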

Attribute nilReason in gmd elements must be gco:nilReason

In general, an XML attribute either carries a namespace prefix, which places it in that namespace, or has no prefix, in which case it belongs directly to its parent element. Elements with the gmd prefix (i.e. those defined in ISO 19139) that accept a nil reason take the gco:nilReason attribute, not an unprefixed nilReason.

No good:

<gmd:purpose nilReason="unknown"/>

Must be:

<gmd:purpose gco:nilReason="unknown"/>

Confusingly, the attribute nilReason in GML is to be prefixed "gml", not "gco".
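
The renaming can be sketched in Python as follows, assuming the standard gmd and gco namespace URIs; prefix_nil_reason is a hypothetical helper, not part of the fix-xsd.xsl filter. Note that it renames blindly and does not check whether a given element may carry gco:nilReason at all.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


def prefix_nil_reason(root: ET.Element) -> int:
    """Rename unprefixed nilReason to gco:nilReason on gmd: elements; return the count."""
    renamed = 0
    for elem in root.iter():
        # ElementTree spells namespaced names "{uri}local"; unprefixed attributes have no braces
        if not isinstance(elem.tag, str) or not elem.tag.startswith(f"{{{GMD}}}"):
            continue
        if "nilReason" in elem.attrib:
            elem.set(f"{{{GCO}}}nilReason", elem.attrib.pop("nilReason"))
            renamed += 1
    return renamed
```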

But some gmd: elements don't allow the gco:nilReason attribute

I found gco:nilReason on the elements gmd:CI_ResponsibleParty, gmd:CI_Contact, gmd:MD_ReferenceSystem, gmd:MD_DataIdentification, and gmd:MD_Format, which breaks validation. These attributes have to be removed.

As a general rule, gco:nilReason is not allowed on XML elements representing a UML class (whose names begin with two uppercase letters and an underscore, like MD_ or CI_), nor on XML elements representing a UML property of a simple conceptual type (the element name begins with lowercase letters but contains only one level of XML element, such as gco:Date or gco:CharacterString). gco:nilReason is allowed only on XML elements representing a UML property (again, names beginning with lowercase letters) whose type is a UML class (hence containing XML elements such as MD_ etc.).

Confusing? Yes, for me too. If you would rather not think about these rules, it may be better to remove all nilReason attributes; the XML Schema doesn't complain about their absence.
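
Both options can be sketched with one hypothetical helper, strip_nil_reason. With classes_only=True it removes nilReason (prefixed or not) only from class elements, detected purely by the naming convention above (two uppercase letters and an underscore); with classes_only=False it removes every nilReason. The name-pattern test is a simplification and does not cover the simple-type case.

```python
import re
import xml.etree.ElementTree as ET

GCO = "http://www.isotc211.org/2005/gco"
NIL_KEYS = ("nilReason", f"{{{GCO}}}nilReason")
# class elements by naming convention only: MD_, CI_, EX_, ...
CLASS_NAME = re.compile(r"[A-Z]{2}_")


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def strip_nil_reason(root: ET.Element, classes_only: bool = True) -> int:
    """Remove nilReason / gco:nilReason attributes; return how many were removed."""
    removed = 0
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue
        if classes_only and not CLASS_NAME.match(local_name(elem.tag)):
            continue  # property element: leave its nilReason alone
        for key in NIL_KEYS:
            if key in elem.attrib:
                del elem.attrib[key]
                removed += 1
    return removed
```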

In my previous review I recommended the use of gco:nilReason and brought that recommendation to the CBS/TT-ApMD meeting, but the meeting did not agree to it. The feature therefore remains optional and should be used only if metadata creators are confident that they can use it harmlessly.

It's no good to use nilReason if there is a value

XSD cannot detect this, but it makes no sense to use nilReason on an element that has content.

<gmd:CI_Contact nilReason="missing">
  <gmd:phone>
    <gmd:CI_Telephone>
      <gmd:voice>
        <gco:CharacterString>+41 (0)44 256 92 23</gco:CharacterString>
      </gmd:voice>
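
Since schema validation will not catch this, a small separate check can. The following is a sketch only; nil_with_content is a hypothetical name, and "content" here means any child element or non-whitespace text.

```python
import xml.etree.ElementTree as ET

GCO = "http://www.isotc211.org/2005/gco"
NIL_KEYS = ("nilReason", f"{{{GCO}}}nilReason")


def nil_with_content(root: ET.Element) -> list[str]:
    """List tags of elements that carry a nilReason but also have content."""
    hits = []
    for elem in root.iter():
        has_nil = any(key in elem.attrib for key in NIL_KEYS)
        has_content = len(elem) > 0 or bool((elem.text or "").strip())
        if has_nil and has_content:
            hits.append(elem.tag)
    return hits
```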

no nilReason for gml:scope

I don't know why, but we can leave it empty instead.

no attribute id for gmd:geographicElement

No good:

<gmd:geographicElement id="Country">
  <gmd:EX_GeographicDescription>

If id="Country" is really needed, it can be placed on the uppercase (class) element, for example gmd:EX_GeographicDescription:

<gmd:geographicElement>
  <gmd:EX_GeographicDescription id="Country">
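
The move can be automated roughly as below. This is a sketch under assumptions: the helper move_id_to_class is hypothetical, it takes the first child of the property element to be the class element, and it skips any case where that child already has an id.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"


def move_id_to_class(root: ET.Element, property_name: str = "geographicElement") -> int:
    """Move an id attribute from a gmd property element to its class child element."""
    moved = 0
    for prop in root.iter(f"{{{GMD}}}{property_name}"):
        if "id" not in prop.attrib or len(prop) == 0:
            continue
        child = prop[0]  # the uppercase class element, e.g. gmd:EX_GeographicDescription
        if "id" in child.attrib:
            continue  # do not overwrite an existing id
        child.set("id", prop.attrib.pop("id"))
        moved += 1
    return moved
```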

broken temporalElement

Sorry, I can't figure out what this is intended to mean.

<gmd:temporalElement nilReason="missing">
  <gmd:EX_TemporalExtent>
    <gmd:extent>
      <gml:TimePeriod>urn.x-wmo.md.ch.meteoswiss.gawsis..70400000010_mole_fraction_of_carbon_monoxide_in_air_x1h_BRW_unknown.timePeriod</gml:TimePeriod>
      <gml:TimePeriod/>
      <gml:TimePeriod/>
    </gmd:extent>
  </gmd:EX_TemporalExtent>
</gmd:temporalElement>

We could remove everything except the nilReason attribute, but it is easier to remove the entire gmd:temporalElement. That element is optional, so users must be prepared for a missing temporalElement anyway.
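
A sketch of the simpler option follows. It treats a gmd:temporalElement as broken when it contains a gml:TimePeriod with no child elements, since a valid GML TimePeriod holds begin and end child elements; that criterion is my assumption, not the rule applied by fix-xsd.xsl. The GML 3.2 namespace URI is likewise assumed and should be checked against the records' own declaration. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GML = "http://www.opengis.net/gml/3.2"  # assumed GML 3.2 namespace URI


def remove_broken_temporal_elements(root: ET.Element) -> int:
    """Drop gmd:temporalElement blocks that contain a childless gml:TimePeriod."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    for temporal in list(root.iter(f"{{{GMD}}}temporalElement")):
        periods = temporal.iter(f"{{{GML}}}TimePeriod")
        # text-only or empty TimePeriods lack the begin/end children a usable one needs
        if any(len(period) == 0 for period in periods):
            parent = parent_of.get(temporal)
            if parent is not None:
                parent.remove(temporal)
                removed += 1
    return removed
```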

four broken VerticalCRSs in a row

<gmd:verticalCRS>
  <gml:VerticalCRS>
  urn.x-wmo.md.ch.meteoswiss.gawsis..70400000010_mole_fraction_of_carbon_monoxide_in_air_x1h_BRW_unknown.VerticalCRS
    <gml:description>m above sea level</gml:description>
  </gml:VerticalCRS>
  <gml:VerticalCRS>
   <gml:identifier>WGS84</gml:identifier>
   <gml:scope>0-50000</gml:scope>
  </gml:VerticalCRS>
  <gml:VerticalCRS/>
  <gml:VerticalCRS/>
</gmd:verticalCRS>

gmd:MD_DataIdentification cannot have gml:id

In this case the attribute must be unprefixed.

No good:

<gmd:MD_DataIdentification gml:id="identInfo">

Must be:

<gmd:MD_DataIdentification id="identInfo">
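
A possible automated version, assuming the GML 3.2 namespace URI (check it against the records) and using a hypothetical helper name: gml:id becomes a plain id only on elements in the gmd namespace, while gml elements such as gml:TimePeriod keep their gml:id.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GML = "http://www.opengis.net/gml/3.2"  # assumed GML 3.2 namespace URI


def unprefix_ids_on_gmd(root: ET.Element) -> int:
    """Turn gml:id into a plain id attribute on elements in the gmd namespace."""
    gml_id = f"{{{GML}}}id"
    changed = 0
    for elem in root.iter():
        if not (isinstance(elem.tag, str) and elem.tag.startswith(f"{{{GMD}}}")):
            continue  # gml: elements keep gml:id
        if gml_id in elem.attrib and "id" not in elem.attrib:
            elem.set("id", elem.attrib.pop(gml_id))
            changed += 1
    return changed
```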

id attribute cannot contain a colon (:)

<gml:TimePeriod gml:id="urn:x-wmo:md:ch.meteoswiss.gawsis::100000000050_mole_fraction_of_gamma_hexachlorocyclohexane_in_air_xXx_APT_unknown.timePeriod">

could be:

<gml:TimePeriod gml:id="urn_x-wmo_md_ch.meteoswiss.gawsis__100000000050_mole_fraction_of_gamma_hexachlorocyclohexane_in_air_xXx_APT_unknown.timePeriod">
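
The reason is that id and gml:id are of XML Schema type ID, whose values must be NCNames ("non-colonized" names). The sketch below (hypothetical helper to_ncname) applies the same colon-to-underscore replacement as the example. It then checks the result against an ASCII-only approximation of NCName; the real production also admits many non-ASCII letters.

```python
import re

# ASCII-only approximation of the XML NCName production
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def to_ncname(value: str) -> str:
    """Replace every ':' with '_' and verify the result looks like an NCName."""
    candidate = value.replace(":", "_")
    if not NCNAME_ASCII.match(candidate):
        raise ValueError(f"not an NCName even after replacing colons: {candidate!r}")
    return candidate
```

Replacing each colon individually is what turns the "::" in the original identifier into "__".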

broken gmd:geographicElement having text inside

Perhaps it was intended as an id attribute? Even so, it would have to be on the uppercase element (EX_GeographicDescription).

<gmd:geographicElement>
  https://maps.google.com/maps?hl=en&amp;q=71.323013305664,-156.611465454102 +(Barrow (AK), 11 m a.s.l.)Barrow (AK)
  <gmd:EX_GeographicDescription>
    <gmd:geographicIdentifier>
      <gmd:MD_Identifier>
        <gmd:code>
          <gco:CharacterString>Barrow (AK)</gco:CharacterString>
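
ISO 19139 elements have element-only content, so stray text like this can be found mechanically. A minimal detector sketch (hypothetical helper stray_text) reports every element that has child elements and also non-whitespace text, either before or between its children.

```python
import xml.etree.ElementTree as ET


def stray_text(root: ET.Element) -> list[tuple[str, str]]:
    """Report (tag, text) for elements mixing child elements with non-whitespace text."""
    hits = []
    for elem in root.iter():
        if len(elem) == 0:
            continue  # leaf elements such as gco:CharacterString may hold text
        texts = [elem.text] + [child.tail for child in elem]
        stray = " ".join(t.strip() for t in texts if t and t.strip())
        if stray:
            hits.append((elem.tag, stray))
    return hits
```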

style suggestions

keyword claimed to be from a WMO list but actually isn't

The following descriptiveKeywords block says its source is "WMO keyword list, version 1.0", but no such list contains the keyword "atmospheric composition chemistry physics". The role of this keyword block seems to overlap with the standardized keyword "atmosphericComposition".

<gmd:descriptiveKeywords>
<gmd:MD_Keywords>
 <gmd:keyword>
  <gco:CharacterString>atmospheric composition chemistry physics</gco:CharacterString>
 </gmd:keyword>
 <gmd:type>
  <gmd:MD_KeywordTypeCode
  codeList="http://wis.wmo.int/2006/catalogues/gmxCodelists.xml#MD_KeywordTypeCode"
  codeListValue="theme">theme</gmd:MD_KeywordTypeCode>
 </gmd:type>
 <gmd:thesaurusName>
 <gmd:CI_Citation>
  <gmd:title>
   <gco:CharacterString>WMO keyword list, version 1.0</gco:CharacterString>
  </gmd:title>
  <gmd:date>
  <gmd:CI_Date>
   <gmd:date>
    <gco:Date>2008-09-23</gco:Date>
   </gmd:date>
   <gmd:dateType>
    <gmd:CI_DateTypeCode
    codeList="http://wis.wmo.int/2006/catalogues/gmxCodelists.xml#CI_DateTypeCode"
    codeListValue="publication">publication</gmd:CI_DateTypeCode>
   </gmd:dateType>
  </gmd:CI_Date>
  </gmd:date>
 </gmd:CI_Citation>
 </gmd:thesaurusName>
</gmd:MD_Keywords>
</gmd:descriptiveKeywords>

keyword in (probably) GAW's list

The following keyword seems to be from GAW's list, so it might be useful to specify a fixed thesaurus name such as "GAW parameter list".

<gmd:descriptiveKeywords>
 <gmd:MD_Keywords>
  <gmd:keyword>
    <gco:CharacterString>Radiation:IR:downwelling_longwave_radiation_in_air</gco:CharacterString>
  </gmd:keyword>
  <gmd:type>
   <gmd:MD_KeywordTypeCode
   codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_KeywordTypeCode"
   codeListValue="theme">theme</gmd:MD_KeywordTypeCode>
  </gmd:type>
 </gmd:MD_Keywords>
</gmd:descriptiveKeywords>