@rushirajnenuji
Created August 14, 2023 23:39
ChatGPT-generated EML validator, based on the algorithm described at https://eml.ecoinformatics.org/validation-and-content-references
// Checks the id and reference rules of the EML validation algorithm.
// It does not perform XML Schema validation.
// DOMParser is a browser API; other runtimes need a DOM implementation that provides it.
function validateXML(xmlString) {
  const parser = new DOMParser();
  const xmlDoc = parser.parseFromString(xmlString, 'application/xml');
  // DOMParser reports malformed XML through a <parsererror> element instead of throwing.
  if (xmlDoc.querySelector('parsererror')) {
    return "Invalid: Document is not well-formed XML";
  }
  // Compare the local name so that any namespace prefix on the root is accepted.
  const rootElement = xmlDoc.documentElement;
  if (rootElement.localName !== 'eml') {
    return "Invalid: Root element is not 'eml'";
  }

  // Maps and Sets instead of plain objects: an id whose 'system' is null must still count as seen.
  const identifiersHash = new Map(); // id -> 'system' attribute (null when absent)
  const referencesHash = new Map(); // referenced id -> 'system' attribute (null when absent)
  const unitsHash = new Set(); // unit names used in <customUnit>
  const missingReference =
    "Invalid: Mismatched 'references' attribute value or missing 'id' attribute for reference";

  for (const element of xmlDoc.getElementsByTagName('*')) {
    // Only direct children matter here; querySelector would also match deeper descendants.
    const children = Array.from(element.children);
    const idAttr = element.getAttribute('id');
    if (!idAttr) {
      // An element with an <annotation> child needs an id, unless that
      // annotation carries its own 'references' attribute.
      const needsId = children.some(
        (child) => child.localName === 'annotation' && !child.getAttribute('references')
      );
      if (needsId) {
        return "Invalid: Element without 'id' attribute and annotation child without 'references' attribute";
      }
    } else {
      if (identifiersHash.has(idAttr)) {
        return "Invalid: Duplicate 'id' attribute";
      }
      identifiersHash.set(idAttr, element.getAttribute('system'));
      if (children.some((child) => child.localName === 'references')) {
        return "Invalid: Element with 'id' attribute contains 'references' element";
      }
    }

    // A reference is either a 'references' attribute or the text of a <references> element.
    const referencesKey = element.localName === 'references'
      ? element.textContent.trim()
      : element.getAttribute('references');
    if (referencesKey) {
      const system = element.getAttribute('system');
      if (!referencesHash.has(referencesKey)) {
        referencesHash.set(referencesKey, system);
      } else if (referencesHash.get(referencesKey) !== system) {
        return "Invalid: Mismatched 'references' attribute value";
      }
    }
  }

  // <describes> carries the target id as text content, not as a 'key' attribute.
  for (const describes of xmlDoc.querySelectorAll('additionalMetadata describes')) {
    if (!identifiersHash.has(describes.textContent.trim())) {
      return missingReference;
    }
  }

  // <customUnit> likewise names its unit in text content; the unit needs a matching id.
  for (const customUnit of xmlDoc.querySelectorAll('customUnit')) {
    unitsHash.add(customUnit.textContent.trim());
  }
  for (const unit of unitsHash) {
    if (!identifiersHash.has(unit)) {
      return "Invalid: Missing 'id' attribute for unit";
    }
  }

  // Every reference needs a target id with the same 'system' value, or no 'system' on either side.
  for (const [key, system] of referencesHash) {
    if (!identifiersHash.has(key) || identifiersHash.get(key) !== system) {
      return missingReference;
    }
  }
  return "Valid: Document passed validation";
}
// Example XML string for testing
const xmlString =
`<?xml version="1.0" encoding="UTF-8"?>
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1" packageId="doi:10.18739/A26W9695B" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 eml.xsd" system="https://doi.org">
<dataset>
<title>Photogrammetric scans of aerial photographs of North American glaciers, 1994</title>
<creator>
<individualName>
<givenName>Dr. Matt</givenName>
<surName>Nolan</surName>
</individualName>
<phone>907 978 0542</phone>
<electronicMailAddress>info@fairbanksfodar.com</electronicMailAddress>
</creator>
<creator>
<individualName>
<givenName>Austin S.</givenName>
<surName>Post</surName>
</individualName>
<organizationName>USGS</organizationName>
</creator>
<creator>
<individualName>
<givenName>William</givenName>
<surName>Hauer</surName>
</individualName>
</creator>
<creator>
<individualName>
<givenName>Alexander</givenName>
<surName>Zinck</surName>
</individualName>
<electronicMailAddress>azinck@usgs.gov</electronicMailAddress>
</creator>
<creator>
<individualName>
<givenName>Shad</givenName>
<surName>O'Neel</surName>
</individualName>
<organizationName>Alaska Science Center - USGS</organizationName>
<electronicMailAddress>soneel@usgs.gov</electronicMailAddress>
<userId directory="https://orcid.org">https://orcid.org/0000-0002-9185-0144</userId>
</creator>
<pubDate>2017-09-06</pubDate>
<abstract>
<para>
<emphasis> Introduction: </emphasis>
</para>
<para>Between 1958 and 1999, Austin Post led the USGS collection of aerial imagery of North American glaciers. These images are primarily vertical stereo black and white images, although single oblique images, as well as color images have been collected. The glaciers of North America were the subjects, and the digital products made available here serve to document the changes that have occurred to the glaciers over the past 5 decades. The purpose of this project is to preserve the data contained within these film images in a digital format for future analysis of North American glacier change.</para>
<para>
<emphasis> File Layout: </emphasis>
</para>
<para>
<emphasis> 1. </emphasis>
</para>
<para> The first level contains an overall data set of image metadata from 1964 - 1997 (nagapData.csv) and an R script (searchData.R) with instructions on how to search and subset the data. fileLayout.pdf shows the file structure and folder contents visually. There are also three kml files with flight path information by decade. </para>
<para>
<emphasis> 2. </emphasis>
</para>
<para> The second level (this level) is the year in which the pictures were taken. There are 32 years with images from 1964 – 1997. The majority of these folders are jpegs with notes provided by Austin Post. They also contain a year-specific csv (YYYY.csv) that contains image metadata for the entire year (date, roll numbers, location name, longitude, latitude, altitude, media, and comments). The overall data set (nagapData.csv) is the aggregate of each individual “YYYY.csv” file. </para>
<para>
<emphasis> 3. </emphasis>
</para>
<para>The glacier photos are located at the third level. The folders at this level are distinguished by camera roll number (1, 2, etc.), and image type (thumbnail, jpeg, or tif); some also contain fiducial and oblique image folders. This level primarily contains image files of aerial photos as either thumbnails, jpegs, or tifs. It also includes a csv with image metadata specific to each roll (date, roll numbers, location name, longitude, latitude, altitude, media, and comments), a text file (info.txt) with camera specifications unique to each image, and a text file (histo.txt) with color information unique to each image.
</para>
</abstract>
<keywordSet>
<keyword>Alaska</keyword>
<keyword>glacier</keyword>
<keyword>aerial</keyword>
<keyword>USGS</keyword>
</keywordSet>
<intellectualRights>This work is dedicated to the public domain under the Creative Commons Universal 1.0 Public Domain Dedication. To view a copy of this dedication, visit https://creativecommons.org/publicdomain/zero/1.0/.</intellectualRights>
<distribution>
<online>
<url>http://doi.org/doi:10.18739/A26W9695B</url>
</online>
</distribution>
<coverage>
<geographicCoverage>
<geographicDescription>Refer to the accompanying csv file for location coordinates: Harrima Glacier, Barry-Coxe Glacier, Harvard Glacier, Columbia Glacier, Yale Glacier, Meares Glacier, Bering Glacier 1, Bering Glacier, Bering Glacier 2, Harriman Glacier, Columbia Oblique, Van Cleve Lake, Miles Glacier, West Rthington Glacier, South Uth Cascade Glacier, South Oupe Glacier, Valdez Glacier, South Eridan Glacie, Childs Glacier, South Rpise Glacier, Knik Glacier, Coxe Glacier, Harvard Arm, Columbia Bay, North Squaly Glaicer, Cowlitz Glacier, East Mons Glacier, Carbon Glacier, Tahoma Glacier, South Uth Tahoma Glacier</geographicDescription>
<boundingCoordinates>
<westBoundingCoordinate>-148.740645</westBoundingCoordinate>
<eastBoundingCoordinate>-121.045992</eastBoundingCoordinate>
<northBoundingCoordinate>61.432717</northBoundingCoordinate>
<southBoundingCoordinate>46.799385</southBoundingCoordinate>
</boundingCoordinates>
</geographicCoverage>
<temporalCoverage>
<rangeOfDates>
<beginDate>
<calendarDate>1994-02-14</calendarDate>
</beginDate>
<endDate>
<calendarDate>1994-10-31</calendarDate>
</endDate>
</rangeOfDates>
</temporalCoverage>
</coverage>
<contact>
<individualName>
<givenName>Shad</givenName>
<surName>O'Neel</surName>
</individualName>
<organizationName>Alaska Science Center - USGS</organizationName>
<positionName>soneel@usgs.gov</positionName>
</contact>
<contact>
<individualName>
<givenName>Dr. Matt</givenName>
<surName>Nolan</surName>
</individualName>
<phone>907 978 0542</phone>
<electronicMailAddress>info@fairbanksfodar.com</electronicMailAddress>
</contact>
<publisher>
<organizationName>NSF Arctic Data Center</organizationName>
<electronicMailAddress>support@arcticdata.io</electronicMailAddress>
<onlineUrl>http://arcticdata.io</onlineUrl>
<userId directory="https://www.wikidata.org/">Q77285095</userId>
</publisher>
<methods>
<methodStep>
<description>
<para>
The images are primarily vertical stereo black and white images, although single oblique images, as well as
color also were collected. This data release includes all of the vertical frames that have been photogrammetrically
scanned.The data release does not constitute the full collection of aerial imagery. Full resolution TIF files are
provided for each scanned frame, in addition to a medium resolution JPG file and a low resolution thumbnail file.
Film rolls included in this release may not contain all of the frames originally captured on a roll. Frames may be
missing due to errors during imagery acquisition or in scanning resulting in a defective or unusable image that has no value.
</para>
<para>
The majority of the collection has metadata imprinted on each frame including roll number, frame number, date,
and subject of interest, altitude, camera name/number, and focal length. Inclusion of imprinted metadata varies
between rolls, and may be obscured on some frames.
</para>
<para>
Metadata spreadsheets associated with the frames include date of frame capture, object of interest, roll number,
frame number, approximate coordinates, acquisition altitude and camera identification when known. Additionally,
reconstructed flight paths are provided as .kml files with each film roll.
</para>
<para>
Camera calibration reports are provided when known. The majority of the collection has calibration reports for
the cameras used to acquire the imagery, however some calibration reports have not been located do to a lack of
information regarding the cameras used on certain flights.
</para>
</description>
</methodStep>
</methods>
<project>
<title>Phase One data rescue of the Austin Post air photo collection and new repeat aerial photography of Alaskan valley glaciers</title>
<personnel>
<individualName>
<givenName>Matt</givenName>
<surName>Nolan</surName>
</individualName>
<role>principalInvestigator</role>
</personnel>
<funding>
<para>NSF 1107737</para>
</funding>
</project>
<dataTable>
<entityName>1994.csv</entityName>
<entityDescription>File contains image capture date, location name, cameral roll number, Latitude and Longitude
in decimal degrees, Altitude, media used in image capture, and relevant comments</entityDescription>
<physical id="urn:uuid:34e03d3b-4951-4243-8e53-0e72053d45e8" scope="document">
<objectName>1994.csv</objectName>
<size unit="bytes">84085</size>
<authentication method="SHA1">f1c64916087b0fc1ccf95aa26b8cb9a9232348b3</authentication>
<dataFormat>
<externallyDefinedFormat>
<formatName>text/csv</formatName>
</externallyDefinedFormat>
</dataFormat>
<distribution scope="document">
<online>
<url function="download">https://cn.dataone.org/cn/v2/resolve/urn:uuid:34e03d3b-4951-4243-8e53-0e72053d45e8</url>
</online>
</distribution>
</physical>
<attributeList>
<attribute>
<attributeName>Date</attributeName>
<attributeDefinition>Date in YYYY-MM-DD</attributeDefinition>
<measurementScale>
<dateTime>
<formatString>YYYY-MM-DD</formatString>
</dateTime>
</measurementScale>
</attribute>
<attribute>
<attributeName>Location</attributeName>
<attributeDefinition>Location Name</attributeDefinition>
<measurementScale>
<nominal>
<nonNumericDomain>
<textDomain>
<definition>Location Name</definition>
</textDomain>
</nonNumericDomain>
</nominal>
</measurementScale>
</attribute>
<attribute>
<attributeName>Roll</attributeName>
<attributeDefinition>Year, orientation, and flight number of photographic set.
The last two digits of the year, a letter representing the orientation of the camera,
and a digit representing the flight number</attributeDefinition>
<measurementScale>
<nominal>
<nonNumericDomain>
<textDomain>
<definition>Year, orientation, and roll</definition>
</textDomain>
</nonNumericDomain>
</nominal>
</measurementScale>
</attribute>
<attribute>
<attributeName>Frame</attributeName>
<attributeDefinition>Photographic frame respective to its roll</attributeDefinition>
<measurementScale>
<interval>
<unit>
<standardUnit>dimensionless</standardUnit>
</unit>
<numericDomain>
<numberType>integer</numberType>
</numericDomain>
</interval>
</measurementScale>
</attribute>
<attribute>
<attributeName>Latitude</attributeName>
<attributeDefinition>Latitude in decimal degrees</attributeDefinition>
<measurementScale>
<interval>
<unit>
<standardUnit>degree</standardUnit>
</unit>
<numericDomain>
<numberType>real</numberType>
</numericDomain>
</interval>
</measurementScale>
</attribute>
<attribute>
<attributeName>Longitude</attributeName>
<attributeDefinition>Longitude in decimal degrees</attributeDefinition>
<measurementScale>
<interval>
<unit>
<standardUnit>degree</standardUnit>
</unit>
<numericDomain>
<numberType>real</numberType>
</numericDomain>
</interval>
</measurementScale>
</attribute>
<attribute>
<attributeName>Altitude</attributeName>
<attributeDefinition>Altitude of frame capture (feet)</attributeDefinition>
<measurementScale>
<ratio>
<unit>
<standardUnit>foot</standardUnit>
</unit>
<numericDomain>
<numberType>real</numberType>
</numericDomain>
</ratio>
</measurementScale>
</attribute>
<attribute>
<attributeName>Media</attributeName>
<attributeDefinition>Type of photograph (BW/Color)</attributeDefinition>
<measurementScale>
<nominal>
<nonNumericDomain>
<textDomain>
<definition>Type of photograph (BW/Color)</definition>
</textDomain>
</nonNumericDomain>
</nominal>
</measurementScale>
</attribute>
<attribute>
<attributeName>Comments</attributeName>
<attributeDefinition>Additional relevant information</attributeDefinition>
<measurementScale>
<nominal>
<nonNumericDomain>
<textDomain>
<definition>Additional relevant information</definition>
</textDomain>
</nonNumericDomain>
</nominal>
</measurementScale>
</attribute>
</attributeList>
</dataTable>
<otherEntity id="urn:uuid:d98fa58d-c03e-451b-901c-e71c84bd2737" scope="document" system="https://tools.ietf.org/html/rfc4122">
<entityName>fileLayout.pdf</entityName>
<entityDescription>Flow chart of folder structure and the contents of each folder</entityDescription>
<physical scope="document">
<objectName>fileLayout.pdf</objectName>
<size unit="bytes">19559</size>
<authentication method="SHA-1">f054c66323372b24dd780c73b1b4b5de3c00381d</authentication>
<dataFormat>
<externallyDefinedFormat>
<formatName>application/pdf</formatName>
</externallyDefinedFormat>
</dataFormat>
<distribution scope="document">
<online>
<url function="download">https://cn.dataone.org/cn/v2/resolve/urn:uuid:d98fa58d-c03e-451b-901c-e71c84bd2737</url>
</online>
</distribution>
</physical>
<entityType>Other</entityType>
</otherEntity>
</dataset>
</eml:eml>
`;
console.log(validateXML(xmlString));
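
The sample document above contains no references, so it never exercises the reference rule from the linked algorithm: every referenced id must exist, and the `system` value must match on both sides or be absent on both. The following is a minimal sketch of that rule on plain `Map`s, assuming the ids and references have already been collected. `checkReferences` and the sample ids are hypothetical and not part of the gist; the sketch needs no `DOMParser`, so it also runs outside a browser.

```javascript
// Hypothetical helper, not part of the gist.
// identifiers: Map of id -> system (null when the attribute is absent)
// references:  Map of referenced id -> system (null when the attribute is absent)
function checkReferences(identifiers, references) {
  const problems = [];
  for (const [target, system] of references) {
    if (!identifiers.has(target)) {
      // The referenced id does not exist anywhere in the document.
      problems.push(`missing id: ${target}`);
    } else if (identifiers.get(target) !== system) {
      // The id exists, but 'system' differs between the two sides.
      problems.push(`system mismatch: ${target}`);
    }
  }
  return problems;
}

// Made-up ids for illustration only.
const identifiers = new Map([
  ['creator-1', null],
  ['table-1', 'https://example.org'],
]);
const references = new Map([
  ['creator-1', null], // exists, 'system' absent on both sides
  ['table-1', null],   // exists, but the target declares a 'system'
  ['unit-9', null],    // no element carries this id
]);

const problems = checkReferences(identifiers, references);
console.log(problems);
```

Returning a list of problems rather than the first failure is a design choice of this sketch; `validateXML` stops at the first rule it finds violated.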