Created
December 19, 2017 16:36
Example python record validation functions for Combine
'''
You can import almost any Python library you'd like up here, and then use it within the checking functions
'''

import re


'''
Each function is its own "test" when validating against a record.

Each function is provided with a single argument called "record" which has the following attributes:
    - record.xml = record document parsed as an etree XML node
    - record.document = record document as a string (might be handy for regex tests)
    - record.record_id = unique identifier
    - record._row = actual row from MySQL
    and more!

Each function has one goal:
    - either return True if the record passes all tests
    - or, return a string that will be sent to the front-end for reporting, something like "bad dateIssued formatting"

Currently, all functions are run, but this may change to a convention where a function name must begin with "test_",
e.g. def test_check_for_mods_titleInfo(). This would allow other helper functions to be present, but not run as tests.

This file would be dumped wholesale into Combine as a "Validation Scenario", and is then parsed and used during a job's
validation phase.
'''


def check_for_mods_titleInfo(record):
    '''
    This test checks that at least one <mods:titleInfo> element is present.
        - notice the use of `record.nsmap`, which is a dictionary of namespaces that can be useful when using etree
        to perform xpath queries
        - notice the message it returns if none are found, or True if they are
    '''

    titleInfo_elements = record.xml.xpath('//mods:titleInfo', namespaces=record.nsmap)
    if len(titleInfo_elements) > 0:
        return True
    else:
        return "No mods:titleInfo elements found"


def check_dateIssued_format(record):
    '''
    This function checks for a particular format (YYYY-MM-DD) in mods:dateIssued.
        - notice this uses the regex `re` python library, which is imported at the top of this file
    '''

    # get dateIssued elements
    dateIssued_elements = record.xml.xpath('//mods:dateIssued', namespaces=record.nsmap)

    # if found, check format
    if len(dateIssued_elements) > 0:

        # loop through values and check
        for dateIssued in dateIssued_elements:

            # check format
            if dateIssued.text is not None:
                match = re.match(r'^[0-9]{4}-[0-9]{2}-[0-9]{2}$', dateIssued.text)
            else:
                return "dateIssued value is None"

            # match found, continue
            if match:
                continue
            else:
                return "mods:dateIssued bad formatting"

        # if all matches, return True
        return True

    # if none found, return True indicating passed test due to omission
    else:
        return True
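

# ---------------------------------------------------------------------------
# Illustration only, not part of the original scenario: a minimal sketch of
# which strings the dateIssued pattern in check_dateIssued_format() accepts.
# The names DATE_PATTERN, samples and accepted are made up for this example.
# No functions are defined here, so no extra "tests" are introduced. The
# section is meant to be run on its own and can be dropped before the file
# is loaded into Combine.
# ---------------------------------------------------------------------------
import re

# same pattern as the one used in check_dateIssued_format()
DATE_PATTERN = re.compile(r'^[0-9]{4}-[0-9]{2}-[0-9]{2}$')

samples = [
    '2017-12-19',    # expected shape, YYYY-MM-DD
    '2017-12',       # day is missing
    '12/19/2017',    # wrong separators and field order
    '2017-99-99',    # right shape, impossible date
    '2017-12-19\n',  # right shape, followed by a trailing newline
]

# map each sample to True / False depending on whether the pattern matches
accepted = {value: bool(DATE_PATTERN.match(value)) for value in samples}

# Two things worth knowing about this pattern:
#   - it checks the shape of the value only, so '2017-99-99' is accepted
#   - `$` also matches just before a trailing newline, so '2017-12-19\n' is accepted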