@ghukill
Created December 19, 2017 16:36
Example python record validation functions for Combine
'''
You can import almost any Python library you like up here, and then use it within the checking functions.
'''
import re

'''
Each function is its own "test" when validating a record.

Each function is provided with a single argument called "record", which has the following attributes:
- record.xml = record document parsed as an etree XML node
- record.document = record document as a string (might be handy for regex tests)
- record.record_id = unique identifier
- record.nsmap = dictionary of namespaces, useful for etree xpath queries
- record._row = actual row from MySQL
and more!

Each function has one goal:
- either return True if the record passes the test
- or return a string that will be sent to the front-end for reporting, something like "bad dateIssued formatting"

Currently, all functions are run, but this may change to adopt a convention such as requiring function names to begin
with "test_", e.g. def test_check_for_mods_titleInfo(). This would allow helper functions to be present without being
run as tests.

This file is meant to be dumped wholesale into Combine as a "Validation Scenario", where it is parsed and used during
a job's validation phase.
'''


def check_for_mods_titleInfo(record):
    '''
    This test checks that the record contains at least one <mods:titleInfo> element.
    - notice the use of `record.nsmap`, a dictionary of namespaces that is useful when
      performing xpath queries with etree
    - notice that it returns a message if none are found, or True if any are
    '''
    titleInfo_elements = record.xml.xpath('//mods:titleInfo', namespaces=record.nsmap)
    if len(titleInfo_elements) > 0:
        return True
    else:
        return "No mods:titleInfo elements found"


def check_dateIssued_format(record):
    '''
    This function checks that each mods:dateIssued value uses the YYYY-MM-DD format.
    - notice this uses the python regex library `re`, which is imported at the top of this file
    '''
    # get dateIssued elements
    dateIssued_elements = record.xml.xpath('//mods:dateIssued', namespaces=record.nsmap)

    # if none are found, return True: the test passes by omission
    if len(dateIssued_elements) == 0:
        return True

    # loop through values and check the format of each
    for dateIssued in dateIssued_elements:
        if dateIssued.text is None:
            return "dateIssued value is None"
        # re.fullmatch requires the whole value to match; re.match with '^...$'
        # would also accept a value ending in a newline
        if not re.fullmatch(r'[0-9]{4}-[0-9]{2}-[0-9]{2}', dateIssued.text):
            return "mods:dateIssued bad formatting"

    # all values matched
    return True
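
Combine's own runner is not part of this gist, so the following is only a sketch of how the conventions described in the file (a single `record` argument, `True` for a pass, a string for a failure, and the proposed `test_` prefix) could be exercised locally. `SCENARIO_SOURCE`, `load_scenario`, `run_validation`, and the example tests are hypothetical names, and `types.SimpleNamespace` stands in for Combine's record object, filling in only `record_id` and `document`.

```python
from types import SimpleNamespace

# Hypothetical scenario file, as a user might paste it into Combine.
SCENARIO_SOURCE = r'''
import re

def test_has_title(record):
    if "<mods:titleInfo" in record.document:
        return True
    return "No mods:titleInfo elements found"

def test_no_empty_dates(record):
    if re.search(r"<mods:dateIssued\s*/>", record.document):
        return "empty mods:dateIssued element"
    return True

def normalize(text):
    # helper without the test_ prefix, so the runner never calls it
    return text.strip()
'''


def load_scenario(source):
    """Execute scenario source into a fresh namespace (trusted code only)."""
    namespace = {}
    exec(source, namespace)
    return namespace


def run_validation(namespace, record, prefix="test_"):
    """Run each callable whose name starts with `prefix` against `record`.

    A test passes only if it returns True; any other value (normally a
    message string) is collected as a failure, keyed by function name.
    """
    failures = {}
    for name, func in sorted(namespace.items()):
        if name.startswith(prefix) and callable(func):
            result = func(record)
            if result is not True:
                failures[name] = result
    return failures


# Stand-in record: only the attributes the example tests use.
good = SimpleNamespace(
    record_id="demo-1",
    document='<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
             '<mods:titleInfo/></mods:mods>',
)
print(run_validation(load_scenario(SCENARIO_SOURCE), good))  # {}
```

Executing the source with `exec` mirrors the idea of pasting a whole file into Combine, but it runs arbitrary code, so it only suits trusted input. Because `normalize` lacks the prefix, it is never called, which is exactly what the `test_` convention would allow.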
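
The regex in `check_dateIssued_format` verifies only the shape of each value, so an impossible date such as `2017-02-30` still passes. A sketch of a stricter variant, assuming the same `record.xml` and `record.nsmap` attributes the file already relies on, adds a `datetime.strptime` check. The function name `check_dateIssued_calendar`, the `DATE_PATTERN` constant, and the second failure message are hypothetical.

```python
import re
from datetime import datetime

# Exact YYYY-MM-DD shape; used with fullmatch so nothing may follow the day.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_dateIssued_calendar(record):
    '''
    Hypothetical stricter variant of check_dateIssued_format: each
    mods:dateIssued value must look like YYYY-MM-DD and also be a real date.
    '''
    for dateIssued in record.xml.xpath('//mods:dateIssued', namespaces=record.nsmap):
        text = dateIssued.text
        if text is None:
            return "dateIssued value is None"
        # strptime alone is lenient about digit counts (it accepts 2017-1-5),
        # so the shape is checked with the regex first
        if not DATE_PATTERN.fullmatch(text):
            return "mods:dateIssued bad formatting"
        # strptime raises ValueError for impossible dates such as 2017-02-30
        try:
            datetime.strptime(text, "%Y-%m-%d")
        except ValueError:
            return "mods:dateIssued is not a valid calendar date"
    # no dateIssued elements, or all of them valid: the test passes
    return True
```

Because every function in the file is currently run as a test, this variant would replace `check_dateIssued_format` rather than sit beside it; otherwise a badly shaped date would be reported twice.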