@VladimirAlexiev
Last active February 1, 2022 13:30

UNCEFACT JSONLD Notes

Notes on https://service.unece.org/trade/uncefact/vocabulary/uncefact.jsonld

Intro

riot --formatted=turtle uncefact.jsonld  1>uncefact.ttl
  • sorted paragraphs
  • cefact: are BIEs, uncefact: are classes and props. It's great that 2 distinct namespaces are used

Stats

Classes:

| c | class |
|---|---|
| 1747 | rdf:Property |
| 409 | rdfs:Class |
| 218 | uncefact:AggregateBIE |
| 1067 | uncefact:AssociationBIE |
| 1821 | uncefact:BasicBIE |
  • Terms (classes+props): 2156
  • BIEs: 3106

Meta props:

| c | prop | comment |
|---|---|---|
| 5262 | rdf:type | terms+BIE |
| 5262 | rdfs:comment | terms+BIE |
| 2156 | rdfs:label | only terms |
| 2351 | schema:domainIncludes | |
| 1747 | schema:rangeIncludes | |
| 742 | uncefact:TDED | 4-digit code |
| 2888 | uncefact:cefactBieDomainClass | |
| 3106 | uncefact:cefactBusinessProcess | |
| 3106 | uncefact:cefactElementMetadata | each BIE is related to 1 term |
| 3106 | uncefact:cefactUNId | |
| 11 | uncefact:status | |
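
The totals in the class and meta-prop counts above can be cross-checked with a little arithmetic. A minimal Python sketch, with the numbers copied from those counts (the variable names are only for illustration):

```python
# Counts copied from the "Classes" table.
classes = {
    "rdf:Property": 1747,
    "rdfs:Class": 409,
    "uncefact:AggregateBIE": 218,
    "uncefact:AssociationBIE": 1067,
    "uncefact:BasicBIE": 1821,
}

# Terms are classes plus props; BIEs are the three *BIE classes.
terms = classes["rdf:Property"] + classes["rdfs:Class"]
bies = sum(count for name, count in classes.items() if name.endswith("BIE"))

print(terms)         # 2156, same as the rdfs:label count (only terms)
print(bies)          # 3106, same as the cefactUNId count
print(terms + bies)  # 5262, same as the rdf:type and rdfs:comment counts
```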

BIEs by Process:

| bp | c |
|---|---|
| Agricultural | 56 |
| Buy-Ship-Pay | 1596 |
| Cataloguing | 9 |
| Cross Industry | 3 |
| Cross Industry Trade | 23 |
| Cross-Border | 45 |
| Customer to bank payment initiation | 103 |
| FLUX | 1 |
| In All Contexts | 771 |
| Invoicing | 35 |
| Laboratory Observation | 8 |
| MSDS Reporting | 9 |
| Pricing | 8 |
| Project Management | 18 |
| Remittance | 1 |
| Scheduling | 2 |
| Supply Chain | 98 |
| Traceability | 12 |
| Trade | 189 |
| Transport | 118 |
| e-Certificate of Origin | 1 |

Count of props by domain and range:

PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select (count(?prop) as ?props) ?domains ?ranges {
  {select ?prop (count(?domain) as ?domains) (count(?range) as ?ranges) {
    ?prop a rdf:Property.
    {?prop schema:domainIncludes ?domain}
    union {?prop schema:rangeIncludes ?range}
  } group by ?prop}
} group by ?domains ?ranges order by ?props
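
| props | domains | ranges |
|---|---|---|
| 1 | 9 | 1 |
| 1 | 10 | 1 |
| 1 | 11 | 1 |
| 1 | 13 | 1 |
| 1 | 30 | 1 |
| 1 | 41 | 1 |
| 1 | 44 | 1 |
| 1 | 72 | 1 |
| 2 | 6 | 1 |
| 4 | 8 | 1 |
| 5 | 7 | 1 |
| 7 | 5 | 1 |
| 19 | 4 | 1 |
| 36 | 3 | 1 |
| 157 | 2 | 1 |
| 1509 | 1 | 1 |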
  • Among the most popular props are uncefact:typeCode (44 domains) and uncefact:description (41 domains).
  • Each prop has exactly one range (great!)
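
As a sanity check, the histogram above should add up to the prop and schema:domainIncludes counts from the Stats section. A minimal Python sketch, with the (props, domains) pairs copied from the query result (the variable names are only for illustration):

```python
# (props, domains) pairs copied from the result above; every row has exactly 1 range.
histogram = [
    (1, 9), (1, 10), (1, 11), (1, 13), (1, 30), (1, 41), (1, 44), (1, 72),
    (2, 6), (4, 8), (5, 7), (7, 5), (19, 4), (36, 3), (157, 2), (1509, 1),
]

# Each row stands for `props` properties that each have `domains` domains.
total_props = sum(props for props, _ in histogram)
total_domain_links = sum(props * domains for props, domains in histogram)

print(total_props)         # 1747, the number of rdf:Property instances
print(total_domain_links)  # 2351, the number of schema:domainIncludes triples
```

Since every prop has one range, the same 1747 also matches the schema:rangeIncludes count.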

Props by range:

PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
select ?range (count(*) as ?c) {
    ?prop schema:rangeIncludes ?range1.
    bind(if(strstarts(str(?range1),str(uncefact:)),uncefact:,?range1) as ?range)
} group by ?range

(results below)

BIE:

  • There are more BIE than Property+Class because some classes/props link to several BIE, so we can't simply embed the BIE metadata in terms (props/classes):
uncefact:calculatedAmount uncefact:cefactElementMetadata
  cefact:Applied_Tax.Calculated.Amount ,
  cefact:Header_BalanceOut.Calculated.Amount ,
  cefact:Trade_Tax.Calculated.Amount ,
  cefact:Payment_BalanceOut.Calculated.Amount .
  • There are no BIEs connected to several terms (Property/Class)

Potential Improvements

BIEs

@prefix uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#> .
@prefix cefact:   <https://edi3.org/cefact#> .
  • The cefact: namespace doesn't reflect its purpose. Maybe it should be uncefactBIE:? And a corresponding part in the URL
  • uncefact/spec-jsonld#36 The link from BasicBIE to AggregateBIE should be an object prop, not a string:
cefact:WorkItem_QuantityAnalysis.Details
        rdf:type                        uncefact:AggregateBIE .
cefact:WorkItem_QuantityAnalysis.Identification.Identifier
        rdf:type                        uncefact:BasicBIE ;
        uncefact:cefactBieDomainClass   "cefact:WorkItem_QuantityAnalysis.Details" . # -> cefact:WorkItem_QuantityAnalysis.Details
  • uncefact/spec-jsonld#37 uncefact:cefactUNId "cefact:UN01002518": maybe remove the word "cefact" from the value?
  • uncefact/spec-jsonld#38
    • there should be no empty values uncefact:TDED ".": skip them
    • And what is TDED? use words not abbreviations

Classes

  • uncefact/spec-jsonld#39
    • The LOCODE instance data uses uncefact:UNLOCODE rather than uncefact:UNECELOCODE.
    • Furthermore, this class is not very meaningful semantically (see uncefact/spec-jsonld#29), which may be confirmed by its "description":
uncefact:UNECELOCODE  rdf:type  rdfs:Class ;
        rdfs:comment  "LOCODE." ;
        rdfs:label    "UNLOCODE" .

Prop Descriptions

uncefact/spec-jsonld#41

Some BIE Descriptions seem to be richer than prop descriptions. Eg

cefact:Academic_Qualification.AbbreviatedName.Text a uncefact:BasicBIE ;
  rdfs:comment "The abbreviated name, expressed as text, of this academic qualification." ;
uncefact:abbreviatedName a rdf:Property ;
  rdfs:comment "An abbreviated name, expressed as text." ;

The BIE description could be used when the BIE is used in only one term and that term is applicable to only one class (as is the case for uncefact:abbreviatedName, which is applicable only to uncefact:Qualification). 1242 (or 1337?) of 1747 props satisfy this condition:

prefix uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
prefix schema: <http://schema.org/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select * {
  ?prop uncefact:cefactElementMetadata ?bie
  filter not exists {?prop schema:domainIncludes ?dom1,?dom2 filter (?dom1 != ?dom2)}
  filter not exists {?prop uncefact:cefactElementMetadata ?bie2 filter (?bie != ?bie2)}
  ?prop rdfs:comment ?propDescr.
  ?bie rdfs:comment ?bieDescr.
} 

This requires further examination and unification of descriptions. Eg in the pair below:

  • prop Accreditation: "An official recognition awarded to a person, organisation or thing, such as a building or product, to certify that a certain level of attainment has been achieved."
  • BIE Certified_Accreditation.Details: "A certified recognition that provides evidence of a level of competency in a given area, such as certifying a level of skill in a trade."

Prop Dichotomy

  • uncefact/spec-jsonld#40 schema.org uses rdf:Property because almost all of its props allow literal (text) in addition to object. However, UNCEFACT seems to be strict in following a "property dichotomy", so use owl:ObjectProperty (797, see below) vs owl:DatatypeProperty (950).

| range | c | comment |
|---|---|---|
| xsd:string | 791 | all literals |
| xsd:token | 159 | identifiers, all end in Id |
| uncefact: | 797 | uncefact classes (object props) |
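
The split can be derived mechanically from the range of each prop. A minimal Python sketch, assuming ranges are given as full IRIs. The function name `property_type` is made up for illustration, and the counts are copied from the table above:

```python
UNCEFACT = "https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#"
XSD = "http://www.w3.org/2001/XMLSchema#"

def property_type(range_iri: str) -> str:
    """Pick the OWL property type from the namespace of the range."""
    if range_iri.startswith(XSD):
        return "owl:DatatypeProperty"
    if range_iri.startswith(UNCEFACT):
        return "owl:ObjectProperty"
    raise ValueError(f"unexpected range: {range_iri}")

# Counts by range, copied from the table above.
range_counts = {"xsd:string": 791, "xsd:token": 159, "uncefact:": 797}
datatype_props = range_counts["xsd:string"] + range_counts["xsd:token"]

print(property_type(XSD + "token"))                 # owl:DatatypeProperty
print(property_type(UNCEFACT + "Cargo"))            # owl:ObjectProperty
print(datatype_props)                               # 950
print(datatype_props + range_counts["uncefact:"])   # 1747
```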

Duplicated Props

uncefact/spec-jsonld#42 The props in these pairs are duplicated, so should be merged

  • bICId: "The unique Bank Identification Code (BIC) as defined in ISO 9362 for this creditor or debtor financial institution."
    • bICIdentificationId: "The Bank Identifier Code (BIC) as defined by ISO 9362 (Banking telecommunication messages, Bank Identifier Codes) for this financial identity."
  • versionId: "An identifier of the version."
    • versionIdentificationId: "The unique identifier for the version of this exchanged document."
  • attachedBinaryFile: "A binary file attached to this exchanged or referenced document."
    • attachmentBinaryObject: "A binary object that is attached or otherwise appended to this referenced document."
  • referenceDocument
    • referencedDocument: haven't checked description

This follows principle 4 (deduplication) of #33 (tech-spec.md).

Also:

  • uncefact/spec-jsonld#43 Address: the breakdown lineOne, lineTwo, lineThree, lineFour is pretty random (why not 6 or 10?) and old-fashioned. xsd:string allows multiple lines, so merge to just one prop eg addressLines

Document vs Line Structure

uncefact/spec-jsonld#44

Document has enough props to also describe DocumentLines, which are document parts:

  • lineCountNumeric: if it's a Document, count of lines in it
  • lineId: if it's a DocumentLine, its line ID
  • parentLineId: to establish a hierarchy between lines

However, there's no way to express a document hierarchy or parthood:

  • Which lines comprise this Document
  • Which lines are nested under this parent line?

This leads to confusions such as

  • Wrong domain? The description talks about "DocumentLine" but the domain is Document
uncefact:lineStatusReason
        rdfs:comment                    "A reason, expressed as text, for the line status in this document line." ;
        schema:domainIncludes           uncefact:Document .

uncefact:lineStatusReasonCode
        rdfs:comment                    "The code specifying the line status reason for this document line." ;
        schema:domainIncludes           uncefact:Document .
  • Wrong domain of lineTotalBasisAmount? Where is that "line" mentioned in the description?
    • Tax has 51 attributes including basisAmount so how can one distinguish between the two?
    • On the other hand its "sibling" prop lineTotalAmount has domain MonetarySummation
uncefact:lineTotalBasisAmount
        rdfs:comment                    "A monetary value used as the line total basis on which this trade related tax, levy or duty is calculated." ;
        schema:domainIncludes           uncefact:Tax ;

Property Datatypes

uncefact/spec-jsonld#45

Currently UNCEFACT uses only two literal datatypes: xsd:string (791 props) and xsd:token (159 props)

UNCEFACT prop names are made according to ISO/IEC 11179 Metadata Registry (MDR), part 5:2015 Naming and identification principles. The last word of prop names (let's call it "kind") suggests many other datatypes.

Surely trade involves some numbers and some dates?!?

I checked that all props with kind Id are xsd:token (good). This query counts xsd:string props by "kind":

  • Count of xsd:string props by "kind" (last word of name)
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?kind (count(*) as ?c) {
  ?prop schema:rangeIncludes xsd:string
  bind(replace(str(?prop),".*([A-Z][a-z]*)","$1") as ?kind)
  filter(regex(?kind,"^[A-Z]"))
} group by ?kind order by ?kind

Count and tentative proposed changes:

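| kind | c | change to |
|---|---|---|
| "Access" | 1 | |
| "Agency" | 1 | |
| "Amount" | 89 | numeric |
| "Basis" | 2 | |
| "Box" | 1 | |
| "Charge" | 1 | |
| "Code" | 154 | xsd:token |
| "Conditions" | 1 | |
| "Criteria" | 1 | |
| "Date" | 3 | xsd:date |
| "Description" | 21 | |
| "Dimension" | 1 | |
| "Five" | 1 | |
| "Four" | 1 | |
| "Indicator" | 73 | xsd:boolean |
| "Information" | 21 | |
| "Instructions" | 2 | |
| "Limit" | 2 | |
| "List" | 2 | |
| "Means" | 1 | |
| "Measure" | 66 | |
| "Name" | 47 | |
| "Number" | 4 | numeric |
| "Numeric" | 15 | IndexNumeric, SequenceNumeric -> xsd:integer |
| "Object" | 7 | |
| "Of" | 2 | |
| "One" | 1 | |
| "Pattern" | 1 | |
| "Percent" | 16 | numeric |
| "Phrase" | 1 | |
| "Point" | 1 | |
| "Procedure" | 1 | |
| "Quantity" | 91 | numeric |
| "Rate" | 4 | |
| "Reason" | 7 | |
| "Reference" | 6 | |
| "Remark" | 2 | |
| "Remarks" | 1 | |
| "Restriction" | 3 | |
| "Result" | 1 | |
| "Status" | 1 | |
| "Three" | 1 | |
| "Time" | 79 | xsd:dateTime |
| "Title" | 1 | |
| "Two" | 1 | |
| "Type" | 9 | |
| "Use" | 1 | |
| "Value" | 1 | |
| "Zone" | 1 | |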

Examples:

  • Numeric candidates: uncefact:usedToDateQuotaQuantity, uncefact:usedSignalSourceQuantity, taxBasisTotalAmount, taxBasisAllowanceRate
  • date or dateTime candidates: uncefact:occurrenceDateTime
  • xsd:boolean candidates: uncefact:nilCarriageValueIndicator, uncefact:nilCustomsValueIndicator, uncefact:nilInsuranceValueIndicator
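
The "kind" extraction and the tentative mapping can be tried outside the triple store. A minimal Python sketch that reuses the regex from the SPARQL query above on local names. The names `kind` and `PROPOSED` are made up for illustration, and the mapping only lists kinds for which the table proposes a single datatype:

```python
import re

# Proposed datatype per "kind", copied from the table above.
PROPOSED = {
    "Amount": "numeric",
    "Code": "xsd:token",
    "Date": "xsd:date",
    "Indicator": "xsd:boolean",
    "Number": "numeric",
    "Percent": "numeric",
    "Quantity": "numeric",
    "Time": "xsd:dateTime",
}

def kind(prop_name: str):
    """Return the last capitalized word of a camelCase name, or None if there is none."""
    # The greedy ".*" swallows everything up to the last capital letter.
    result = re.sub(r".*([A-Z][a-z]*)", r"\1", prop_name)
    return result if re.match(r"[A-Z]", result) else None

for name in ["usedToDateQuotaQuantity", "occurrenceDateTime", "nilCustomsValueIndicator"]:
    print(name, kind(name), PROPOSED.get(kind(name)))
```

Single-word names such as description have no capitalized word, so `kind` returns None for them, which mirrors the filter in the query.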

uncefact/spec-jsonld#46 Decide numeric datatypes

"Code" props: change to xsd:token, vs rename

uncefact/spec-jsonld#47 Props named xxxCode come in two kinds:

prefix uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#> 
PREFIX schema: <http://schema.org/>
select ?range (count(*) as ?c) {
  ?prop schema:rangeIncludes ?range1
  filter(regex(str(?prop),"Code$"))
  bind(if(strstarts(str(?range1),str(uncefact:)),uncefact:,?range1) as ?range)
} group by ?range
  • xsd:string: 154. Consider mapping to range xsd:token (same as props named xxxId). xsd:token doesn't allow leading, consecutive or trailing spaces, so it fits better than xsd:string. Example:
    • accessRightsCode xsd:string -> xsd:token
  • Objects (codelist values): 110. Consider renaming them to remove Code (because objects are not codes!). Examples:
    • accountingDocumentSetTriggerCode uncefact:UNCL1001Code -> accountingDocumentSetTrigger
    • cross-BorderRegulatoryProcedureTypeCode uncefact:UNCL9353Code -> cross-BorderRegulatoryProcedureType
    • logisticsSealSealingPartyRoleCode uncefact:UNCL9303Code -> logisticsSealSealingPartyRole
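
To illustrate what the xsd:token constraint means for code values, here is a minimal Python sketch of the whitespace rule (no leading, trailing or consecutive spaces, no tabs or line breaks). The function names and the sample value are made up for illustration:

```python
import re

def to_token(value: str) -> str:
    """Collapse whitespace so the value fits the xsd:token lexical space."""
    # XML Schema whitespace is space, tab, line feed and carriage return.
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")

def is_token(value: str) -> bool:
    """True if the value is already a valid xsd:token."""
    return value == to_token(value)

print(repr(to_token("  ZZZ \t 01 ")))  # 'ZZZ 01'
print(is_token("ZZZ 01"))              # True
print(is_token(" ZZZ"))                # False
```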

Parasitic Word "Identification"

uncefact/spec-jsonld#48

  • remove the parasitic word Identification from xxxIdentificationId. Eg uncefact:versionIdentificationId, bICIdentificationId (led to duplication), etc
  • Note: renaming xxxIndicator to isXxx won't work well, eg transportEquipmentSplitGoodsIndicator
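
The rename itself is a one-line substitution. A minimal Python sketch (the function name is made up for illustration):

```python
import re

def drop_identification(prop_name: str) -> str:
    """Rename xxxIdentificationId to xxxId; leave other names alone."""
    return re.sub(r"IdentificationId$", "Id", prop_name)

print(drop_identification("versionIdentificationId"))  # versionId
print(drop_identification("bICIdentificationId"))      # bICId
```

Both results collide with existing props (versionId, bICId), which is exactly the duplication noted under Duplicated Props, so the rename has to go together with a merge.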

uncefact/spec-jsonld#49 Here are the remaining 4:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * {
  ?x a rdf:Property
  filter(regex(str(?x),"Identification"))
  filter(!regex(str(?x),"IdentificationId"))
}
  • uncefact:allowanceChargeIdentificationTypeCode "The code specifying the type of this trade allowance charge." range UNCL5189Code "Code specifying the identification of an allowance or charge." rename to allowanceOrChargeType
  • uncefact:geoCoordinateIdentificationGeographicalCoordinate range GeoCoordinate: rename to geoCoordinate
  • uncefact:natureIdentificationCargo: descr sounds like it's free text "Transport cargo details of the consignment or consignment item sufficient to identify its nature for customs, statistical or transport purposes." but in fact it has range uncefact:Cargo, so rename to cargo
    • The descr of class Cargo is "Goods being transported." so none of that "sufficient to identify its nature" fuzziness?
  • uncefact:uNDGIdentificationCode: "United Nations Dangerous Goods (UNDG) number": rename to undgCode

Parasitic Word "Formatted"

uncefact/spec-jsonld#52

Many dateTime props have the word "formatted" in their names.

  • As opposed to what, cuneiform? :-)
  • You should indicate the required format with rangeIncludes xsd:dateTime, not with the property name

This query finds them, together with a better-named prop when it exists:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
select ?x ?y {
  ?x a rdf:Property
  bind(strafter(str(?x),str(uncefact:)) as ?xName)
  filter(regex(?xName,"formatted","i"))
  optional {
    ?y a rdf:Property
    bind(strafter(str(?y),str(uncefact:)) as ?yName)
    filter(?xName != ?yName && regex(?xName,?yName,"i"))
  }
}

I think all x should be simplified by removing the parasitic words "formatted", and merged with y when indicated:

| x | y | note |
|---|---|---|
| formattedExpiryDateTime | expiryDate | maybe merge all 3 but see below |
| formattedExpiryDateTime | expiryDateTime | |
| formattedFormattedCancellationAnnouncedLaunchDateTime | | |
| formattedFormattedIssueDateTime | issueDateTime | |
| formattedFormattedLatestProductDataChangeDateTime | | |
| formattedFormattedPickUpAvailabilityDateTime | formattedPickUpAvailabilityDateTime | merge & rename to pickUpAvailabilityDateTime |
| formattedPickUpAvailabilityDateTime | | |
| formattedFormattedReceivedDateTime | receivedDateTime | |
| formattedFormattedUltimateShipToDeliveryDateTime | ultimateShipToDeliveryDateTime | |
| formattedJurisdictionEntryDateTime | | |
| formattedLastRegisteredYearDateTime | lastRegisteredYearDateTime | |
| formattedObtainedDateTime | | |
| formattedScheduledArrivalRelatedDateTime | arrivalRelatedDateTime | |
| formattedScheduledArrivalRelatedDateTime | scheduledArrivalRelatedDateTime | |
| formattedScheduledDepartureRelatedDateTime | departureRelatedDateTime | |
| formattedScheduledDepartureRelatedDateTime | scheduledDepartureRelatedDateTime | |
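
The simplified name can be computed by stripping the leading "formatted" (once or twice) and lowercasing the next letter. A minimal Python sketch (the function name is made up for illustration):

```python
import re

def strip_formatted(prop_name: str) -> str:
    """Remove any leading run of "formatted"/"Formatted" and re-camelCase the rest."""
    stripped = re.sub(r"^(?:formatted)+", "", prop_name, flags=re.IGNORECASE)
    return stripped[:1].lower() + stripped[1:]

print(strip_formatted("formattedFormattedPickUpAvailabilityDateTime"))  # pickUpAvailabilityDateTime
print(strip_formatted("formattedExpiryDateTime"))                       # expiryDateTime
```

Where the result equals an existing prop name (the y column), the two props are merge candidates.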

This puppy is really messed up:

  • formattedExpiryDateTime "The date, time, date time or other date time value when this certified accreditation expires."
  • but range is UNCL2379Code "Code specifying the representation of a date, time or period."
  • So should this be expiryDateTime range xsd:dateTime, or expiryDateTimeFormat range UNCL2379Code???

(Ooook, most of them are in this sort of situation)

DateTime prop granularity (is ok)

  • expiryDateTime "A date, time, date time, or other date time value of expiry."
  • expiryDate "The date of expiry up to which this trade settlement financial card is valid."

These differ only by granularity of datatype. But it's ok to have 2 props since financial card expiry never has a time.

camelCasing

uncefact/spec-jsonld#50

You use consistent camelCasing for props, and UpperCamelCasing for classes (good!). However, it needs to be made smarter when dealing with UPPERCASE:

UPPERCASE abbreviations should be converted to lowercase, then camelCased as a normal word

  • otherwise:
    • casing is inconsistent depending on whether the abbreviation comes at the start or middle of the property name
    • The camelized abbreviation is impossible to recognize in the stream of words
  • examples:
    • current: bBANIdentificationId, bICId, australianSNIdentificationId (wtf is BANI, ICI, SNI?)
    • change to: bbanId, bicId, australianSnId
    • or even better: bban, bic, australianSn

Haven't looked for class names. Dunno how to catch all cases.

Prop Name Doublons

uncefact/spec-jsonld#51

  • duplicated word: formattedFormattedCancellationAnnouncedLaunchDateTime (and 5 more starting the same way)
  • duplicated word: referenceReferenceTypeCode
  • duplicated phrase: documentLineDocumentLineStatusCode

Scope

uncefact/spec-jsonld#54

  • It would be better to defer to OGC GeoSPARQL classes since OGC is a stronger authority on geographic information than UNCEFACT: uncefact:Circle , uncefact:GeographicalMultiCurve , uncefact:GeographicalPoint , uncefact:GeographicalMultiSurface , uncefact:GeographicalGrid , uncefact:GeographicalMultiPoint , uncefact:Polygon , uncefact:LinearRing , uncefact:GeographicalLine , uncefact:GeographicalSurface
  • Much expanded info

Prop Descriptions

"An identification of a set of geographical coordinates for this trade address."

BinaryObject vs BinaryFile

uncefact/spec-jsonld#53

This query finds props named "binary":

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
select * {
  ?x a rdf:Property
  bind(strafter(str(?x),str(uncefact:)) as ?xName)
  filter(regex(?xName,"binary","i"))
}

There are 15, nearly equally spread between "BinaryObject" and "BinaryFile":

"attachedBinaryFile"
"attachmentBinaryObject"
"creationBinaryFile"
"descriptionBinaryObject"
"imageBinaryObject"
"includedBinaryObject"
"logoAssociatedBinaryFile"
"mapBinaryObject"
"presentationBinaryFile"
"readerBinaryFile"
"referenceFileBinaryObject"
"referencedBinaryFile"
"relatedBinaryFile"
"signatoryImageBinaryObject"
"valueBinaryFile"

Standardize on one of the names; this discrepancy has caused 2 prop duplications (attached, referenced)
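
The split between the two suffixes can be counted directly from the list above. A minimal Python sketch, with the names copied from the query result:

```python
from collections import Counter

# The 15 prop names returned by the query above.
names = [
    "attachedBinaryFile", "attachmentBinaryObject", "creationBinaryFile",
    "descriptionBinaryObject", "imageBinaryObject", "includedBinaryObject",
    "logoAssociatedBinaryFile", "mapBinaryObject", "presentationBinaryFile",
    "readerBinaryFile", "referenceFileBinaryObject", "referencedBinaryFile",
    "relatedBinaryFile", "signatoryImageBinaryObject", "valueBinaryFile",
]

# Group by which of the two suffixes the name ends with.
counts = Counter(
    suffix
    for name in names
    for suffix in ("BinaryObject", "BinaryFile")
    if name.endswith(suffix)
)

print(counts["BinaryFile"])    # 8
print(counts["BinaryObject"])  # 7
```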
