VladimirAlexiev/uncefact-notes.md

## uncefact-notes.md

      
    UNCEFACT JSONLD Notes

Notes on https://service.unece.org/trade/uncefact/vocabulary/uncefact.jsonld
Contents


Table of Contents

UNCEFACT JSONLD Notes

Contents
Intro
Stats


Potential Improvements

BIEs
Classes
Prop Descriptions
Prop Dichotomy
Duplicated Props
Document vs Line Structure
Property Datatypes
"Code" props: change to xsd:token, vs rename
Parasitic Word "Identification"
Parasitic Word "Formatted"
DateTime prop granularity (is ok)
camelCasing
Prop Name Doublons
Scope
Prop Descriptions
BinaryObject vs BinaryFile


Intro


By vladimir.alexiev@ontotext.com, 31-Jan-2022
The file does not include the codelists themselves, only marker classes for them.
I converted it to Turtle:

riot --formatted=turtle uncefact.jsonld  1>uncefact.ttl

and sorted the paragraphs.
The cefact: namespace holds the BIEs (Business Information Entities); the uncefact: namespace holds the classes and props. It's great that 2 distinct namespaces are used.

Stats

Classes:


| c | class |
|---|---|
| 1747 | rdf:Property |
| 409 | rdfs:Class |
| 218 | uncefact:AggregateBIE |
| 1067 | uncefact:AssociationBIE |
| 1821 | uncefact:BasicBIE |


Terms (classes+props): 2156
BIEs: 3106

Meta props:


| c | prop | comment |
|---|---|---|
| 5262 | rdf:type | terms+BIE |
| 5262 | rdfs:comment | terms+BIE |
| 2156 | rdfs:label | only terms |
| 2351 | schema:domainIncludes | |
| 1747 | schema:rangeIncludes | |
| 742 | uncefact:TDED | 4-digit code |
| 2888 | uncefact:cefactBieDomainClass | |
| 3106 | uncefact:cefactBusinessProcess | |
| 3106 | uncefact:cefactElementMetadata | each BIE is related to 1 term |
| 3106 | uncefact:cefactUNId | |
| 11 | uncefact:status | |


BIEs by Process:


| bp | c |
|---|---|
| Agricultural | 56 |
| Buy-Ship-Pay | 1596 |
| Cataloguing | 9 |
| Cross Industry | 3 |
| Cross Industry Trade | 23 |
| Cross-Border | 45 |
| Customer to bank payment initiation | 103 |
| FLUX | 1 |
| In All Contexts | 771 |
| Invoicing | 35 |
| Laboratory Observation | 8 |
| MSDS Reporting | 9 |
| Pricing | 8 |
| Project Management | 18 |
| Remittance | 1 |
| Scheduling | 2 |
| Supply Chain | 98 |
| Traceability | 12 |
| Trade | 189 |
| Transport | 118 |
| e-Certificate of Origin | 1 |


Count of props by domain and range:
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select (count(?prop) as ?props) ?domains ?ranges {
  {select ?prop (count(?domain) as ?domains) (count(?range) as ?ranges) {
    ?prop a rdf:Property.
    {?prop schema:domainIncludes ?domain}
    union {?prop schema:rangeIncludes ?range}
  } group by ?prop}
} group by ?domains ?ranges order by ?props


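| props | domains | ranges |
|---|---|---|
| 1 | 9 | 1 |
| 1 | 10 | 1 |
| 1 | 11 | 1 |
| 1 | 13 | 1 |
| 1 | 30 | 1 |
| 1 | 41 | 1 |
| 1 | 44 | 1 |
| 1 | 72 | 1 |
| 2 | 6 | 1 |
| 4 | 8 | 1 |
| 5 | 7 | 1 |
| 7 | 5 | 1 |
| 19 | 4 | 1 |
| 36 | 3 | 1 |
| 157 | 2 | 1 |
| 1509 | 1 | 1 |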


Among the most widely used props are uncefact:typeCode (44 domains) and uncefact:description (41 domains); the table also shows one prop with 72 domains.
Each prop has exactly one range (great!)
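For readers who prefer to see the grouping outside SPARQL, here is a minimal Python sketch of the same two-level count. The `PROPS` dictionary is hypothetical toy data, not the real vocabulary, and the class names in it are placeholders.

```python
from collections import Counter

# Hypothetical toy data: prop -> (set of domain classes, set of range classes).
# The real vocabulary has 1747 props; these few entries only illustrate the shape.
PROPS = {
    "typeCode": ({"Document", "Party", "Tax"}, {"xsd:string"}),
    "description": ({"Document", "Party"}, {"xsd:string"}),
    "abbreviatedName": ({"Qualification"}, {"xsd:string"}),
    "lineId": ({"Document"}, {"xsd:token"}),
}


def shape_histogram(props):
    """Count props per (number of domains, number of ranges) pair."""
    return Counter(
        (len(domains), len(ranges)) for domains, ranges in props.values()
    )


histogram = shape_histogram(PROPS)
# Same column order as the table: props, domains, ranges; fewest props first.
for (n_domains, n_ranges), n_props in sorted(histogram.items(), key=lambda kv: kv[1]):
    print(n_props, n_domains, n_ranges)
```

The inner `len(...)` calls play the role of the inner `group by ?prop`, and the `Counter` plays the role of the outer `group by ?domains ?ranges`.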

Props by range:
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
select ?range (count(*) as ?c) {
    ?prop schema:rangeIncludes ?range1.
    bind(if(strstarts(str(?range1),str(uncefact:)),uncefact:,?range1) as ?range)
} group by ?range
(The results are in the table under Prop Dichotomy.)
BIE:

There are more BIEs than props and classes combined because some classes/props link to several BIEs, so we can't simply embed the BIE metadata in the terms (props/classes):

uncefact:calculatedAmount uncefact:cefactElementMetadata
  cefact:Applied_Tax.Calculated.Amount ,
  cefact:Header_BalanceOut.Calculated.Amount ,
  cefact:Trade_Tax.Calculated.Amount ,
  cefact:Payment_BalanceOut.Calculated.Amount .

There are no BIEs connected to several terms (Property/Class)

Potential Improvements

BIEs


uncefact/spec-jsonld#35
The cefact: namespace should be at unece rather than edi3, like the uncefact: namespace

@prefix uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#> .
@prefix cefact:   <https://edi3.org/cefact#> .

The cefact: prefix doesn't reflect its purpose. Maybe it should be uncefactBIE:, with a corresponding path segment in the URL?
uncefact/spec-jsonld#36
The link from BasicBIE to AggregateBIE should be an object prop, not a string:

cefact:WorkItem_QuantityAnalysis.Details
        rdf:type                        uncefact:AggregateBIE .
cefact:WorkItem_QuantityAnalysis.Identification.Identifier
        rdf:type                        uncefact:BasicBIE ;
        uncefact:cefactBieDomainClass   "cefact:WorkItem_QuantityAnalysis.Details" # -> cefact:WorkItem_QuantityAnalysis.Details
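
A minimal Python sketch of what the proposed change amounts to at the value level: the string is expanded to a full IRI using the prefix map quoted in these notes. The function name `expand_curie` is hypothetical. In JSON-LD itself this would more likely be achieved by typing the term as `@id` in the context; the sketch only shows the effect.

```python
# Prefix map copied from the Turtle prefixes quoted in these notes.
PREFIXES = {
    "uncefact": "https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#",
    "cefact": "https://edi3.org/cefact#",
}


def expand_curie(value, prefixes=PREFIXES):
    """Turn a string such as 'cefact:X' into the full IRI it abbreviates."""
    prefix, sep, local = value.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"not a known prefixed name: {value!r}")
    return prefixes[prefix] + local


print(expand_curie("cefact:WorkItem_QuantityAnalysis.Details"))
```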


uncefact/spec-jsonld#37
uncefact:cefactUNId "cefact:UN01002518": maybe remove the word "cefact" from the value?
uncefact/spec-jsonld#38

There should be no empty values such as uncefact:TDED ".": skip them.
And what is TDED? Use words, not abbreviations.
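
The two value clean-ups suggested above (#37 and #38) could look roughly like this in Python. The record layout (a plain dict keyed by local prop name) and the sample TDED value `"3055"` are assumptions made for illustration only.

```python
def clean_bie_metadata(record):
    """Return a copy with the 'cefact:' prefix stripped from cefactUNId
    and an empty or placeholder TDED value dropped."""
    cleaned = dict(record)
    un_id = cleaned.get("cefactUNId")
    if un_id is not None and un_id.startswith("cefact:"):
        cleaned["cefactUNId"] = un_id[len("cefact:"):]
    # "." is the placeholder seen in the data; treat it like a missing value.
    if cleaned.get("TDED", "").strip() in ("", "."):
        cleaned.pop("TDED", None)
    return cleaned


print(clean_bie_metadata({"cefactUNId": "cefact:UN01002518", "TDED": "."}))
```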


Classes


uncefact/spec-jsonld#39

The LOCODE instance data uses uncefact:UNLOCODE rather than uncefact:UNECELOCODE.
Furthermore, this class is not very meaningful semantically (see uncefact/spec-jsonld#29), which may be confirmed by its "description":


uncefact:UNECELOCODE  rdf:type  rdfs:Class ;
        rdfs:comment  "LOCODE." ;
        rdfs:label    "UNLOCODE" .
Prop Descriptions

uncefact/spec-jsonld#41
Some BIE Descriptions seem to be richer than prop descriptions. Eg
cefact:Academic_Qualification.AbbreviatedName.Text a uncefact:BasicBIE ;
  rdfs:comment "The abbreviated name, expressed as text, of this academic qualification." ;
uncefact:abbreviatedName a rdf:Property ;
  rdfs:comment "An abbreviated name, expressed as text." ;
The BIE description could be used when the prop is linked to only one BIE and applies to only one class (as is the case for uncefact:abbreviatedName, which applies only to uncefact:Qualification).
1242 (or 1337?) of 1747 props satisfy this condition:
prefix uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
prefix schema: <http://schema.org/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select * {
  ?prop uncefact:cefactElementMetadata ?bie
  filter not exists {?prop schema:domainIncludes ?dom1,?dom2 filter (?dom1 != ?dom2)}
  filter not exists {?prop uncefact:cefactElementMetadata ?bie2 filter (?bie != ?bie2)}
  ?prop rdfs:comment ?propDescr.
  ?bie rdfs:comment ?bieDescr.
}
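
The selection condition in the query can be restated as a small Python predicate. The `PROPS` records are hypothetical toy data: only the abbreviatedName entry mirrors the example above, while the domains of the other two props and the BIE name `Some_Document.Type.Code` are made up for illustration.

```python
# Hypothetical toy records; the real data has 1747 props.
PROPS = {
    "abbreviatedName": {
        "domains": ["Qualification"],
        "bies": ["Academic_Qualification.AbbreviatedName.Text"],
    },
    "calculatedAmount": {
        "domains": ["Tax"],  # assumed domain, for illustration only
        "bies": ["Applied_Tax.Calculated.Amount", "Trade_Tax.Calculated.Amount"],
    },
    "typeCode": {
        "domains": ["Document", "Party"],  # assumed domains
        "bies": ["Some_Document.Type.Code"],  # made-up BIE name
    },
}


def can_reuse_bie_description(prop):
    """True when the prop has exactly one domain class and exactly one BIE."""
    return len(set(prop["domains"])) == 1 and len(set(prop["bies"])) == 1


candidates = sorted(
    name for name, prop in PROPS.items() if can_reuse_bie_description(prop)
)
print(candidates)
```

Each `filter not exists` in the query corresponds to one of the two `len(...) == 1` checks.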

This requires further examination and unification of descriptions. Eg in the pair below:

prop Accreditation: "An official recognition awarded to a person, organisation or thing, such as a building or product, to certify that a certain level of attainment has been achieved."
BIE Certified_Accreditation.Details: "A certified recognition that provides evidence of a level of competency in a given area, such as certifying a level of skill in a trade."

Prop Dichotomy


uncefact/spec-jsonld#40
schema.org uses rdf:Property because almost all of its props allow a literal (text) in addition to an object. However, UNCEFACT seems to follow a strict "property dichotomy", so it should use owl:ObjectProperty (797 props, see below) and owl:DatatypeProperty (950 props).


| range | c | comment |
|---|---|---|
| xsd:string | 791 | all literals |
| xsd:token | 159 | identifiers, all end in Id |
| uncefact: | 797 | uncefact classes (object props) |
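
Because every prop has exactly one range, the OWL property type can be derived mechanically from it. A minimal Python sketch, assuming ranges are given as full IRIs; the function name is hypothetical.

```python
UNCEFACT_NS = "https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"


def owl_property_type(range_iri):
    """Map a prop's single range to owl:DatatypeProperty or owl:ObjectProperty."""
    if range_iri.startswith(XSD_NS):
        return "owl:DatatypeProperty"
    if range_iri.startswith(UNCEFACT_NS):
        return "owl:ObjectProperty"
    raise ValueError(f"unexpected range namespace: {range_iri}")


print(owl_property_type(XSD_NS + "token"))        # literal-valued prop
print(owl_property_type(UNCEFACT_NS + "Cargo"))   # class-valued prop
```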


Duplicated Props

uncefact/spec-jsonld#42
The props in each of these pairs are duplicates and should be merged:

bICId: "The unique Bank Identification Code (BIC) as defined in ISO 9362 for this creditor or debtor financial institution."
bICIdentificationId: "The Bank Identifier Code (BIC) as defined by ISO 9362 (Banking telecommunication messages, Bank Identifier Codes) for this financial identity."

versionId: "An identifier of the version."
versionIdentificationId: "The unique identifier for the version of this exchanged document."

attachedBinaryFile: "A binary file attached to this exchanged or referenced document."
attachmentBinaryObject: "A binary object that is attached or otherwise appended to this referenced document."

referenceDocument
referencedDocument: haven't checked the descriptions

This follows principle 4 (deduplication) of #33 (tech-spec.md).
Also:

uncefact/spec-jsonld#43
Address: the breakdown lineOne, lineTwo, lineThree, lineFour is pretty random (why not 6 or 10?) and old-fashioned.
xsd:string allows multiple lines, so merge them into a single prop, e.g. addressLines

Document vs Line Structure

uncefact/spec-jsonld#44
Document has enough props to also describe DocumentLines, which are document parts:

lineCountNumeric: if it's a Document, count of lines in it
lineId: if it's a DocumentLine, its line ID
parentLineId: to establish a hierarchy between lines

However, there's no way to express a document hierarchy or parthood:

Which lines comprise this Document
Which lines are nested under this parent line?

This leads to confusions such as

Wrong domain? The description talks about "DocumentLine" but the domain is Document

uncefact:lineStatusReason
        rdfs:comment                    "A reason, expressed as text, for the line status in this document line." ;
        schema:domainIncludes           uncefact:Document .

uncefact:lineStatusReasonCode
        rdfs:comment                    "The code specifying the line status reason for this document line." ;
        schema:domainIncludes           uncefact:Document .

Wrong domain of lineTotalBasisAmount? Where is that "line" mentioned in the description?

Tax has 51 attributes including basisAmount so how can one distinguish between the two?
On the other hand its "sibling" prop lineTotalAmount has domain MonetarySummation


uncefact:lineTotalBasisAmount
        rdfs:comment                    "A monetary value used as the line total basis on which this trade related tax, levy or duty is calculated." ;
        schema:domainIncludes           uncefact:Tax ;
Property Datatypes

uncefact/spec-jsonld#45
Currently UNCEFACT uses only two literal datatypes: xsd:string (791 props) and xsd:token (159 props)
UNCEFACT prop names are made according to ISO/IEC 11179 Metadata Registry (MDR), part 5:2015 Naming and identification principles. The last word of prop names (let's call it "kind") suggests many other datatypes.
Surely trade involves some numbers and some dates?!?
I checked that all props with kind Id are xsd:token (good).
This query counts xsd:string props by "kind" (the last word of the name):

PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?kind (count(*) as ?c) {
  ?prop schema:rangeIncludes xsd:string
  bind(replace(str(?prop),".*([A-Z][a-z]*)","$1") as ?kind)
  filter(regex(?kind,"^[A-Z]"))
} group by ?kind order by ?kind
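
The "kind" extraction can be tried out on a few names without a triplestore. This Python sketch applies the same regular expression as the query to prop names taken from these notes; the helper name `kind_of` is hypothetical.

```python
import re
from collections import Counter


def kind_of(prop_name):
    """Return the last capitalised word of a camelCase name, or None.

    Mirrors replace(str(?prop), ".*([A-Z][a-z]*)", "$1") followed by the
    filter that keeps only results starting with an uppercase letter.
    """
    kind = re.sub(r".*([A-Z][a-z]*)", r"\1", prop_name)
    return kind if re.match(r"[A-Z]", kind) else None


names = [
    "lineTotalBasisAmount",
    "occurrenceDateTime",
    "nilCarriageValueIndicator",
    "accessRightsCode",
    "description",  # no capital letter, so it has no "kind"
]
counts = Counter(k for k in map(kind_of, names) if k)
print(sorted(counts.items()))
```

Note that a name ending in DateTime yields the kind "Time", not "DateTime", which is presumably why the table has a large "Time" row mapped to xsd:dateTime.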
Count and tentative proposed changes:


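| kind | c | change to |
|---|---|---|
| "Access" | 1 | |
| "Agency" | 1 | |
| "Amount" | 89 | numeric |
| "Basis" | 2 | |
| "Box" | 1 | |
| "Charge" | 1 | |
| "Code" | 154 | xsd:token |
| "Conditions" | 1 | |
| "Criteria" | 1 | |
| "Date" | 3 | xsd:date |
| "Description" | 21 | |
| "Dimension" | 1 | |
| "Five" | 1 | |
| "Four" | 1 | |
| "Indicator" | 73 | xsd:boolean |
| "Information" | 21 | |
| "Instructions" | 2 | |
| "Limit" | 2 | |
| "List" | 2 | |
| "Means" | 1 | |
| "Measure" | 66 | |
| "Name" | 47 | |
| "Number" | 4 | numeric |
| "Numeric" | 15 | IndexNumeric, SequenceNumeric -> xsd:integer |
| "Object" | 7 | |
| "Of" | 2 | |
| "One" | 1 | |
| "Pattern" | 1 | |
| "Percent" | 16 | numeric |
| "Phrase" | 1 | |
| "Point" | 1 | |
| "Procedure" | 1 | |
| "Quantity" | 91 | numeric |
| "Rate" | 4 | |
| "Reason" | 7 | |
| "Reference" | 6 | |
| "Remark" | 2 | |
| "Remarks" | 1 | |
| "Restriction" | 3 | |
| "Result" | 1 | |
| "Status" | 1 | |
| "Three" | 1 | |
| "Time" | 79 | xsd:dateTime |
| "Title" | 1 | |
| "Two" | 1 | |
| "Type" | 9 | |
| "Use" | 1 | |
| "Value" | 1 | |
| "Zone" | 1 | |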


Examples:

Numeric candidates:
uncefact:usedToDateQuotaQuantity, uncefact:usedSignalSourceQuantity, taxBasisTotalAmount, taxBasisAllowanceRate
date or dateTime candidates:
uncefact:occurrenceDateTime
xsd:boolean candidates:
uncefact:nilCarriageValueIndicator, uncefact:nilCustomsValueIndicator, uncefact:nilInsuranceValueIndicator

uncefact/spec-jsonld#46
Decide numeric datatypes
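
As a rough illustration of how the tentative "change to" column could be applied to xsd:string props, here is a Python sketch keyed on name suffixes. The placeholder `"numeric"` stands for the still-undecided numeric datatype (#46); the function and table names are hypothetical, and kinds without a proposal return `None`.

```python
# Suffix -> proposed range, taken from the "change to" column above.
# "numeric" is a placeholder: the concrete numeric datatype is still undecided.
PROPOSED_BY_SUFFIX = [
    ("IndexNumeric", "xsd:integer"),
    ("SequenceNumeric", "xsd:integer"),
    ("Amount", "numeric"),
    ("Number", "numeric"),
    ("Percent", "numeric"),
    ("Quantity", "numeric"),
    ("Indicator", "xsd:boolean"),
    ("Code", "xsd:token"),
    ("Time", "xsd:dateTime"),  # covers names ending in DateTime
    ("Date", "xsd:date"),
]


def proposed_range(prop_name):
    """Return the proposed datatype for an xsd:string prop, or None."""
    for suffix, datatype in PROPOSED_BY_SUFFIX:
        if prop_name.endswith(suffix):
            return datatype
    return None


for name in ["occurrenceDateTime", "nilCustomsValueIndicator", "usedToDateQuotaQuantity"]:
    print(name, "->", proposed_range(name))
```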
"Code" props: change to xsd:token, vs rename

uncefact/spec-jsonld#47
Props named xxxCode come in two kinds:
prefix uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#> 
PREFIX schema: <http://schema.org/>
select ?range (count(*) as ?c) {
  ?prop schema:rangeIncludes ?range1
  filter(regex(str(?prop),"Code$"))
  bind(if(strstarts(str(?range1),str(uncefact:)),uncefact:,?range1) as ?range)
} group by ?range

xsd:string: 154. Consider changing the range to xsd:token (same as props named xxxId).
xsd:token doesn't allow leading, consecutive or trailing spaces, so it fits better than xsd:string. Example:

accessRightsCode xsd:string -> xsd:token


Objects (codelist values): 110. Consider renaming them to remove Code (because objects are not codes!). Examples:

accountingDocumentSetTriggerCode uncefact:UNCL1001Code -> accountingDocumentSetTrigger
cross-BorderRegulatoryProcedureTypeCode uncefact:UNCL9353Code ->  cross-BorderRegulatoryProcedureType
logisticsSealSealingPartyRoleCode uncefact:UNCL9303Code -> logisticsSealSealingPartyRole
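
Both proposals can be sketched in a few lines of Python. `to_xsd_token` shows what the xsd:token whitespace rule does to a value, and `propose_code_change` applies the two cases above to a (name, range) pair. All function names are hypothetical, and ranges are written as prefixed names for brevity.

```python
import re


def to_xsd_token(text):
    """Collapse XML whitespace the way xsd:token requires:
    no leading/trailing spaces, no tabs or newlines, no runs of spaces."""
    return " ".join(part for part in re.split(r"[ \t\r\n]+", text) if part)


def is_xsd_token(text):
    """True when the text is already in xsd:token form."""
    return text == to_xsd_token(text)


def propose_code_change(name, range_):
    """Apply the two proposals to a prop named xxxCode.

    xsd:string range   -> keep the name, change the range to xsd:token
    uncefact: class    -> keep the range, drop the trailing 'Code'
    """
    if not name.endswith("Code"):
        return name, range_
    if range_ == "xsd:string":
        return name, "xsd:token"
    if range_.startswith("uncefact:"):
        return name[: -len("Code")], range_
    return name, range_


print(to_xsd_token("  AB \t 12\n"))
print(propose_code_change("accessRightsCode", "xsd:string"))
print(propose_code_change("accountingDocumentSetTriggerCode", "uncefact:UNCL1001Code"))
```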


Parasitic Word "Identification"

uncefact/spec-jsonld#48

Remove the parasitic word Identification from xxxIdentificationId,
e.g. uncefact:versionIdentificationId, bICIdentificationId (which led to duplication), etc.
Note: renaming xxxIndicator to isXxx won't work well, eg transportEquipmentSplitGoodsIndicator
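
A minimal Python sketch of the rename, which also flags where the shortened name already exists and the two props would therefore have to be merged. The helper names are hypothetical; the prop names are taken from these notes.

```python
def drop_identification(name):
    """Rename xxxIdentificationId to xxxId; leave other names unchanged."""
    suffix = "IdentificationId"
    if name.endswith(suffix):
        return name[: -len(suffix)] + "Id"
    return name


def rename_plan(names):
    """Map each affected prop to (new name, whether that name already exists)."""
    existing = set(names)
    plan = {}
    for name in names:
        new_name = drop_identification(name)
        if new_name != name:
            plan[name] = (new_name, new_name in existing)
    return plan


names = [
    "bICId",
    "bICIdentificationId",
    "versionId",
    "versionIdentificationId",
    "australianSNIdentificationId",
]
for old, (new, collides) in rename_plan(names).items():
    print(old, "->", new, "(merge with existing prop)" if collides else "")
```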

uncefact/spec-jsonld#49
Here are the remaining 4 props that contain Identification but do not end in IdentificationId:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * {
  ?x a rdf:Property
  filter(regex(str(?x),"Identification"))
  filter(!regex(str(?x),"IdentificationId"))
}

uncefact:allowanceChargeIdentificationTypeCode "The code specifying the type of this trade allowance charge."
range UNCL5189Code "Code specifying the identification of an allowance or charge."
rename to allowanceOrChargeType
uncefact:geoCoordinateIdentificationGeographicalCoordinate range GeoCoordinate:
rename to geoCoordinate
uncefact:natureIdentificationCargo: descr sounds like it's free text "Transport cargo details of the consignment or consignment item sufficient to identify its nature for customs, statistical or transport purposes."
but in fact it has range uncefact:Cargo,
so rename to cargo

The descr of class Cargo is "Goods being transported." so none of that "sufficient to identify its nature" fuzziness?


uncefact:uNDGIdentificationCode: "United Nations Dangerous Goods (UNDG) number":
rename to undgCode

Parasitic Word "Formatted"

uncefact/spec-jsonld#52
Many dateTime props have "formatted" in their names.

As opposed to what, cuneiform? :-)
You should indicate the required format with rangeIncludes xsd:dateTime, not with the property name

This query finds them, together with a better-named prop when it exists:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
select ?x ?y {
  ?x a rdf:Property
  bind(strafter(str(?x),str(uncefact:)) as ?xName)
  filter(regex(?xName,"formatted","i"))
  optional {
    ?y a rdf:Property
    bind(strafter(str(?y),str(uncefact:)) as ?yName)
    filter(?xName != ?yName && regex(?xName,?yName,"i"))
  }
}
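
The matching logic of this query (x contains "formatted"; y is any other prop whose name occurs inside x, ignoring case) can be sketched in Python as follows. The sample names are a small subset taken from the results, and plain substring matching stands in for the SPARQL regex on the assumption that prop names contain no regex metacharacters.

```python
def formatted_pairs(names):
    """Return (x, y) rows: x has 'formatted' in its name, y is another prop
    whose name is contained in x (case-insensitive), or None if there is none."""
    rows = []
    for x in sorted(names):
        if "formatted" not in x.lower():
            continue
        ys = sorted(y for y in names if y != x and y.lower() in x.lower())
        if ys:
            rows.extend((x, y) for y in ys)
        else:
            rows.append((x, None))  # corresponds to the unmatched OPTIONAL
    return rows


names = {
    "formattedExpiryDateTime",
    "expiryDate",
    "expiryDateTime",
    "formattedFormattedIssueDateTime",
    "issueDateTime",
    "formattedObtainedDateTime",
}
for x, y in formatted_pairs(names):
    print(x, y or "")
```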
I think every x should be simplified by removing the parasitic word "formatted", and merged with y when indicated:


| x | y | note |
|---|---|---|
| formattedExpiryDateTime | expiryDate | maybe merge all 3 but see below |
| formattedExpiryDateTime | expiryDateTime | |
| formattedFormattedCancellationAnnouncedLaunchDateTime | | |
| formattedFormattedIssueDateTime | issueDateTime | |
| formattedFormattedLatestProductDataChangeDateTime | | |
| formattedFormattedPickUpAvailabilityDateTime | formattedPickUpAvailabilityDateTime | merge & rename to pickUpAvailabilityDateTime |
| formattedPickUpAvailabilityDateTime | | |
| formattedFormattedReceivedDateTime | receivedDateTime | |
| formattedFormattedUltimateShipToDeliveryDateTime | ultimateShipToDeliveryDateTime | |
| formattedJurisdictionEntryDateTime | | |
| formattedLastRegisteredYearDateTime | lastRegisteredYearDateTime | |
| formattedObtainedDateTime | | |
| formattedScheduledArrivalRelatedDateTime | arrivalRelatedDateTime | |
| formattedScheduledArrivalRelatedDateTime | scheduledArrivalRelatedDateTime | |
| formattedScheduledDepartureRelatedDateTime | departureRelatedDateTime | |
| formattedScheduledDepartureRelatedDateTime | scheduledDepartureRelatedDateTime | |


This puppy is really messed up:

formattedExpiryDateTime "The date, time, date time or other date time value when this certified accreditation expires."
but range is UNCL2379Code "Code specifying the representation of a date, time or period."
So should this be expiryDateTime range xsd:dateTime, or expiryDateTimeFormat range UNCL2379Code???

(Ooook, most of them are in this sort of situation)
DateTime prop granularity (is ok)


expiryDateTime "A date, time, date time, or other date time value of expiry."
expiryDate "The date of expiry up to which this trade settlement financial card is valid."

These differ only by granularity of datatype.
But it's ok to have 2 props since financial card expiry never has a time.
camelCasing

uncefact/spec-jsonld#50
You use consistent camelCasing for props, and UpperCamelCasing for classes (good!).
However, it needs to be made smarter when dealing with UPPERCASE:
UPPERCASE abbreviations should be converted to lowercase, then camelCased as a normal word

otherwise:

casing is inconsistent depending on whether the abbreviation comes at the start or middle of the property name
The camelized abbreviation is impossible to recognize in the stream of words


examples:

current: bBANIdentificationId, bICId, australianSNIdentificationId (wtf is BANI, ICI, SNI?)
change to: bbanId, bicId, australianSnId
or even better: bban, bic, australianSn
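
One possible way to implement the "smarter" camelCasing is sketched below in Python. It is only a heuristic: it assumes the lowercase first letter of a name belongs to the leading abbreviation when capitals follow, it drops non-alphanumeric characters such as hyphens, and the function names are hypothetical. It does not claim to catch all cases.

```python
import re

# A run of capitals followed by a capitalised word is an abbreviation (BBAN|Id);
# otherwise take an ordinary word, a trailing run of capitals, or digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_words(name):
    """Split a camelCase name into words, keeping abbreviations whole."""
    # Re-capitalise the first letter so "bBAN..." is read as "BBAN...".
    return _WORD.findall(name[:1].upper() + name[1:])


def smart_camel(name, drop=()):
    """Lowercase abbreviations, then camelCase them like normal words.

    `drop` optionally lists words to remove, e.g. ("Identification",).
    """
    words = [w for w in split_words(name) if w not in drop]
    if not words:
        return name
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


for name in ["bBANIdentificationId", "bICId", "australianSNIdentificationId"]:
    print(name, "->", smart_camel(name, drop=("Identification",)))
```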


Haven't looked for class names. Dunno how to catch all cases.
Prop Name Doublons

uncefact/spec-jsonld#51

duplicated word: formattedFormattedCancellationAnnouncedLaunchDateTime (and 4 more starting the same way)
duplicated word: referenceReferenceTypeCode
duplicated phrase: documentLineDocumentLineStatusCode
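
Doubled words and phrases like these can be found mechanically. A minimal Python sketch, assuming a simple camelCase word splitter; it reports the longest phrase that is immediately repeated, compared case-insensitively. The function name is hypothetical.

```python
import re

_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def repeated_phrase(name):
    """Return the longest run of words that is immediately repeated, or None."""
    words = [w.lower() for w in _WORD.findall(name)]
    # Try longer phrases first so "document line" wins over a single word.
    for n in range(len(words) // 2, 0, -1):
        for i in range(len(words) - 2 * n + 1):
            if words[i:i + n] == words[i + n:i + 2 * n]:
                return " ".join(words[i:i + n])
    return None


for name in [
    "formattedFormattedIssueDateTime",
    "referenceReferenceTypeCode",
    "documentLineDocumentLineStatusCode",
    "versionId",
]:
    print(name, "->", repeated_phrase(name))
```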

Scope

uncefact/spec-jsonld#54

It would be better to defer to OGC GeoSPARQL classes, since OGC is a stronger authority on geographic information than UNCEFACT:
uncefact:Circle , uncefact:GeographicalMultiCurve , uncefact:GeographicalPoint , uncefact:GeographicalMultiSurface , uncefact:GeographicalGrid , uncefact:GeographicalMultiPoint , uncefact:Polygon , uncefact:LinearRing , uncefact:GeographicalLine , uncefact:GeographicalSurface
Much expanded info

Prop Descriptions

"An identification of a set of geographical coordinates for this trade address."
BinaryObject vs BinaryFile

uncefact/spec-jsonld#53
This query finds props with "binary" in their name:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
select * {
  ?x a rdf:Property
  bind(strafter(str(?x),str(uncefact:)) as ?xName)
  filter(regex(?xName,"binary","i"))
}
There are 15, nearly equally spread between "BinaryObject" and "BinaryFile":
"attachedBinaryFile"
"attachmentBinaryObject"
"creationBinaryFile"
"descriptionBinaryObject"
"imageBinaryObject"
"includedBinaryObject"
"logoAssociatedBinaryFile"
"mapBinaryObject"
"presentationBinaryFile"
"readerBinaryFile"
"referenceFileBinaryObject"
"referencedBinaryFile"
"relatedBinaryFile"
"signatoryImageBinaryObject"
"valueBinaryFile"

Standardize on one of the names; this discrepancy has caused 2 prop duplications (attached, referenced)
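
The split between the two naming variants can be checked with a short Python tally over the 15 names listed above; the helper name `binary_kind` is hypothetical.

```python
from collections import Counter

BINARY_PROPS = [
    "attachedBinaryFile", "attachmentBinaryObject", "creationBinaryFile",
    "descriptionBinaryObject", "imageBinaryObject", "includedBinaryObject",
    "logoAssociatedBinaryFile", "mapBinaryObject", "presentationBinaryFile",
    "readerBinaryFile", "referenceFileBinaryObject", "referencedBinaryFile",
    "relatedBinaryFile", "signatoryImageBinaryObject", "valueBinaryFile",
]


def binary_kind(name):
    """Return which of the two naming variants a prop name ends with."""
    for kind in ("BinaryObject", "BinaryFile"):
        if name.endswith(kind):
            return kind
    return None


counts = Counter(binary_kind(name) for name in BINARY_PROPS)
print(dict(counts))
```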