Notes on https://service.unece.org/trade/uncefact/vocabulary/uncefact.jsonld
Table of Contents
- UNCEFACT JSONLD Notes
- Potential Improvements
- BIEs
- Classes
- Prop Descriptions
- Prop Dichotomy
- Duplicated Props
- Document vs Line Structure
- Property Datatypes
- "Code" props: change to xsd:token, vs rename
- Parasitic Word "Identification"
- Parasitic Word "Formatted"
- DateTime prop granularity (is ok)
- camelCasing
- Prop Name Doublons
- Scope
- Prop Descriptions
- BinaryObject vs BinaryFile
- By vladimir.alexiev@ontotext.com, 31-Jan-2022
- This does not include the codelists, just marker classes for them
- converted to turtle
riot --formatted=turtle uncefact.jsonld 1>uncefact.ttl
- sorted paragraphs
- `cefact:` are BIEs, `uncefact:` are classes and props. It's great that 2 distinct namespaces are used
Classes:
c | class |
---|---|
1747 | rdf:Property |
409 | rdfs:Class |
218 | uncefact:AggregateBIE |
1067 | uncefact:AssociationBIE |
1821 | uncefact:BasicBIE |
- Terms (classes+props): 2156
- BIEs: 3106
Meta props:
c | prop | comment |
---|---|---|
5262 | rdf:type | terms+BIE |
5262 | rdfs:comment | terms+BIE |
2156 | rdfs:label | only terms |
2351 | schema:domainIncludes | |
1747 | schema:rangeIncludes | |
742 | uncefact:TDED | 4-digit code |
2888 | uncefact:cefactBieDomainClass | |
3106 | uncefact:cefactBusinessProcess | |
3106 | uncefact:cefactElementMetadata | each BIE is related to 1 term |
3106 | uncefact:cefactUNId | |
11 | uncefact:status |
BIEs by Process:
bp | c |
---|---|
Agricultural | 56 |
Buy-Ship-Pay | 1596 |
Cataloguing | 9 |
Cross Industry | 3 |
Cross Industry Trade | 23 |
Cross-Border | 45 |
Customer to bank payment initiation | 103 |
FLUX | 1 |
In All Contexts | 771 |
Invoicing | 35 |
Laboratory Observation | 8 |
MSDS Reporting | 9 |
Pricing | 8 |
Project Management | 18 |
Remittance | 1 |
Scheduling | 2 |
Supply Chain | 98 |
Traceability | 12 |
Trade | 189 |
Transport | 118 |
e-Certificate of Origin | 1 |
Count of props by domain and range:
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select (count(?prop) as ?props) ?domains ?ranges {
{select ?prop (count(?domain) as ?domains) (count(?range) as ?ranges) {
?prop a rdf:Property.
{?prop schema:domainIncludes ?domain}
union {?prop schema:rangeIncludes ?range}
} group by ?prop}
} group by ?domains ?ranges order by ?props
props | domains | ranges |
---|---|---|
1 | 9 | 1 |
1 | 10 | 1 |
1 | 11 | 1 |
1 | 13 | 1 |
1 | 30 | 1 |
1 | 41 | 1 |
1 | 44 | 1 |
1 | 72 | 1 |
2 | 6 | 1 |
4 | 8 | 1 |
5 | 7 | 1 |
7 | 5 | 1 |
19 | 4 | 1 |
36 | 3 | 1 |
157 | 2 | 1 |
1509 | 1 | 1 |
- The most popular props are `uncefact:typeCode` (44 domains), `uncefact:description` (41 domains).
- Each prop has exactly one range (great!)
Props by range:
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
select ?range (count(*) as ?c) {
?prop schema:rangeIncludes ?range1.
bind(if(strstarts(str(?range1),str(uncefact:)),uncefact:,?range1) as ?range)
} group by ?range
(results below)
BIE:
- There are more BIE than Property+Class because some classes/props link to several BIE, so we can't simply embed the BIE metadata in terms (props/classes):
uncefact:calculatedAmount uncefact:cefactElementMetadata
cefact:Applied_Tax.Calculated.Amount ,
cefact:Header_BalanceOut.Calculated.Amount ,
cefact:Trade_Tax.Calculated.Amount ,
cefact:Payment_BalanceOut.Calculated.Amount .
- There are no BIEs connected to several terms (Property/Class)
- uncefact/spec-jsonld#35 The `cefact:` namespace should be at `unece` rather than `edi3`, like the `uncefact:` namespace:
@prefix uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#> .
@prefix cefact: <https://edi3.org/cefact#> .
- The `cefact:` namespace doesn't reflect its purpose. Maybe it should be `uncefactBIE:`? And a corresponding part in the URL
- uncefact/spec-jsonld#36 The link from `BasicBIE` to `AggregateBIE` should be an object prop, not a string:
cefact:WorkItem_QuantityAnalysis.Details
rdf:type uncefact:AggregateBIE .
cefact:WorkItem_QuantityAnalysis.Identification.Identifier
rdf:type uncefact:BasicBIE ;
uncefact:cefactBieDomainClass "cefact:WorkItem_QuantityAnalysis.Details" # -> cefact:WorkItem_QuantityAnalysis.Details
- uncefact/spec-jsonld#37 `uncefact:cefactUNId "cefact:UN01002518"`: maybe remove the word "cefact" from the value?
- uncefact/spec-jsonld#38
  - There should be no empty values such as `uncefact:TDED "."`: skip them
  - And what is TDED? Use words, not abbreviations
- uncefact/spec-jsonld#39
  - The LOCODE instance data uses `uncefact:UNLOCODE` rather than `uncefact:UNECELOCODE`.
  - Furthermore, this class is not very meaningful semantically (see uncefact/spec-jsonld#29), which may be confirmed by its "description":
uncefact:UNECELOCODE rdf:type rdfs:Class ;
rdfs:comment "LOCODE." ;
rdfs:label "UNLOCODE" .
Some BIE Descriptions seem to be richer than prop descriptions. Eg
cefact:Academic_Qualification.AbbreviatedName.Text a uncefact:BasicBIE ;
rdfs:comment "The abbreviated name, expressed as text, of this academic qualification." .
uncefact:abbreviatedName a rdf:Property ;
rdfs:comment "An abbreviated name, expressed as text." .
The BIE description could be used when the BIE is used in one term, and the term is applicable to only one class (as is the case for `uncefact:abbreviatedName`: applicable only to `uncefact:Qualification`).
1242 (or 1337?) of 1747 props satisfy this condition:
prefix uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
prefix schema: <http://schema.org/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select * {
?prop uncefact:cefactElementMetadata ?bie
filter not exists {?prop schema:domainIncludes ?dom1,?dom2 filter (?dom1 != ?dom2)}
filter not exists {?prop uncefact:cefactElementMetadata ?bie2 filter (?bie != ?bie2)}
?prop rdfs:comment ?propDescr.
?bie rdfs:comment ?bieDescr.
}
This requires further examination and unification of descriptions. Eg in the pair below:
- prop `Accreditation`: "An official recognition awarded to a person, organisation or thing, such as a building or product, to certify that a certain level of attainment has been achieved."
- BIE `Certified_Accreditation.Details`: "A certified recognition that provides evidence of a level of competency in a given area, such as certifying a level of skill in a trade."
- uncefact/spec-jsonld#40 schema.org uses `rdf:Property` because almost all of its props allow literal (text) in addition to object. However, UNCEFACT seems to be strict in following a "property dichotomy", so use owl:ObjectProperty (797, see below) vs owl:DatatypeProperty (950).
range | c | comment |
---|---|---|
xsd:string | 791 | all literals |
xsd:token | 159 | identifiers, all end in Id |
uncefact: | 797 | uncefact classes (object props) |
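
Since each prop has exactly one range, the dichotomy above could be derived mechanically from that range. The following is only a minimal Python sketch, assuming ranges are given as prefixed names; the function name `owl_property_type` is hypothetical and not part of any UNCEFACT tooling. The counts are the ones from the table above.

```python
def owl_property_type(range_curie: str) -> str:
    """Pick an OWL property type from a prop's single range (prefixed name)."""
    if range_curie.startswith("xsd:"):
        return "owl:DatatypeProperty"  # literal range
    if range_curie.startswith("uncefact:"):
        return "owl:ObjectProperty"    # range is a uncefact class
    return "rdf:Property"              # unknown range: leave it generic


# Counts of props per range, copied from the table above.
counts = {"xsd:string": 791, "xsd:token": 159, "uncefact:": 797}

totals = {}
for range_curie, c in counts.items():
    prop_type = owl_property_type(range_curie)
    totals[prop_type] = totals.get(prop_type, 0) + c

print(totals)
```

The two totals add up to the 1747 props counted earlier, so no prop would be left as a plain `rdf:Property`.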
uncefact/spec-jsonld#42 The props in these pairs are duplicated, so should be merged:

- `bICId`: "The unique Bank Identification Code (BIC) as defined in ISO 9362 for this creditor or debtor financial institution."
- `bICIdentificationId`: "The Bank Identifier Code (BIC) as defined by ISO 9362 (Banking telecommunication messages, Bank Identifier Codes) for this financial identity."

- `versionId`: "An identifier of the version."
- `versionIdentificationId`: "The unique identifier for the version of this exchanged document."

- `attachedBinaryFile`: "A binary file attached to this exchanged or referenced document."
- `attachmentBinaryObject`: "A binary object that is attached or otherwise appended to this referenced document."

- `referenceDocument`
- `referencedDocument`: haven't checked description

This is according to principle 4. deduplication of #33 (tech-spec.md)
Also:
- uncefact/spec-jsonld#43 Address: the breakdown `lineOne, lineTwo, lineThree, lineFour` is pretty random (why not 6 or 10?) and old-fashioned. xsd:string allows multiple lines, so merge to just one prop eg `addressLines`
Document has enough props to describe also DocumentLines, which are document parts:
- `lineCountNumeric`: if it's a Document, count of lines in it
- `lineId`: if it's a DocumentLine, its line ID
- `parentLineId`: to establish a hierarchy between lines
However, there's no way to express a document hierarchy or parthood:
- Which lines comprise this Document
- Which lines are nested under this parent line?
This leads to confusions such as
- Wrong domain? The description talks about "DocumentLine" but the domain is `Document`
uncefact:lineStatusReason
rdfs:comment "A reason, expressed as text, for the line status in this document line." ;
schema:domainIncludes uncefact:Document .
uncefact:lineStatusReasonCode
rdfs:comment "The code specifying the line status reason for this document line." ;
schema:domainIncludes uncefact:Document .
- Wrong domain of `lineTotalBasisAmount`? Where is that "line" mentioned in the description? `Tax` has 51 attributes including `basisAmount` so how can one distinguish between the two?
  - On the other hand its "sibling" prop `lineTotalAmount` has domain `MonetarySummation`
uncefact:lineTotalBasisAmount
rdfs:comment "A monetary value used as the line total basis on which this trade related tax, levy or duty is calculated." ;
schema:domainIncludes uncefact:Tax ;
Currently UNCEFACT uses only two literal datatypes: `xsd:string` (791 props) and `xsd:token` (159 props).
UNCEFACT prop names are made according to ISO/IEC 11179 Metadata Registry (MDR), part 5:2015 Naming and identification principles. The last word of prop names (let's call it "kind") suggests many other datatypes.
Surely trade involves some numbers and some dates?!?
I checked that all props with kind Id are `xsd:token` (good).
This query counts `xsd:string` props by "kind" (last word of name):
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?kind (count(*) as ?c) {
?prop schema:rangeIncludes xsd:string
bind(replace(str(?prop),".*([A-Z][a-z]*)","$1") as ?kind)
filter(regex(?kind,"^[A-Z]"))
} group by ?kind order by ?kind
Count and tentative proposed changes:
kind | c | change to |
---|---|---|
"Access" | 1 | |
"Agency" | 1 | |
"Amount" | 89 | numeric |
"Basis" | 2 | |
"Box" | 1 | |
"Charge" | 1 | |
"Code" | 154 | xsd:token |
"Conditions" | 1 | |
"Criteria" | 1 | |
"Date" | 3 | xsd:date |
"Description" | 21 | |
"Dimension" | 1 | |
"Five" | 1 | |
"Four" | 1 | |
"Indicator" | 73 | xsd:boolean |
"Information" | 21 | |
"Instructions" | 2 | |
"Limit" | 2 | |
"List" | 2 | |
"Means" | 1 | |
"Measure" | 66 | |
"Name" | 47 | |
"Number" | 4 | numeric |
"Numeric" | 15 | IndexNumeric, SequenceNumeric -> xsd:integer |
"Object" | 7 | |
"Of" | 2 | |
"One" | 1 | |
"Pattern" | 1 | |
"Percent" | 16 | numeric |
"Phrase" | 1 | |
"Point" | 1 | |
"Procedure" | 1 | |
"Quantity" | 91 | numeric |
"Rate" | 4 | |
"Reason" | 7 | |
"Reference" | 6 | |
"Remark" | 2 | |
"Remarks" | 1 | |
"Restriction" | 3 | |
"Result" | 1 | |
"Status" | 1 | |
"Three" | 1 | |
"Time" | 79 | xsd:dateTime |
"Title" | 1 | |
"Two" | 1 | |
"Type" | 9 | |
"Use" | 1 | |
"Value" | 1 | |
"Zone" | 1 |
Examples:
- Numeric candidates: `uncefact:usedToDateQuotaQuantity, uncefact:usedSignalSourceQuantity, taxBasisTotalAmount, taxBasisAllowanceRate`
- `date` or `dateTime` candidates: `uncefact:occurrenceDateTime`
- `xsd:boolean` candidates: `uncefact:nilCarriageValueIndicator, uncefact:nilCustomsValueIndicator, uncefact:nilInsuranceValueIndicator`
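
The "kind" extraction done by the query above can also be tried outside a triplestore. This is a minimal Python sketch that mirrors the same regex on a prop's local name; the helper names `kind_of` and `suggest_datatype` are hypothetical, and the `SUGGESTED` mapping only repeats the tentative "change to" column of the table (where "numeric" is still undecided).

```python
import re

# Tentative suggestions copied from the "change to" column above.
SUGGESTED = {
    "Amount": "numeric",
    "Code": "xsd:token",
    "Date": "xsd:date",
    "Indicator": "xsd:boolean",
    "Number": "numeric",
    "Percent": "numeric",
    "Quantity": "numeric",
    "Time": "xsd:dateTime",
}


def kind_of(local_name):
    """Return the last capitalized word of a camelCase name, or None."""
    # Same idea as replace(str(?prop), ".*([A-Z][a-z]*)", "$1") in the query:
    # the greedy ".*" leaves only the last word that starts with a capital.
    kind = re.sub(r".*([A-Z][a-z]*)", r"\1", local_name)
    # Names without any capital letter come back unchanged: filter them out.
    return kind if re.match(r"[A-Z]", kind) else None


def suggest_datatype(local_name):
    """Look up the tentative datatype for a prop name, or None if no proposal."""
    return SUGGESTED.get(kind_of(local_name))


for name in ["occurrenceDateTime", "nilCarriageValueIndicator",
             "taxBasisTotalAmount", "taxBasisAllowanceRate"]:
    print(name, kind_of(name), suggest_datatype(name))
```

Note that `taxBasisAllowanceRate` gets kind `Rate`, for which the table has no proposal yet.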
uncefact/spec-jsonld#46 Decide numeric datatypes
uncefact/spec-jsonld#47 Props named `xxxCode` come in two kinds:
prefix uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
PREFIX schema: <http://schema.org/>
select ?range (count(*) as ?c) {
?prop schema:rangeIncludes ?range1
filter(regex(str(?prop),"Code$"))
bind(if(strstarts(str(?range1),str(uncefact:)),uncefact:,?range1) as ?range)
} group by ?range
- `xsd:string`: 154. Consider mapping to range `xsd:token` (same as props named `xxxId`). xsd:token doesn't allow leading, consecutive and trailing spaces, so it fits better than xsd:string. Example: `accessRightsCode xsd:string` -> `xsd:token`
- Objects (codelist values): 110. Consider renaming them to remove `Code` (because objects are not codes!). Examples:
  - `accountingDocumentSetTriggerCode uncefact:UNCL1001Code` -> `accountingDocumentSetTrigger`
  - `cross-BorderRegulatoryProcedureTypeCode uncefact:UNCL9353Code` -> `cross-BorderRegulatoryProcedureType`
  - `logisticsSealSealingPartyRoleCode uncefact:UNCL9303Code` -> `logisticsSealSealingPartyRole`
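
To illustrate why `xsd:token` fits code values, here is a minimal Python sketch of the lexical constraint (no tab, line feed or carriage return; no leading, trailing or consecutive spaces). The function names and the sample values are hypothetical; only the rule itself comes from XML Schema.

```python
import re


def is_xsd_token(value: str) -> bool:
    """True if value is already in the lexical space of xsd:token."""
    if re.search(r"[\t\n\r]", value):
        return False  # tab, line feed and carriage return are not allowed
    if value.startswith(" ") or value.endswith(" "):
        return False  # no leading or trailing space
    return "  " not in value  # no run of two or more spaces


def collapse(value: str) -> str:
    """Apply XSD whitespace 'collapse' so any string becomes a valid token."""
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")


print(is_xsd_token("UN01002518"))    # a clean code value
print(is_xsd_token(" UN01002518"))   # leading space: not a token
print(repr(collapse("  AB \t CD ")))
```

A converter could run `collapse` on existing `xsd:string` code values before retyping them, so that stray whitespace does not produce distinct codes.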
- remove the parasitic word `Identification` from `xxxIdentificationId`. Eg `uncefact:versionIdentificationId`, `bICIdentificationId` (led to duplication), etc
- Note: renaming `xxxIndicator` to `isXxx` won't work well, eg `transportEquipmentSplitGoodsIndicator`
uncefact/spec-jsonld#49 Here are the remaining 4:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * {
?x a rdf:Property
filter(regex(str(?x),"Identification"))
filter(!regex(str(?x),"IdentificationId"))
}
- `uncefact:allowanceChargeIdentificationTypeCode` "The code specifying the type of this trade allowance charge." range `UNCL5189Code` "Code specifying the identification of an allowance or charge." rename to `allowanceOrChargeType`
- `uncefact:geoCoordinateIdentificationGeographicalCoordinate range GeoCoordinate`: rename to `geoCoordinate`
- `uncefact:natureIdentificationCargo`: descr sounds like it's free text "Transport cargo details of the consignment or consignment item sufficient to identify its nature for customs, statistical or transport purposes." but in fact it has `range uncefact:Cargo`, so rename to `cargo`
  - The descr of class `Cargo` is "Goods being transported." so none of that "sufficient to identify its nature" fuzziness?
- `uncefact:uNDGIdentificationCode`: "United Nations Dangerous Goods (UNDG) number": rename to `undgCode`
Many dateTime props have "formatted" in their names.
- As opposed to what, cuneiform? :-)
- You should indicate the required format with `rangeIncludes xsd:dateTime`, not with the property name
This query finds them, together with a better-named prop when it exists:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
select ?x ?y {
?x a rdf:Property
bind(strafter(str(?x),str(uncefact:)) as ?xName)
filter(regex(?xName,"formatted","i"))
optional {
?y a rdf:Property
bind(strafter(str(?y),str(uncefact:)) as ?yName)
filter(?xName != ?yName && regex(?xName,?yName,"i"))
}
}
I think all `x` should be simplified by removing the parasitic word "formatted", and merged with `y` when indicated:
x | y | note |
---|---|---|
formattedExpiryDateTime | expiryDate | maybe merge all 3 but see below |
formattedExpiryDateTime | expiryDateTime | |
formattedFormattedCancellationAnnouncedLaunchDateTime | ||
formattedFormattedIssueDateTime | issueDateTime | |
formattedFormattedLatestProductDataChangeDateTime | ||
formattedFormattedPickUpAvailabilityDateTime | formattedPickUpAvailabilityDateTime | merge & rename to pickUpAvailabilityDateTime |
formattedPickUpAvailabilityDateTime | ||
formattedFormattedReceivedDateTime | receivedDateTime | |
formattedFormattedUltimateShipToDeliveryDateTime | ultimateShipToDeliveryDateTime | |
formattedJurisdictionEntryDateTime | ||
formattedLastRegisteredYearDateTime | lastRegisteredYearDateTime | |
formattedObtainedDateTime | ||
formattedScheduledArrivalRelatedDateTime | arrivalRelatedDateTime | |
formattedScheduledArrivalRelatedDateTime | scheduledArrivalRelatedDateTime | |
formattedScheduledDepartureRelatedDateTime | departureRelatedDateTime | |
formattedScheduledDepartureRelatedDateTime | scheduledDepartureRelatedDateTime |
This puppy is really messed up:
- `formattedExpiryDateTime` "The date, time, date time or other date time value when this certified accreditation expires."
  - but range is `UNCL2379Code` "Code specifying the representation of a date, time or period."
  - So should this be `expiryDateTime range xsd:dateTime`, or `expiryDateTimeFormat range UNCL2379Code`???

(Ooook, most of them are in this sort of situation)

- `expiryDateTime` "A date, time, date time, or other date time value of expiry."
- `expiryDate` "The date of expiry up to which this trade settlement financial card is valid."

These differ only by granularity of datatype. But it's ok to have 2 props since financial card expiry never has a time.
You use consistent camelCasing for props, and UpperCamelCasing for classes (good!). However, it needs to be made smarter when dealing with UPPERCASE:
UPPERCASE abbreviations should be converted to lowercase, then camelCased as a normal word
- otherwise:
- casing is inconsistent depending on whether the abbreviation comes at the start or middle of the property name
- The camelized abbreviation is impossible to recognize in the stream of words
- examples:
  - current: `bBANIdentificationId, bICId, australianSNIdentificationId` (wtf is `BANI, ICI, SNI`?)
  - change to: `bbanId, bicId, australianSnId`
  - or even better: `bban, bic, australianSn`
Haven't looked for class names. Dunno how to catch all cases.
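
The casing rule proposed above can be sketched in Python. This is a heuristic only, assuming the current names were produced by lowercasing just the first letter, and that an uppercase run followed by a capitalized word is an abbreviation; the function name `smart_camel` is hypothetical. It fixes casing only and does not drop the word `Identification`. Characters other than letters and digits (eg the hyphen in `cross-Border...`) are dropped by this sketch.

```python
import re

# Word splitter: an uppercase run that is followed by a capitalized word is
# treated as one abbreviation; otherwise take a normal (capitalized) word.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def smart_camel(name: str) -> str:
    """Re-case a prop name so abbreviations are camelCased like normal words."""
    # Undo the naive "lowercase the first letter" step before splitting.
    words = _WORD.findall(name[:1].upper() + name[1:])
    if not words:
        return name
    head, *rest = words
    return head.lower() + "".join(w.capitalize() for w in rest)


for name in ["bBANIdentificationId", "bICId", "australianSNIdentificationId"]:
    print(name, "->", smart_camel(name))
```

Two adjacent abbreviations cannot be told apart by this heuristic, which is one reason the original word list (before camelCasing) would be a better input than the finished name.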
- duplicated word: `formattedFormattedCancellationAnnouncedLaunchDateTime` (and 4 more starting the same way)
- duplicated word: `referenceReferenceTypeCode`
- duplicated phrase: `documentLineDocumentLineStatusCode`
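
Doublons like the ones above could be caught automatically. A minimal Python sketch, assuming a simple camelCase word splitter (the function name `repeated_phrases` is hypothetical): it reports any word or phrase that is immediately repeated inside a prop name, ignoring case.

```python
import re

# Same kind of camelCase splitter as one would use for re-casing names.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def repeated_phrases(name: str) -> list:
    """Return phrases (as lowercase words) that occur twice in a row in name."""
    words = [w.lower() for w in _WORD.findall(name)]
    hits = []
    # Try every phrase length n and every start position i.
    for n in range(1, len(words) // 2 + 1):
        for i in range(len(words) - 2 * n + 1):
            if words[i:i + n] == words[i + n:i + 2 * n]:
                hits.append(" ".join(words[i:i + n]))
    return hits


for name in ["formattedFormattedCancellationAnnouncedLaunchDateTime",
             "referenceReferenceTypeCode",
             "documentLineDocumentLineStatusCode",
             "versionId"]:
    print(name, repeated_phrases(name))
```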
- It would be better to defer to OGC GeoSPARQL classes since OGC is a stronger authority on geographic information than UNCEFACT: uncefact:Circle , uncefact:GeographicalMultiCurve , uncefact:GeographicalPoint , uncefact:GeographicalMultiSurface , uncefact:GeographicalGrid , uncefact:GeographicalMultiPoint , uncefact:Polygon , uncefact:LinearRing , uncefact:GeographicalLine , uncefact:GeographicalSurface
- Much expanded info
"An identification of a set of geographical coordinates for this trade address."
This query finds props named "binary":
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uncefact: <https://service.unece.org/trade/uncefact/trade/uncefact/vocabulary/uncefact#>
select * {
?x a rdf:Property
bind(strafter(str(?x),str(uncefact:)) as ?xName)
filter(regex(?xName,"binary","i"))
}
There are 15, nearly equally spread between "BinaryObject" and "BinaryFile":
"attachedBinaryFile"
"attachmentBinaryObject"
"creationBinaryFile"
"descriptionBinaryObject"
"imageBinaryObject"
"includedBinaryObject"
"logoAssociatedBinaryFile"
"mapBinaryObject"
"presentationBinaryFile"
"readerBinaryFile"
"referenceFileBinaryObject"
"referencedBinaryFile"
"relatedBinaryFile"
"signatoryImageBinaryObject"
"valueBinaryFile"
Standardize on one of the names; this discrepancy has caused 2 prop duplications (attached, referenced)
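
The split between the two suffixes can be checked with a few lines of Python over the 15 names listed above (a sketch only; the helper name `suffix_of` is hypothetical):

```python
from collections import Counter

# The 15 prop names returned by the query above.
BINARY_PROPS = [
    "attachedBinaryFile", "attachmentBinaryObject", "creationBinaryFile",
    "descriptionBinaryObject", "imageBinaryObject", "includedBinaryObject",
    "logoAssociatedBinaryFile", "mapBinaryObject", "presentationBinaryFile",
    "readerBinaryFile", "referenceFileBinaryObject", "referencedBinaryFile",
    "relatedBinaryFile", "signatoryImageBinaryObject", "valueBinaryFile",
]


def suffix_of(name: str) -> str:
    """Classify a prop name by its trailing 'BinaryObject' or 'BinaryFile'."""
    for suffix in ("BinaryObject", "BinaryFile"):
        if name.endswith(suffix):
            return suffix
    return "other"


counts = Counter(suffix_of(name) for name in BINARY_PROPS)
print(dict(counts))
```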