@ldodds
Created December 2, 2013 14:45
Examples of describing a CSV file using a range of formats supported by different tools, including chkcsv.py, csv-validator, datapackage.json, and schema.ini
{
"name": "CSV Validation Example",
"resources": [
{
"name": "Land Registry Example Data",
"path": "../lr-pp-nov-2013.csv",
"format": "csv",
"mediatype": "text/csv",
"encoding": "UTF-8",
"dialect": {
"delimiter": ",",
"lineterminator": "\r\n",
"quotechar": "\""
},
"schema": {
"fields": [
{
"name": "ID",
"title": "Transaction unique identifier",
"description": "A reference number which is generated automatically recording each published sale. The number is unique and will change each time a sale is recorded",
"type": "string"
},
{
"name": "Price",
"title": "Price",
"description": "Sale price stated on the Transfer deed",
"type": "integer"
},
{
"name": "Date of Transfer",
"title": "Date of Transfer",
"description": "Date when the sale was completed, as stated on the Transfer deed",
"type": "datetime",
"format": "YYYY-MM-DD hh:mm"
},
{
"name": "Postcode",
"title": "Postcode",
"type": "string"
},
{
"name": "Property Type",
"title": "Property Type",
"description": "D-Detached, S-Semi-Detached, T-Terraced, F-Flats/Maisonettes",
"type": "string"
},
{
"name": "Old/New",
"title": "Old/New",
"description": "Y = a newly built property, N = an established residential building",
"type": "string"
},
{
"name": "Duration",
"title": "Duration",
"description": "Relates to the tenure. F-Freehold, L-Leasehold etc",
"type": "string"
},
{
"name": "PAON",
"title": "Primary Addressable Object Name",
"description": "Primary Addressable Object Name. If there is a sub-building, for example the building is divided into flats, see Secondary Addressable Object Name (SAON)",
"type": "string"
},
{
"name": "SAON",
"title": "Secondary Addressable Object Name",
"description": "Secondary Addressable Object Name. If there is a sub-building, for example the building is divided into flats, there will be a SAON",
"type": "string"
},
{
"name": "Street",
"title": "Street",
"type": "string"
},
{
"name": "Locality",
"title": "Locality",
"type": "string"
},
{
"name": "Town/City",
"title": "Town/City",
"type": "string"
},
{
"name": "Local Authority",
"title": "Local Authority",
"type": "string"
},
{
"name": "County",
"title": "County",
"type": "string"
},
{
"name": "Record Status",
"title": "Record Status",
"description": "Indicates additions, changes and deletions to the records",
"type": "string"
}
]
}
}
]
}
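The descriptor above only declares field types; it does not check anything by itself. The following is a minimal sketch of how the declared types could be applied to one row, using only the Python standard library. The helper names (`check_value`, `check_row`) and the mapping from `YYYY-MM-DD hh:mm` to a `strptime` pattern are assumptions made for illustration, and only three of the fifteen fields are copied in.

```python
import json
from datetime import datetime

# Trimmed copy of the descriptor above: three of the fifteen fields.
DESCRIPTOR = json.loads("""
{
  "resources": [{
    "schema": {"fields": [
      {"name": "ID", "type": "string"},
      {"name": "Price", "type": "integer"},
      {"name": "Date of Transfer", "type": "datetime", "format": "YYYY-MM-DD hh:mm"}
    ]}
  }]
}
""")

# Assumption: the descriptor's date format corresponds to this strptime pattern.
STRPTIME_FORMATS = {"YYYY-MM-DD hh:mm": "%Y-%m-%d %H:%M"}


def check_value(field, value):
    """Return True if the text value fits the field's declared type."""
    kind = field["type"]
    if kind == "integer":
        try:
            int(value)
        except ValueError:
            return False
        return True
    if kind == "datetime":
        try:
            datetime.strptime(value, STRPTIME_FORMATS[field["format"]])
        except ValueError:
            return False
        return True
    return True  # "string" accepts any text


def check_row(row, fields):
    """Return the names of the fields whose value fails its type check."""
    return [f["name"] for f in fields if not check_value(f, row[f["name"]])]
```

A row would typically come from `csv.DictReader`, so that values are keyed by the header names used in the descriptor.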
ID Price Date of Transfer Postcode Property Type Old/New Duration PAON SAON Street Locality Town/City Local Authority County Record Status
{3B0DA29C-C89A-4FAA-918A-0000074FA0E0} 190000 2013-10-04 00:00 SN14 8LU T N F 148 HIGH STREET MARSHFIELD CHIPPENHAM SOUTH GLOUCESTERSHIRE SOUTH GLOUCESTERSHIRE A
{55743403-C4CB-459D-8B15-000110F3CFCA} 420000 2013-10-04 00:00 RG42 6LN S N F SIDMOUTH COTTAGES 2 BRACKNELL ROAD BROCK HILL BRACKNELL BRACKNELL FOREST BRACKNELL FOREST A
{D13EF4A0-8B61-4886-BADA-0001780BD6BA} 250000 2013-06-28 00:00 SK17 8SN T N F THE MILL MILLERS DALE BUXTON HIGH PEAK DERBYSHIRE A
{3D74FE62-3423-4F13-8FEF-0001D8030578} 179950 2013-08-28 00:00 OX16 9LW S N F 11 WESLEY DRIVE BANBURY CHERWELL OXFORDSHIRE A
{8BA9EA94-0A29-4195-947F-000210F13EF0} 310000 2013-10-18 00:00 BA13 4LA D N F 6 HAWKERIDGE WESTBURY WILTSHIRE WILTSHIRE A
{65C935D7-2F81-4D29-9CA7-0002AA741584} 360000 2013-09-25 00:00 RH4 3DX T N F 7 WESTFIELD GARDENS DORKING MOLE VALLEY SURREY A
{7F475813-7D15-4261-AD23-00031DEF95DF} 167500 2013-10-10 00:00 NR2 2BE T N F 100 CAMBRIDGE STREET NORWICH NORWICH NORFOLK A
{D79BCD49-244F-451D-B57E-0004431BF677} 180000 2013-10-25 00:00 ME2 3TS S N F 12 CADNAM CLOSE ROCHESTER MEDWAY MEDWAY A
{9D7CAEBE-51DB-4817-81ED-0004542E3D87} 142500 2013-08-30 00:00 HD9 1LT S N F 29 DALESIDE AVENUE NEW MILL HOLMFIRTH KIRKLEES WEST YORKSHIRE A
{AA34922F-6466-4284-AF22-00058CBFC762} 94000 2013-10-18 00:00 DE7 9HJ S N F 26 BARCLAY COURT ILKESTON EREWASH DERBYSHIRE A
[ID]
data_required=True
type=string
minlen=38
maxlen=38
pattern=\{[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}\}
[Price]
data_required=True
type=integer
[Date of Transfer]
data_required=True
type=string
pattern=[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}
[Postcode]
data_required=True
type=string
pattern=[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}
[Property Type]
data_required=True
type=string
pattern=(D|S|T|F)
[Old/New]
data_required=True
type=string
pattern=(Y|N)
[Duration]
data_required=True
type=string
pattern=(F|L)
[PAON]
data_required=False
[SAON]
data_required=False
[Street]
data_required=False
[Locality]
data_required=False
[Town/City]
data_required=True
[Local Authority]
data_required=True
[County]
data_required=True
[Record Status]
data_required=True
type=string
pattern=(A|C|D)
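The format specification above is plain INI, so its rules can be read with Python's `configparser`. The sketch below is not chkcsv.py itself; it is an illustration of what the `data_required`, `type`, `minlen`, `maxlen` and `pattern` keys could mean when applied to one row. The function names are hypothetical, and treating `pattern` as a whole-value match is an assumption.

```python
import configparser
import re

# Three sections copied from the format specification above.
SPEC_TEXT = r"""
[ID]
data_required=True
type=string
minlen=38
maxlen=38
pattern=\{[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}\}
[Price]
data_required=True
type=integer
[PAON]
data_required=False
"""


def load_spec(text):
    """Parse the INI-style specification; only '=' separates key and value."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    return parser


def check_value(rules, value):
    """Return a list of problems for one value under one column's rules."""
    problems = []
    if value == "":
        if rules.getboolean("data_required", fallback=False):
            problems.append("missing required value")
        return problems
    if rules.get("type", fallback="string") == "integer":
        try:
            int(value)
        except ValueError:
            problems.append("not an integer")
    if "minlen" in rules and len(value) < rules.getint("minlen"):
        problems.append("shorter than minlen")
    if "maxlen" in rules and len(value) > rules.getint("maxlen"):
        problems.append("longer than maxlen")
    pattern = rules.get("pattern", fallback=None)
    # Assumption: the pattern has to match the whole value, not just a prefix.
    if pattern is not None and re.fullmatch(pattern, value) is None:
        problems.append("does not match pattern")
    return problems


def check_row(spec, row):
    """Return {column name: problems} for every column that fails a rule."""
    report = {}
    for column in spec.sections():
        problems = check_value(spec[column], row.get(column, ""))
        if problems:
            report[column] = problems
    return report
```

An empty dictionary from `check_row` means the row passed every rule in the sections that were loaded.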
[lr-pp-nov-2013.csv]
ColNameHeader=True
Format=CSVDelimited
CharacterSet=UTF-8
DateTimeFormat=YYYY-MM-DD hh:mm
Col1="ID" Text Width 38
Col2="Price" Integer
Col3="Date of Transfer" DateTime
Col4="Postcode" Text
Col5="Property Type" Text Width 1
Col6="Old/New" Text Width 1
Col7="Duration" Text Width 1
Col8="PAON" Text
Col9="SAON" Text
Col10="Street" Text
Col11="Locality" Text
Col12="Town/City" Text
Col13="Local Authority" Text
Col14="County" Text
Col15="Record Status" Text Width 1
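The `ColN` entries above carry a column name, a type and an optional width. As a rough sketch, they can be pulled out with `configparser` and a regular expression; the function name `read_columns` and the regular expressions are assumptions for illustration, and only the first three columns are copied in.

```python
import configparser
import re

# First lines of the schema.ini section above.
SCHEMA_INI = """
[lr-pp-nov-2013.csv]
ColNameHeader=True
Format=CSVDelimited
Col1="ID" Text Width 38
Col2="Price" Integer
Col3="Date of Transfer" DateTime
"""

# configparser lower-cases option names by default, so "Col1" arrives as "col1".
COL_KEY = re.compile(r"col(\d+)$")
COL_VALUE = re.compile(r'"(?P<name>[^"]+)"\s+(?P<type>\w+)(?:\s+Width\s+(?P<width>\d+))?$')


def read_columns(text, section):
    """Return (name, type, width) tuples in column order; width may be None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    columns = []
    for key, value in parser[section].items():
        key_match = COL_KEY.match(key)
        if key_match is None:
            continue  # not a ColN entry, e.g. Format or ColNameHeader
        value_match = COL_VALUE.match(value)
        width = int(value_match["width"]) if value_match["width"] else None
        columns.append((int(key_match.group(1)), value_match["name"], value_match["type"], width))
    # Sort by the numeric part of ColN, then drop it from the result.
    return [column[1:] for column in sorted(columns)]
```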
//See https://github.com/digital-preservation/csv-validator
version 1.0
@totalColumns 15
//below not yet supported?
//@quoted
//@separator ','
ID: unique length(38)
Price: positiveInteger
/*
Some of the following rules refer to a column by its zero-based number rather than by name, because the tool restricts which characters a column name may contain.
Dates cannot be expressed directly, so a regex is used instead.
*/
2: regex("[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}")
Postcode: regex("[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}")
4: regex("(D|S|T|F)")
5: regex("(Y|N)")
Duration: regex("(F|L)")
PAON: @optional
SAON: @optional
Street: @optional
Locality: @optional
11:
12:
County:
14: regex("(A|C|D)")
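The schema above mixes column names with column numbers, and the numbers count from zero (2 is the third column, Date of Transfer). A small Python sketch makes the mapping explicit; `resolve_column` is a hypothetical helper, not part of csv-validator.

```python
# Header row of the example CSV, in file order.
HEADER = [
    "ID", "Price", "Date of Transfer", "Postcode", "Property Type",
    "Old/New", "Duration", "PAON", "SAON", "Street", "Locality",
    "Town/City", "Local Authority", "County", "Record Status",
]


def resolve_column(key, header):
    """Map a rule key (a column name or a zero-based number) to its header name."""
    return header[int(key)] if key.isdigit() else key
```

For example, `resolve_column("11", HEADER)` gives the Town/City column and `resolve_column("14", HEADER)` gives Record Status.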
@kvramana13

How do I validate the above CSV? Can you please let me know the commands?