@RaulMedeiros
Created August 10, 2020 11:48
Data Quality
Validity: How closely the data meets defined business rules or constraints. Some common constraints include:
>Mandatory constraints: Certain columns cannot be empty
>Data-type constraints: Values in a column must be of a certain data type
>Range constraints: Minimum and maximum values for numbers or dates
>Foreign-key constraints: Every value in a column must exist in a column of another table that contains unique values
>Unique constraints: A field or fields must be unique in a dataset
>Regular expression patterns: Text fields must match a defined pattern
>Cross-field validation: Certain conditions that utilize multiple fields must hold
>Set-membership constraints: A subcategory of foreign-key constraints. Values for a column must come from a fixed set of discrete values or codes
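As a rough illustration of how several of these constraints might be checked in code, here is a minimal sketch in plain Python. The column names (`id`, `email`, `age`, `country`, `start`, `end`), the allowed country codes, the age range, and the deliberately loose email pattern are all assumptions made up for the example, not rules from any particular dataset. A foreign-key check would look like the set-membership check, with the allowed set loaded from the referenced table.

```python
import re

# Hypothetical rules, invented for illustration only.
COUNTRIES = {"BR", "US", "DE"}  # set-membership: allowed codes
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose email pattern


def validate(rows):
    """Return a list of (row_index, rule_name) for every violated rule."""
    errors = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Mandatory constraint: "id" and "email" cannot be empty.
        for col in ("id", "email"):
            if row.get(col) in (None, ""):
                errors.append((i, f"mandatory:{col}"))

        # Data-type constraint, then range constraint, on "age".
        age = row.get("age")
        if not isinstance(age, int):
            errors.append((i, "type:age"))
        elif not 0 <= age <= 120:
            errors.append((i, "range:age"))

        # Unique constraint: "id" must not repeat across rows.
        row_id = row.get("id")
        if row_id not in (None, ""):
            if row_id in seen_ids:
                errors.append((i, "unique:id"))
            seen_ids.add(row_id)

        # Regular expression pattern on a text field.
        email = row.get("email")
        if email and not EMAIL_RE.match(email):
            errors.append((i, "pattern:email"))

        # Set-membership constraint: value must be one of the known codes.
        if row.get("country") not in COUNTRIES:
            errors.append((i, "membership:country"))

        # Cross-field validation: "start" must not be after "end".
        start, end = row.get("start"), row.get("end")
        if start is not None and end is not None and start > end:
            errors.append((i, "cross-field:start<=end"))
    return errors
```

Calling `validate(rows)` on a list of dictionaries returns one `(row_index, rule_name)` pair per violation, so an empty list means every row passed the checks sketched here.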
Accuracy: How closely data conforms to a standard or a true value.
Completeness: How thorough or comprehensive the data and its related measures are
Consistency: The equivalency of measures across systems and subjects
Uniformity: Ensuring that the same units of measure are used in all systems
Traceability: Being able to find (and access) the source of the data
Timeliness: How quickly and recently the data has been updated
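Some of the dimensions above can be turned into simple measurements. The sketch below is one possible way to do that for completeness, uniformity, and timeliness. The function names, the choice of kilograms as the common unit, and the one-day freshness threshold are assumptions picked for the example.

```python
from datetime import timedelta


def completeness(rows, column):
    """Fraction of rows where `column` is present and not None or empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


# Uniformity: convert every weight to one unit (kilograms, by assumption).
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def to_kg(value, unit):
    """Convert a weight in a known unit to kilograms."""
    return value * TO_KG[unit]


def is_stale(last_updated, now, max_age=timedelta(days=1)):
    """Timeliness: True if the last update is older than `max_age`."""
    return now - last_updated > max_age
```

For example, `completeness(rows, "email")` gives the share of rows with a usable email value, and `is_stale(last_updated, now)` flags a source that has not been refreshed within the assumed one-day window.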