Created
October 22, 2014 11:19
DataQuality
### Data Quality ###
1. Definition
http://en.wikipedia.org/wiki/Data_quality
>Reference:
>[1] http://www.slideshare.net/OpenDataSupport/open-data-quality-29248578
>[2] http://www.slideshare.net/dba_alex/data-quality-overview?related=1
>[3] http://www-01.ibm.com/software/data/quality/

---
2. Tools
- DataCleaner (~50 MB)
http://sourceforge.net/projects/datacleaner/
http://datacleaner.org/resources/docs/3.7/html_single/
- Talend Open Studio for Data Quality (~500 MB)
http://talend.dreamhosters.com/top/release/V5.5.1/TOS_DQ-r118616-V5.5.1.zip

---
3. Dataset
- [orderdb](orderdb)
- [airport.csv](airport.csv)
- [journal in MongoDB](journal)

---
4. Scenarios
- High-level view
> **Simple Statistics**: orders
> **Date/Time Analysis**: orders
> **Yearly and Monthly Distribution**: orders
- Completeness
> **Completeness, Null Check**: customers
- Anomalous values
> **Value Distribution**: customers.country / customers.city
- Anomalous patterns
> **Pattern Finder**: customers.postalcode / customers.phone
- Referential integrity
> **Referential Integrity**: products.productcode -> orderdetails.productcode
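
The checks listed above can be approximated outside any tool. The following is a minimal Python sketch of what each one computes, not DataCleaner's implementation. The sample rows are invented for illustration, the function names are hypothetical, and the referential-integrity check assumes `orderdetails` is the child table that refers to `products`.

```python
import re
from collections import Counter
from datetime import date

# Invented sample rows; only the column names come from the notes above.
orders = [
    {"ordernumber": 1, "orderdate": date(2003, 1, 6)},
    {"ordernumber": 2, "orderdate": date(2003, 1, 9)},
    {"ordernumber": 3, "orderdate": date(2004, 2, 20)},
]
customers = [
    {"country": "USA", "city": "NYC", "postalcode": "10022", "phone": "2125557818"},
    {"country": "France", "city": "Nantes", "postalcode": "44000", "phone": "40.32.2555"},
    {"country": "USA", "city": None, "postalcode": None, "phone": "6505551386"},
]
products = [{"productcode": "S10_1678"}, {"productcode": "S10_1949"}]
orderdetails = [{"productcode": "S10_1678"}, {"productcode": "S99_0000"}]


def monthly_distribution(rows, column):
    """High-level view: number of rows per (year, month) of a date column."""
    return Counter((r[column].year, r[column].month) for r in rows if r.get(column))


def null_counts(rows, columns):
    """Completeness: number of missing (None or empty) values per column."""
    return {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}


def value_distribution(rows, column):
    """Value distribution: frequency of each non-null value; rare values stand out."""
    return Counter(r[column] for r in rows if r.get(column) is not None)


def pattern_of(value):
    """Reduce a string to its shape: letters become 'a', digits become '9'."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "a", value))


def pattern_distribution(rows, column):
    """Pattern finder: frequency of each shape among the non-empty values."""
    return Counter(pattern_of(r[column]) for r in rows if r.get(column))


def orphan_keys(child_rows, parent_rows, key):
    """Referential integrity: child key values that have no matching parent row."""
    parent_keys = {r[key] for r in parent_rows}
    return sorted({r[key] for r in child_rows} - parent_keys)


if __name__ == "__main__":
    print(monthly_distribution(orders, "orderdate"))
    print(null_counts(customers, ["city", "postalcode", "phone"]))
    print(value_distribution(customers, "country"))
    print(pattern_distribution(customers, "phone"))
    print(orphan_keys(orderdetails, products, "productcode"))
```

In this sketch a phone number written with dots produces a different shape from a plain digit string, which is the kind of inconsistency a pattern analysis is meant to surface.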
---
5. Other Features
- Data source support for CSV files, MongoDB, etc.
- Custom regexes and a regex marketplace
- JavaScript transformations, custom extensions, and an extension marketplace
- Console and HTTP interfaces
---
6. Tips
- Out of memory: raise the JVM maximum heap size
>java -Xmx2048m -jar Datacleaner.jar
- Export to HTML
>The exported HTML file may be large.
>The HTML loads its JS and CSS from online sources.
---
7. Talend Open Studio for Data Quality Overview