@Stiivi
Last active August 29, 2015 13:57
Basic Data Audit of a datapackage for EU Budget data using Bubbles
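# -*- coding: utf8 -*-
from bubbles import Pipeline

# Store named "package": data packages found under the relative path "data".
stores = {
    "package": {"type": "datapackages", "url": "data"}
}

p = Pipeline(stores=stores)

# Read the "eu-budget" package, audit every field, print the result as a text table.
p.source("package", "eu-budget")
p.basic_audit()
p.pretty_print()

# The steps above only describe the pipeline; run() executes it.
p.run()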
+-------------------+------------+-----------+----------+----------------------+----------------------+------------------+-------+-------+--------------+-----------------+
|field              |record_count|value_ratio|null_count|null_value_ratio      |null_record_ratio     |empty_string_count|min_len|max_len|distinct_count|distinct_overflow|
+-------------------+------------+-----------+----------+----------------------+----------------------+------------------+-------+-------+--------------+-----------------+
|article_label      |        7056|        1.0|         2|0.0002834467120181406 |0.0002834467120181406 |                 0|      0|    326|           100|True             |
|projection         |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|      5|             2|False            |
|commitment         |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|      5|             2|False            |
|item_legal_basis   |        7056|        1.0|      5240|0.7426303854875284    |0.7426303854875284    |                 0|      0|   9806|           100|True             |
|chapter_label      |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|    213|           100|True             |
|volume_label       |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|     51|            10|False            |
|title_name         |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|      2|            40|False            |
|item_description   |        7056|        1.0|      3749|0.5313208616780045    |0.5313208616780045    |                 0|      0|   8929|           100|True             |
|subitem_name       |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|     14|            26|False            |
|item_name          |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|     11|           100|True             |
|type               |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|     14|             3|False            |
|article_description|        7056|        1.0|      3082|0.43679138321995464   |0.43679138321995464   |                 0|      0|  10483|           100|True             |
|time               |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|      4|             3|False            |
|subitem_label      |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|     91|            24|False            |
|subitem_description|        7056|        1.0|      7056|1.0                   |1.0                   |                 0|      0|      0|             1|False            |
|item_label         |        7056|        1.0|         3|0.00042517006802721087|0.00042517006802721087|                 0|      0|    227|           100|True             |
|article_name       |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|      8|           100|True             |
|country            |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|     14|            29|False            |
|flow               |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|      8|             2|False            |
|article_legal_basis|        7056|        1.0|      4188|0.5935374149659864    |0.5935374149659864    |                 0|      0|  10436|           100|True             |
|amount             |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|     16|           100|True             |
|chapter_name       |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|      5|           100|True             |
|subitem_legal_basis|        7056|        1.0|      7056|1.0                   |1.0                   |                 0|      0|      0|             1|False            |
|budget_year        |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|      4|             1|False            |
|volume_name        |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|     13|            10|False            |
|title_label        |        7056|        1.0|         0|0.0                   |0.0                   |                 0|      0|     88|            83|False            |
|volume_color       |           0|          0|         0|0                     |0                     |                 0|      0|      0|             0|False            |
|country_code       |           0|          0|         0|0                     |0                     |                 0|      0|      0|             0|False            |
|country_color      |           0|          0|         0|0                     |0                     |                 0|      0|      0|             0|False            |
+-------------------+------------+-----------+----------+----------------------+----------------------+------------------+-------+-------+--------------+-----------------+
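
To make the columns in the table easier to read, here is a minimal plain-Python sketch of how per-field statistics of this kind could be computed. It is not Bubbles' implementation: the function name `audit_field`, the `null_ratio` key, and the `distinct_threshold` parameter are hypothetical. The default of 100 is only inferred from the table, where `distinct_count` never exceeds 100 and `distinct_overflow` is True exactly on those rows. Counting None as a distinct value is likewise a guess based on the all-null fields reporting `distinct_count` 1. The length handling is a simplification and will not reproduce the `min_len` column exactly.

```python
def audit_field(values, distinct_threshold=100):
    """Collect simple audit statistics for one field (illustrative only)."""
    record_count = null_count = empty_string_count = 0
    min_len = max_len = None
    distinct = set()
    distinct_overflow = False

    for value in values:
        record_count += 1

        # Track distinct values, but stop collecting once the threshold
        # is reached and only remember that there were more.
        if value not in distinct:
            if len(distinct) < distinct_threshold:
                distinct.add(value)
            else:
                distinct_overflow = True

        if value is None:
            null_count += 1
            continue

        # Length statistics only make sense for strings.
        if isinstance(value, str):
            if value == "":
                empty_string_count += 1
            length = len(value)
            min_len = length if min_len is None else min(min_len, length)
            max_len = length if max_len is None else max(max_len, length)

    return {
        "record_count": record_count,
        "null_count": null_count,
        "null_ratio": null_count / record_count if record_count else 0.0,
        "empty_string_count": empty_string_count,
        "min_len": min_len,
        "max_len": max_len,
        "distinct_count": len(distinct),
        "distinct_overflow": distinct_overflow,
    }


# Made-up sample values for a single field.
sample = ["Title 01", None, "", "Title 01"]
stats = audit_field(sample)
```

Capping the distinct set keeps memory bounded for free-text fields such as descriptions and legal bases, which is presumably why the table reports an overflow flag instead of an exact count for them.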