Skip to content

Instantly share code, notes, and snippets.

@adulau
Last active December 15, 2015 07:58
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save adulau/5226967 to your computer and use it in GitHub Desktop.
Notes about analyzing data from AIVE

AIVE Waste Analysis

Source

JP Florent gave me an XLS file containing the statistic of the waste collected from 2010, 2011 and 2012. The initial file is not really machine friendly as the spreadsheet is used for human reading. This is another important point with open data. Usually a lot of claimed to be "open data" is in an format that you cannot easily parse and need a lot of conversion to be understandable by a machine.

Conversion

First you need to clean-up the file to export it in a proper CSV format. The data contains the unit (To) in each cells. So you don't really need it as you need a numerical value as input and not a string. Here is an approach to automatically convert the exported "messy CSV" for each active sheet (one per year) with ";" as separator:

sed -e "s/To//g" dechet2012.csv | sed -e "s/\,//g" | sed -e "s/;/,/g" | sed -e "s/\"//g" >clean-dechet2012.csv

The conversion process is the following. First we must get rid of the units (To) in all the cells then we must remove the comma used for the representation of large units then we replace the default separator ";" with a "," and finally we remove the string escaping as we treat all the values as numerical except the first column as this is the name of the commune. We add a first line with the label of the column.

Final data looks like this:

Commune,Verres Commune,Papier-carton (porte), Dechet commune melange, Fraction seche (porte),Matiere organique (porte),encombrants (porte),baches agricoles,Parcs conteneurs
 ARLON,706.84 ,805.82 ,0.00 ,3307.66 ,1207.04 ,379.88 ,21.76 ,10954.69 
 ATTERT,74.95 ,41.64 ,0.00 ,455.86 ,149.48 ,5.86 ,21.30 ,3098.84 
 AUBANGE,328.70 ,121.54 ,168.68 ,1814.24 ,1329.66 ,7.36 ,12.66 ,6303.18 
 BASTOGNE,406.06 ,178.52 ,5.24 ,2369.96 ,592.50 ,101.70 ,104.56 ,8330.01 
 BERTOGNE,37.37 ,39.30 ,0.00 ,220.72 ,108.66 ,3.92 ,49.44 ,1271.37 
 BERTRIX,153.79 ,80.36 ,83.82 ,999.60 ,629.12 ,11.78 ,40.96 ,3573.76 
 BOUILLON,215.52 ,138.24 ,172.26 ,742.04 ,256.00 ,86.70 ,23.26 ,2459.41 
 CHINY,93.37 ,72.78 ,0.00 ,632.16 ,142.04 ,62.10 ,16.98 ,2545.57   

Graphing

If you want to graph the data dynamically on the Internet, you'll probably pick one of the zillion of Javascript library available to visualize the data. Now the question, it's more what do you want to visualize and what the message that you would like to pass when doing information visualization.

With the provincial waste data, you can show the current repartition of waster per commune. You can also compare by commune but you will need the number of citizen per commune in order to have comparable dataset.

For comparing the repartition of waste collection, I wanted to show all communes into something comparable. I based my output on the following example from D3 http://bl.ocks.org/mbostock/3888852.

Demo for waste repartition in the province of Luxembourg in 2012

http://www.foo.be/waste/

JSON conversion

As a lot of Javascript libraries relies on JSON for the input, it might be useful to convert the clean CSV file into the JSON format. This can be done in 3 lines of Python but if you are lazy you can use one of the online tool that is doing it: http://www.cparker15.com/code/utilities/csv-to-json/

Open data references

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment