@jamesporter
Created November 16, 2014 00:42
The data is a large (2 GB) JSON file, so many tools are going to struggle with it. I usually use pandas (Python) for data work, since it offers flexible joins, group operations, function application, etc. Sadly it couldn't cope with this dataset (at least on my machine).
MongoDB seems to cope okay. Surprisingly,
mongoimport -d redlist -c data mammals.geojson
works, despite claiming that it doesn't. This gives c. 36,000 items.
The threats data can be read in the same way, e.g. with:
mongoimport -d redlist -c threats threats.json
There appear to be some duplicates within both datasets (which I haven't had a chance to remove).
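Removing exact duplicates can be sketched in plain JavaScript on an in-memory sample (runs under Node.js; on the full collection the same idea would have to run inside MongoDB). This is a minimal sketch, assuming a duplicate means two documents whose fields other than MongoDB's generated `_id` are identical and serialise with the same key order; `dedupeExact` is a hypothetical helper. Deduplicating on `properties.ID_NO` alone would be riskier if one species can have several range polygons.

```javascript
// Keep the first copy of each document, comparing everything except _id.
// Assumption: exact duplicates serialise identically (same field order).
function dedupeExact(docs) {
  const seen = new Set();
  const unique = [];
  for (const doc of docs) {
    const { _id, ...rest } = doc; // _id differs between imported copies
    const key = JSON.stringify(rest);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(doc);
    }
  }
  return unique;
}

// Hypothetical sample: documents 1 and 2 are duplicates, 3 differs in geometry type.
const docs = [
  { _id: 1, properties: { ID_NO: 42 }, geometry: { type: "Polygon" } },
  { _id: 2, properties: { ID_NO: 42 }, geometry: { type: "Polygon" } },
  { _id: 3, properties: { ID_NO: 42 }, geometry: { type: "MultiPolygon" } },
];
console.log(dedupeExact(docs).map((d) => d._id)); // [ 1, 3 ]
```

Keeping the first copy is arbitrary; when duplicates are exact, any copy will do.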
Next steps (or what I was working on):
- Add the threats info to the mammals' geographic info.
- Coarse-grain the geographic info. A map-reduce job to do this would be something like:
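//MapReduce (coarse-grain/combine mammals info)
db.data.mapReduce(
  // map: key each polygon by its rounded approximate centroid
  function () {
    // round value to the nearest multiple of acc
    var round = function (value, acc) {
      return Math.round(value / acc) * acc;
    };
    // average of the ring's vertices (an approximation of the true centroid);
    // GeoJSON rings repeat the first vertex at the end, so it is skipped
    var centroid = function (coords) {
      var x = 0, y = 0, n = coords.length - 1;
      for (var i = 0; i < n; i++) {
        x += coords[i][0];
        y += coords[i][1];
      }
      x = round(x / n, 5);
      y = round(y / n, 5);
      return {coords: [x, y]};
    };
    // coordinates[0] is the polygon's outer ring
    var c = centroid(this.geometry.coordinates[0]);
    // wrap the ID in an object: reduce must return the same shape as emitted values
    emit(c, {ids: [this.properties.ID_NO]});
  },
  // reduce: merge the ID lists (concat returns a new array, so keep the result);
  // MongoDB may call this again on its own output
  function (key, values) {
    var ids = [];
    for (var i = 0; i < values.length; i++) {
      ids = ids.concat(values[i].ids);
    }
    return {ids: ids};
  },
  { query: {"geometry.type": "Polygon"},
    out: "coarse" }
);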
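To check the rounding, centroid and reduce logic without a running MongoDB instance, here is a minimal in-memory sketch of the same job that runs under Node.js. It is not MongoDB's implementation: `emit` is passed to the map step as a parameter rather than being the shell's global, keys are compared via `JSON.stringify`, and values are wrapped as `{ ids: [...] }` (an assumption, chosen because MongoDB's reduce must return the same shape it receives and may be called again on its own output). All function names here are hypothetical.

```javascript
// Round value to the nearest multiple of acc.
function round(value, acc) {
  return Math.round(value / acc) * acc;
}

// Vertex average of a closed ring, skipping the repeated closing vertex.
function centroid(coords) {
  let x = 0, y = 0;
  const n = coords.length - 1;
  for (let i = 0; i < n; i++) {
    x += coords[i][0];
    y += coords[i][1];
  }
  return { coords: [round(x / n, 5), round(y / n, 5)] };
}

// Map step: emit is passed in rather than being the mongo shell's global.
function map(doc, emit) {
  emit(centroid(doc.geometry.coordinates[0]), { ids: [doc.properties.ID_NO] });
}

// Reduce step: concatenate ID lists, returning the same shape it receives.
function reduce(key, values) {
  let ids = [];
  for (const v of values) ids = ids.concat(v.ids);
  return { ids: ids };
}

// Group emitted values by key (compared via JSON.stringify) and reduce each
// group; as in MongoDB, a key with a single value is not passed to reduce.
function mapReduce(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (doc.geometry.type !== "Polygon") continue; // mirrors the query option
    map(doc, (key, value) => {
      const k = JSON.stringify(key);
      if (!groups.has(k)) groups.set(k, { key: key, values: [] });
      groups.get(k).values.push(value);
    });
  }
  const out = [];
  for (const { key, values } of groups.values()) {
    out.push({ _id: key, value: values.length === 1 ? values[0] : reduce(key, values) });
  }
  return out;
}

// Hand-made test data: closed square rings with lower-left corner (x0, y0) and side s.
const square = (x0, y0, s) =>
  [[[x0, y0], [x0 + s, y0], [x0 + s, y0 + s], [x0, y0 + s], [x0, y0]]];
const docs = [
  { geometry: { type: "Polygon", coordinates: square(0, 0, 2) }, properties: { ID_NO: 1 } },
  { geometry: { type: "Polygon", coordinates: square(1, 1, 1) }, properties: { ID_NO: 2 } },
  { geometry: { type: "Polygon", coordinates: square(20, 20, 2) }, properties: { ID_NO: 3 } },
];
console.log(JSON.stringify(mapReduce(docs)));
// [{"_id":{"coords":[0,0]},"value":{"ids":[1,2]}},{"_id":{"coords":[20,20]},"value":{"ids":[3]}}]
```

The first two squares have vertex averages (1, 1) and (1.5, 1.5), which both round to the same 5-degree cell, so their IDs are merged; the third lands in its own cell. Because reduce returns what it consumes, re-reducing partial results gives the same answer as reducing everything at once.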
This map-reduce job makes a number of assumptions (such as unique keys..., somewhat abusing map, and that this centroid function gives a good approximation of the true centroid, which it may not).
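How far a vertex average can drift from the true (area) centroid shows up on a small example. The sketch below uses the standard shoelace centroid formula. It treats longitude/latitude as planar coordinates (an assumption that ignores the Earth's curvature, probably acceptable for coarse-graining) and, like the map function, uses only the outer ring. `areaCentroid` and `vertexAverage` are hypothetical names; a degenerate ring with zero area would divide by zero.

```javascript
// Area centroid of a closed ring [[x, y], ...] (last vertex repeats the first),
// via the shoelace formula; coordinates are treated as planar.
function areaCentroid(ring) {
  let twiceArea = 0, cx = 0, cy = 0;
  for (let i = 0; i < ring.length - 1; i++) {
    const [x0, y0] = ring[i];
    const [x1, y1] = ring[i + 1];
    const cross = x0 * y1 - x1 * y0;
    twiceArea += cross;
    cx += (x0 + x1) * cross;
    cy += (y0 + y1) * cross;
  }
  // signed area A = twiceArea / 2; centroid = sums / (6A) = sums / (3 * twiceArea)
  return [cx / (3 * twiceArea), cy / (3 * twiceArea)];
}

// Vertex average, as in the map function (closing vertex skipped).
function vertexAverage(ring) {
  let x = 0, y = 0;
  const n = ring.length - 1;
  for (let i = 0; i < n; i++) {
    x += ring[i][0];
    y += ring[i][1];
  }
  return [x / n, y / n];
}

// A 2x2 square with three extra vertices along its left edge (x = 0).
const ring = [[0, 0], [2, 0], [2, 2], [0, 2], [0, 1.5], [0, 1], [0, 0.5], [0, 0]];
console.log(areaCentroid(ring)); // [ 1, 1 ]
console.log(vertexAverage(ring)); // x is pulled towards the densely sampled left edge
```

Because a vertex average weights every vertex equally, densely digitised stretches of coastline or border pull it away from the centre of area: here the area centroid stays at (1, 1) while the vertex average shifts to x = 4/7, about 0.57.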
The job also doesn't yet carry the information we want for subsequent comparison jobs.
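One way to attach threat information to the coarse cells is to build an index from species ID to threats, then tally threats per cell. This is a minimal sketch in plain JavaScript on in-memory data. The shape of the threats records is an assumption: `id_no` and `threat` are placeholder field names, not taken from threats.json, and `attachThreats` is a hypothetical helper. The cells are assumed to have the `{ _id: { coords }, value: { ids } }` shape that a map-reduce job emitting `{ ids: [...] }` would produce.

```javascript
// Hypothetical helper: tally threats per coarse cell.
// cells:   [{ _id: { coords: [x, y] }, value: { ids: [...] } }, ...]
// threats: [{ id_no, threat }, ...]  (placeholder field names, not from threats.json)
function attachThreats(cells, threats) {
  // Index threats by species ID so each lookup is a single Map access.
  const byId = new Map();
  for (const t of threats) {
    if (!byId.has(t.id_no)) byId.set(t.id_no, []);
    byId.get(t.id_no).push(t.threat);
  }
  return cells.map((cell) => {
    // Count each species once per cell, even if it has several polygons there.
    const ids = [...new Set(cell.value.ids)];
    const threatCounts = {};
    for (const id of ids) {
      for (const threat of byId.get(id) || []) {
        threatCounts[threat] = (threatCounts[threat] || 0) + 1;
      }
    }
    return { coords: cell._id.coords, species: ids.length, threatCounts: threatCounts };
  });
}

const cells = [
  { _id: { coords: [0, 0] }, value: { ids: [1, 2, 2] } },
  { _id: { coords: [20, 20] }, value: { ids: [3] } },
];
const threats = [
  { id_no: 1, threat: "hunting" },
  { id_no: 2, threat: "hunting" },
  { id_no: 2, threat: "logging" },
];
console.log(JSON.stringify(attachThreats(cells, threats)));
// [{"coords":[0,0],"species":2,"threatCounts":{"hunting":2,"logging":1}},{"coords":[20,20],"species":1,"threatCounts":{}}]
```

Species IDs are deduplicated within each cell, but duplicate threat records (the threats data also appears to contain some) would still be double-counted unless removed first.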
TBC