@keithshep
Last active September 22, 2016 16:28
mongo cheats
// remove field from all samples
db.getCollection('samples').update({}, {'$unset': {'viterbi_haplotypes': 1}}, {multi: true})
// initialize fields in all samples
db.getCollection('samples').update({}, {'$set': {'viterbi_haplotypes': {}}}, {multi: true})
// how to slice without getting everything you don't care about
db.getCollection('samples').find(
    {'_id': ObjectId("55e9cf1a6606af0e6e5cb5a1")},
    {
        'chromosome_data.6.allele1_fwds': {'$slice': 10},
        'chromosome_data.6.allele2_fwds': {'$slice': 10},
        // see http://stackoverflow.com/questions/23804254/
        // this is a hack that can be used to avoid pulling down all of the other data that
        // we don't really care about (the default with slice is to pull down everything else)
        'chromosome_data.6.no_read_count': 1
    })
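// a possible generalization of the slice query above: a small helper that builds the
// same projection for any chromosome and slice length. sliceProjection is a hypothetical
// name (not part of the original cheats); the field names are taken from the query above.

```javascript
// Hypothetical helper: builds the projection document used in the find() above.
// Assumes the same document layout (chromosome_data.<chr>.allele1_fwds etc.).
function sliceProjection(chromosome, n) {
    const prefix = 'chromosome_data.' + chromosome + '.';
    return {
        [prefix + 'allele1_fwds']: {'$slice': n},
        [prefix + 'allele2_fwds']: {'$slice': n},
        // including one small field keeps the rest of the document out of the result
        [prefix + 'no_read_count']: 1
    };
}

// In the mongo shell this could be used as:
// db.getCollection('samples').find({'_id': someId}, sliceProjection(6, 10))
```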
// find attributes that are not unique (based on http://blog.mlab.com/2014/03/finding-duplicate-keys-with-the-mongodb-aggregation-framework/)
db.getCollection('samples').aggregate([{
    $group: {
        _id: {organism_id: '$organism_id'},
        count: {$sum: 1}
    }
}, {
    $match: {
        count: {$gt: 1}
    }
}])
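// a sketch of the same duplicate check for an arbitrary field: group by the field, count,
// and keep only groups seen more than once. duplicatePipeline is a hypothetical helper
// name (not part of the original cheats) and assumes a top-level field.

```javascript
// Hypothetical helper: returns the $group/$match pipeline shown above,
// parameterized by field name.
function duplicatePipeline(field) {
    return [{
        $group: {
            _id: {[field]: '$' + field},  // group documents sharing the same value
            count: {$sum: 1}              // count documents per group
        }
    }, {
        $match: {
            count: {$gt: 1}               // keep only values that occur more than once
        }
    }];
}

// In the mongo shell this could be used as:
// db.getCollection('samples').aggregate(duplicatePipeline('organism_id'))
```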