@ptgamr
Created September 5, 2017 12:06
MONGODB: remove duplicates
var duplicates = [];
db.staff.aggregate([
  { $match: {
    id: { "$exists": true } // only consider documents that have an "id" field
  }},
  { $group: {
    _id: { id: "$id" }, // grouping key; add more fields here to group on several properties
    dups: { "$addToSet": "$_id" }, // the _id of every document sharing this key
    count: { "$sum": 1 }
  }},
  { $match: {
    count: { "$gt": 1 } // a key seen more than once has duplicates
  }}
],
{ allowDiskUse: true } // let stages spill to disk if a large collection exceeds the memory limit
) // stop here to inspect the duplicate groups before deleting anything
.forEach(function(doc) {
  doc.dups.shift(); // keep the first _id of each group; $addToSet order is unspecified, so which one survives is arbitrary
  doc.dups.forEach(function(dupId) {
    duplicates.push(dupId); // collect the remaining _ids for deletion
  });
});
// Optional: print every _id that is about to be deleted
printjson(duplicates);
// Remove all duplicates in one go (deleteMany replaces the deprecated remove)
db.staff.deleteMany({ _id: { $in: duplicates } });
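
The selection rule above (group by `id`, keep one `_id` per group, delete the rest) can be tried on a plain array before touching a real collection. The following is a minimal sketch in plain JavaScript, not mongo shell code: `findDuplicateIds` is a hypothetical helper that is not part of the gist, and it assumes `id` values are primitives such as strings or numbers. Unlike `$addToSet`, it keeps the first document in array order, so the survivor is predictable.

```javascript
// Hypothetical helper (not part of the gist): mirrors the $match / $group / shift
// logic on an in-memory array so the rule can be checked without a database.
// Assumption: id values are primitives (strings, numbers), so Map keys compare by value.
function findDuplicateIds(docs) {
  const groups = new Map(); // id value -> list of _id values, in encounter order

  for (const doc of docs) {
    if (!("id" in doc)) continue; // same filter as { id: { $exists: true } }
    if (!groups.has(doc.id)) groups.set(doc.id, []);
    groups.get(doc.id).push(doc._id);
  }

  const duplicates = [];
  for (const ids of groups.values()) {
    if (ids.length > 1) {
      // Keep the first _id of the group, mark every later one for deletion.
      duplicates.push(...ids.slice(1));
    }
  }
  return duplicates;
}

// Sample data invented for illustration only.
const sample = [
  { _id: 1, id: "a" },
  { _id: 2, id: "b" },
  { _id: 3, id: "a" },
  { _id: 4 },            // no "id" field, ignored
  { _id: 5, id: "a" }
];
console.log(findDuplicateIds(sample)); // [ 3, 5 ]
```

Only the documents with `_id` 3 and 5 are marked: both repeat the `id` value "a" first seen on `_id` 1, while the document without an `id` field is skipped entirely.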