@FTKhanFT
Created June 19, 2021 06:45
Delete all duplicate documents in a MongoDB collection in seconds using aggregation.
var duplicates = [];
db.getCollection('CollectionName').aggregate([ // CollectionName = your collection name
    { $match: {
        name: { "$ne": '' } // Optional filter: skip documents whose name is empty; adjust or remove as needed
    }},
    { $group: {
        _id: { FieldName: "$FieldName" }, // FieldName = field to match on; add more keys to group on several fields
        dups: { "$addToSet": "$_id" }, // _id of every document in the group
        count: { "$sum": 1 }
    }},
    { $match: {
        count: { "$gt": 1 } // A group with more than one document contains duplicates
    }}
],
{ allowDiskUse: true } // Lets stages write temporary data to disk when a large data set exceeds the in-memory limit
) // Stop here to inspect the duplicate groups before deleting anything
.forEach(function(doc) {
    doc.dups.shift(); // Keep the first _id of each group ($addToSet does not guarantee which one comes first)
    doc.dups.forEach(function(dupId) {
        duplicates.push(dupId); // Collect the remaining _ids for deletion
    });
});
// Optional: print every _id that is about to be deleted
printjson(duplicates);
// Remove all duplicates in one go (deleteMany replaces the deprecated remove)
db.getCollection('CollectionName').deleteMany({ _id: { $in: duplicates } }); // CollectionName = your collection name
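
The selection logic above can be tried out without a database. The following is a minimal sketch in plain JavaScript, not part of the original script: `findDuplicateIds`, the `sample` array, and the `email` field are hypothetical names chosen for illustration. It mirrors the three pipeline stages on an in-memory array, assuming the empty-string filter applies to the same field that is grouped on (the script above filters on `name` and groups on `FieldName`).

```javascript
// Hypothetical helper: mirrors the $match / $group / $match stages on a plain array.
function findDuplicateIds(docs, fieldName) {
  const groups = new Map(); // field value -> list of _id values sharing that value
  for (const doc of docs) {
    const value = doc[fieldName];
    if (value === "") continue; // same idea as the { $ne: '' } filter
    if (!groups.has(value)) groups.set(value, []);
    groups.get(value).push(doc._id);
  }

  const duplicates = [];
  for (const ids of groups.values()) {
    if (ids.length > 1) {
      duplicates.push(...ids.slice(1)); // keep the first id, collect the rest
    }
  }
  return duplicates;
}

// Illustrative data only; ids 1, 3 and 4 share the same email.
const sample = [
  { _id: 1, email: "a@example.com" },
  { _id: 2, email: "b@example.com" },
  { _id: 3, email: "a@example.com" },
  { _id: 4, email: "a@example.com" },
  { _id: 5, email: "" },
  { _id: 6, email: "" },
];

console.log(findDuplicateIds(sample, "email")); // [ 3, 4 ]
```

Unlike this sketch, which keeps the first document in array order, the aggregation keeps whichever `_id` `$addToSet` happens to place first, so it is worth checking the printed ids before running the delete.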