@rehmatworks
Last active May 28, 2023 12:01
Delete Mongo Duplicates

I needed to create a unique index on a MongoDB collection and ran into duplicate key errors. Here is what worked for me to delete the duplicates. This solution was suggested by ChatGPT 3.5, so please use it at your own risk.

To delete duplicates while keeping one copy of each document, follow these steps:

  1. Connect to your MongoDB server (for example, with mongosh) and switch to the "profiles" database: use profiles

  2. Identify the duplicate documents based on the "details.company_id" field using the aggregation framework:

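    var duplicates = db.companies.aggregate([
      // Sort by _id first so that each group's duplicateIds array starts with
      // the lowest _id (for ObjectIds, roughly the oldest document).
      { $sort: { _id: 1 } },
      {
        $group: {
          _id: "$details.company_id",   // documents without the field are grouped under null
          count: { $sum: 1 },
          duplicateIds: { $push: "$_id" }
        }
      },
      {
        // Keep only values that occur more than once.
        $match: {
          count: { $gt: 1 }
        }
      }
    ], { allowDiskUse: true }).toArray();   // let large $sort/$group stages spill to disk

    This query sorts the documents by "_id", groups them by "details.company_id", counts each group, and collects the "_id" of every document in the group into the duplicateIds array; the $match stage then keeps only groups with more than one document. The result is stored in the duplicates variable as an array. Note that documents missing "details.company_id" all land in a single null group, so all but one of them will be deleted in the next step as well.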

  3. Iterate over the duplicates array and delete all duplicates except the first occurrence:

    duplicates.forEach(function(duplicate) {
      // Keep the first _id in the group and delete the rest.
      var idsToDelete = duplicate.duplicateIds.slice(1);
      db.companies.deleteMany({ _id: { $in: idsToDelete } });
    });

    In this step, we iterate over each duplicate group, take every "_id" except the first one (.slice(1)), and delete those documents with deleteMany, leaving exactly one document per "details.company_id" value.

  4. After deleting the duplicate documents, you can proceed to create a unique index on the "details.company_id" field to prevent further duplicates:

    db.companies.createIndex({ "details.company_id": 1 }, { unique: true })

    This command creates a unique index on the "details.company_id" field, so any later insert or update that would repeat an existing value fails with a duplicate key error.
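To sanity-check what the step 2 pipeline should return before touching the database, the same grouping can be sketched in plain JavaScript over a few sample documents. This is a minimal sketch, not MongoDB's implementation: `getPath` and `findDuplicateGroups` are hypothetical helpers, the sample data is made up, and it assumes `details.company_id` holds scalar values (strings or numbers). Like `$group`, it buckets documents that lack the field under `null`.

```javascript
// Hypothetical helper: read a dotted path such as "details.company_id".
// Returns null when the field (or any parent) is missing, mirroring how
// $group puts documents without the field into a null group.
function getPath(doc, path) {
  let value = doc;
  for (const key of path.split(".")) {
    if (value === null || value === undefined) return null;
    value = value[key];
  }
  return value === undefined ? null : value;
}

// Plain-JS equivalent of the $group + $match stages: one entry per key that
// occurs more than once, listing every _id in that group in input order.
function findDuplicateGroups(docs, path) {
  const groups = new Map();
  for (const doc of docs) {
    const key = getPath(doc, path);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc._id);
  }
  const result = [];
  for (const [key, ids] of groups) {
    if (ids.length > 1) {
      result.push({ _id: key, count: ids.length, duplicateIds: ids });
    }
  }
  return result;
}

// Made-up sample data.
const sample = [
  { _id: 1, details: { company_id: "A" } },
  { _id: 2, details: { company_id: "B" } },
  { _id: 3, details: { company_id: "A" } },
  { _id: 4, details: {} },
  { _id: 5 },
];
// Prints: [{"_id":"A","count":2,"duplicateIds":[1,3]},{"_id":null,"count":2,"duplicateIds":[4,5]}]
console.log(JSON.stringify(findDuplicateGroups(sample, "details.company_id")));
```

The output shows the null-group effect: documents 4 and 5 have no company_id, so they are treated as duplicates of each other.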
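Before running the step 3 loop, a dry run can help: the sketch below takes an array shaped like duplicates (`{ _id, count, duplicateIds }`) and reports which `_id` values would be kept and which would be removed, without deleting anything. `planDeletions` is a hypothetical helper and the sample input is made up; since it is plain JavaScript, it should also work in mongosh on the real duplicates array.

```javascript
// Hypothetical dry-run helper: apply the same "keep the first _id" rule as
// the deletion loop, but only collect the ids instead of deleting them.
function planDeletions(duplicates) {
  const keep = [];
  const remove = [];
  for (const group of duplicates) {
    keep.push(group.duplicateIds[0]);
    for (const id of group.duplicateIds.slice(1)) {
      remove.push(id);
    }
  }
  return { keep, remove };
}

// Made-up input in the shape produced by the aggregation.
const duplicatesSample = [
  { _id: "A", count: 3, duplicateIds: [1, 3, 7] },
  { _id: "B", count: 2, duplicateIds: [2, 9] },
];
const plan = planDeletions(duplicatesSample);
console.log(plan.keep);   // [ 1, 2 ]
console.log(plan.remove); // [ 3, 7, 9 ]
```

Once `plan.remove` looks right, `db.companies.deleteMany({ _id: { $in: plan.remove } })` would remove the same documents as the loop in a single call; for very large lists, splitting `plan.remove` into batches keeps each command under MongoDB's 16 MB document limit.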
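Step 4 only succeeds if no two documents share an indexed value, and a unique index that is not sparse or partial stores `null` for documents missing the field, so at most one such document is allowed. The sketch below models that rule in plain JavaScript as a pre-check on sample data. `findUniqueViolation` is a hypothetical helper, it assumes scalar values (no arrays, so no multikey behavior), and it is only an approximation of what `createIndex` checks.

```javascript
// Hypothetical pre-check modelling a non-sparse, non-partial unique index on
// `path`: missing fields count as null, so two documents without the field
// collide just like two documents with the same value.
function findUniqueViolation(docs, path) {
  const seen = new Set();
  for (const doc of docs) {
    // Walk the dotted path, e.g. "details.company_id".
    let value = doc;
    for (const key of path.split(".")) {
      value = value === null || value === undefined ? undefined : value[key];
    }
    const indexed = value === undefined ? null : value;
    if (seen.has(indexed)) {
      // Report the first repeated key and the document that repeats it.
      return { ok: false, key: indexed, _id: doc._id };
    }
    seen.add(indexed);
  }
  return { ok: true };
}

const afterCleanup = [
  { _id: 1, details: { company_id: "A" } },
  { _id: 2, details: { company_id: "B" } },
  { _id: 3 }, // a single document without the field is still allowed
];
console.log(findUniqueViolation(afterCleanup, "details.company_id"));
// { ok: true }

const twoMissing = [{ _id: 4 }, { _id: 5, details: {} }];
console.log(findUniqueViolation(twoMissing, "details.company_id"));
// { ok: false, key: null, _id: 5 }
```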

Please review the contents of duplicates before deleting anything, and make sure you have a working backup before running any data deletion operations.
