@vidyavnv
Last active January 4, 2016 09:09
Remove duplicates from MongoDB using map-reduce
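from pymongo import MongoClient
from bson.code import Code

client = MongoClient()
db = client["dbname"]
col = db["collName"]

# Map: emit (fieldName, 1) for every document whose fieldName is truthy.
mapper = Code("function () { if (this.fieldName) { emit(this.fieldName, 1); } }")
# Reduce: sum the 1s, giving the number of documents per fieldName value.
reducer = Code("function (key, values) { return Array.sum(values); }")

# Collection.map_reduce() was removed in PyMongo 4.0, so run the mapReduce
# command directly. Output goes to "my_results" as {_id: <fieldName>, value: <count>}.
db.command("mapReduce", "collName", map=mapper, reduce=reducer, out="my_results")

response = []
for doc in db["my_results"].find({"value": {"$gt": 1}}):
    # Keep one document per value: collect the _ids of the other (count - 1) copies.
    count = int(doc["value"]) - 1
    docs = col.find({"fieldName": doc["_id"]}, {"_id": 1}).limit(count)
    for d in docs:
        response.append(d["_id"])

# Collection.remove() was removed in PyMongo 4.0; delete_many() replaces it.
col.delete_many({"_id": {"$in": response}})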
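Map-reduce has been deprecated since MongoDB 5.0, and MongoDB recommends the aggregation pipeline instead. Below is a minimal sketch of the same clean-up using `aggregate()`, assuming PyMongo 3.x or later. The helper names `duplicate_groups_pipeline` and `remove_duplicates` are hypothetical. Unlike the map-reduce version, which keeps whichever copy `find()` happens not to return, this version deterministically keeps the copy with the lowest `_id` in each group.

```python
from typing import Any, Dict, List


def duplicate_groups_pipeline(field: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: group documents by `field` and keep only repeated values."""
    return [
        # Skip documents where the field is missing or null (the map-reduce
        # version's JS truthiness check additionally skips 0, "" and false).
        {"$match": {field: {"$ne": None}}},
        # Sort so that $push collects _ids in ascending order within each group.
        {"$sort": {"_id": 1}},
        # Equivalent of map + reduce: one group per value, with its count and _ids.
        {"$group": {"_id": "$" + field, "ids": {"$push": "$_id"}, "count": {"$sum": 1}}},
        # Only values that occur more than once are duplicates.
        {"$match": {"count": {"$gt": 1}}},
    ]


def remove_duplicates(col: Any, field: str) -> int:
    """Hypothetical helper: delete all but one document per duplicated value.

    `col` is expected to be a pymongo Collection. Returns the number of deleted documents.
    """
    to_delete = []
    for group in col.aggregate(duplicate_groups_pipeline(field), allowDiskUse=True):
        # Keep the first (lowest) _id in the group; mark the rest for deletion.
        to_delete.extend(group["ids"][1:])
    if not to_delete:
        return 0
    return col.delete_many({"_id": {"$in": to_delete}}).deleted_count
```

Usage, assuming the same placeholder names as the script above: `remove_duplicates(MongoClient()["dbname"]["collName"], "fieldName")`. Collecting every `_id` with `$push` keeps the logic to a single pass. However, a value with an extremely large number of copies could run into MongoDB's 16 MB BSON size limit, either in a group document or in the `$in` list sent to `delete_many()`.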