import pandas as pd
from pymongo import MongoClient
import os
# ====== Connection ====== #
# Connection to Mongo
client_mongo = MongoClient(os.environ['IP_MONGO'],27017)
# Connection to the database
db = client_mongo.sandbox
# Authenticating to the database
# Connection to the collection
collection = db.helloworld
# ====== Inserting Documents ====== #
# Creating a simple Pandas DataFrame
liste_hello = ['hello1','hello2']
liste_world = ['world1','world2']
df = pd.DataFrame(data = {'hello' : liste_hello, 'world': liste_world})
# Bulk inserting documents. Each row in the DataFrame will be a document in Mongo
collection.insert_many(df.to_dict('records'))
# ====== Finding Documents ====== #
# Query on a field that the documents inserted above actually contain
documents = collection.find({'hello': 'hello1'})
df = pd.DataFrame(list(documents))
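
The DataFrame-to-documents round trip can be tried without a running MongoDB server. This is a minimal sketch, not part of the original gist: the `found` list only simulates what `collection.find()` would return, and its `_id` values are hypothetical placeholder strings standing in for the ObjectId values MongoDB adds on insert.

```python
import pandas as pd

# Same DataFrame as in the gist
df = pd.DataFrame({'hello': ['hello1', 'hello2'], 'world': ['world1', 'world2']})

# to_dict('records') yields one dict per row; this list is what insert_many receives
records = df.to_dict('records')

# Simulated query result: MongoDB adds an `_id` field to every stored document.
# The string ids are placeholders (assumption), not real ObjectId values.
found = [dict(doc, _id=f'id{i}') for i, doc in enumerate(records)]

# Rebuild a DataFrame and drop `_id` when it is not needed downstream
df_back = pd.DataFrame(found).drop(columns=['_id'])
```

Dropping `_id` is optional; keep it if the rows need to be matched back to their documents later.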

NimzyMaina commented Dec 2, 2019

Tried this out & it's awesome!!!! One correction though






Also, if you don't want to insert duplicates, you need to create a unique index on a particular key (see "How to create unique Index in Mongo DB") and insert as follows.

Create the index in the mongo CLI, making the "hello" key unique

db.helloworld.createIndex( { hello: 1 }, { unique: true, sparse: true } )

Insert ignoring duplicates

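# Other imports
from pymongo.errors import BulkWriteError

# code omitted for brevity
try:
    # ordered=False lets Mongo continue past documents that fail to insert
    db.helloworld.insert_many(df.to_dict('records'), ordered=False)
except BulkWriteError as e:
    # Duplicate-key errors are ignored; the new records were still inserted
    pass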

Duplicates raise a BulkWriteError, but because the insert is unordered (ordered=False), the new records are still inserted. You can safely ignore the errors caused by duplicate records.
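
A BulkWriteError can also carry errors that are not duplicates, so it may be worth checking before ignoring it. Below is a minimal sketch, assuming the error's `details` dict has a `writeErrors` list whose entries carry a numeric `code`, with 11000 being MongoDB's duplicate-key code. The helper name `split_write_errors` and the sample payload are hypothetical, for illustration only.

```python
DUPLICATE_KEY_CODE = 11000  # MongoDB server error code for a duplicate key


def split_write_errors(details):
    """Split the `writeErrors` entries of a BulkWriteError.details dict
    into (duplicates, others)."""
    errors = details.get('writeErrors', [])
    duplicates = [e for e in errors if e.get('code') == DUPLICATE_KEY_CODE]
    others = [e for e in errors if e.get('code') != DUPLICATE_KEY_CODE]
    return duplicates, others


# Hypothetical payload shaped like BulkWriteError.details (not real server output)
sample_details = {
    'writeErrors': [
        {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error'},
        {'index': 3, 'code': 121, 'errmsg': 'Document failed validation'},
    ],
    'nInserted': 2,
}

duplicates, others = split_write_errors(sample_details)
```

In the `except BulkWriteError as e:` branch, one option is to call `split_write_errors(e.details)` and re-raise when `others` is not empty, so that only the duplicates are swallowed.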
