@sebastien-collet
Last active December 2, 2019 08:09
import pandas as pd
from pymongo import MongoClient
import os
# ====== Connection ====== #
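# Connection to Mongo. Credentials are passed to MongoClient because
# Database.authenticate() was removed in PyMongo 4.0
client_mongo = MongoClient(
    os.environ['IP_MONGO'], 27017,
    username=os.environ['MONGO_USER'],
    password=os.environ['MONGO_PASSWORD'],
    authSource='sandbox',  # authenticate against the sandbox database, as before
)
# Connection to the database
db = client_mongo.sandbox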
# Connection to the collection
collection = db.helloworld
# ====== Inserting Documents ====== #
# Creating a simple Pandas DataFrame
liste_hello = ['hello1','hello2']
liste_world = ['world1','world2']
df = pd.DataFrame(data = {'hello' : liste_hello, 'world': liste_world})
# Bulk inserting documents. Each row in the DataFrame will be a document in Mongo
collection.insert_many(df.to_dict('records'))
# ====== Finding Documents ====== #
# Query on a field that the inserted documents actually contain
documents = collection.find({'hello': 'hello1'})
df = pd.DataFrame(list(documents))
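
Documents returned by find() carry MongoDB's generated _id field, which ends up as an extra column in the DataFrame. If that column is not wanted, a projection can exclude it at query time. The following is a minimal sketch under that assumption; find_records is a hypothetical helper name, not part of the gist or of PyMongo.

```python
def find_records(collection, query):
    """Return the documents matching `query` as plain dicts without `_id`.

    `collection` is expected to behave like a pymongo Collection.
    """
    # The second argument to find() is a projection; {'_id': 0} excludes _id.
    return list(collection.find(query, {'_id': 0}))
```

With the collection from the gist, this could be used as df = pd.DataFrame(find_records(collection, {'hello': 'hello1'})).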
@NimzyMaina

Tried this out & it's awesome!!!! One correction though: insert_many has to be called on the collection, not on the database object. Any of these works:

collection.insert_many(df.to_dict('records'))

or

db.helloworld.insert_many(df.to_dict('records'))

or

db['helloworld'].insert_many(df.to_dict('records'))

Also, if you don't want to insert duplicates, you need to create a unique index on a particular key (see "How to create unique Index in Mongo DB") and then insert as follows.

Create an index in the mongo CLI that makes the "hello" key unique:

db.helloworld.createIndex( { hello: 1 }, { unique: true, sparse: true } )
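
The same index can probably be created from Python instead of the mongo CLI, which keeps the whole workflow in one script. This is a sketch, assuming `collection` is the pymongo Collection from the gist; create_unique_hello_index is a hypothetical helper name.

```python
def create_unique_hello_index(collection):
    """Create a unique, sparse index on the `hello` field.

    `collection` is expected to behave like a pymongo Collection.
    """
    # 1 means ascending order (it is the value of pymongo.ASCENDING).
    # unique=True rejects documents whose `hello` value already exists;
    # sparse=True leaves documents without a `hello` field out of the index.
    return collection.create_index([('hello', 1)], unique=True, sparse=True)
```

Calling create_index for an index that already exists with the same options should be harmless, so this can run at the start of every import.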

Insert ignoring duplicates

# Other imports
from pymongo.errors import BulkWriteError

# code omitted for brevity 
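try:
    db.helloworld.insert_many(df.to_dict('records'), ordered=False)
except BulkWriteError as e:
    # 11000 is MongoDB's duplicate key error code; re-raise anything else
    if any(err['code'] != 11000 for err in e.details['writeErrors']):
        raise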

Duplicates raise a BulkWriteError, but because ordered=False lets MongoDB continue past failed writes, the new records are still inserted. You can safely ignore the errors caused by duplicate records.
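
To see how many rows were skipped rather than ignoring the exception blindly, the `details` dict on the BulkWriteError can be inspected. Below is a minimal sketch; summarize_bulk_error is a hypothetical helper, and the sample dict is hand-written in the shape of BulkWriteError.details, not real server output.

```python
DUPLICATE_KEY = 11000  # MongoDB's duplicate key error code


def summarize_bulk_error(details):
    """Summarize a BulkWriteError.details dict.

    Returns the number of inserted documents, the number of duplicate key
    failures, and the list of any other write errors.
    """
    write_errors = details.get('writeErrors', [])
    duplicates = [e for e in write_errors if e.get('code') == DUPLICATE_KEY]
    others = [e for e in write_errors if e.get('code') != DUPLICATE_KEY]
    return {
        'inserted': details.get('nInserted', 0),
        'duplicates': len(duplicates),
        'other_errors': others,
    }


# Hand-written sample for illustration only (assumed values).
sample_details = {
    'nInserted': 1,
    'writeErrors': [
        {'index': 0, 'code': 11000, 'errmsg': 'duplicate key (illustrative)'},
    ],
}
summary = summarize_bulk_error(sample_details)
```

Inside the except block this would be called as summarize_bulk_error(e.details); a non-empty 'other_errors' list is a sign that something other than duplicates went wrong.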
