@jxub
Created September 22, 2017 11:51
A simple mongoimport for importing CSV files with Python, pandas, and pymongo
import json

import pandas as pd
from pymongo import MongoClient


def mongoimport(csv_path, db_name, coll_name, db_url='localhost', db_port=27017):
    """Import the CSV file at csv_path into a MongoDB collection.

    Returns the count of the documents in the new collection.
    """
    client = MongoClient(db_url, db_port)
    db = client[db_name]
    coll = db[coll_name]
    data = pd.read_csv(csv_path)
    # Round-trip through JSON to get a list of plain dicts, one per CSV row.
    payload = json.loads(data.to_json(orient='records'))
    # remove(), insert() and count() are PyMongo 3.x calls; they were removed in PyMongo 4.
    coll.remove()  # empty the collection before importing
    coll.insert(payload)
    return coll.count()
@mapleleafnj

Perfect ... Just what I needed ... thanks ..

@mh-mazen

mh-mazen commented Apr 5, 2020

Amazing thanks a lot

@Roalpifi

Roalpifi commented Jun 9, 2020

My whole header became a single key and each row's data became a single value.
How do I map each header to its own column? I'm new to programming.
Example of how it looks: {a, b, c, d: "e, f, g, h"}
As it should be: {a: "e", b: "f", c: "g", d: "h"}
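
One possible cause of the symptom described above (an assumption, since the actual file isn't shown) is that the file uses a delimiter other than the comma that `pd.read_csv` expects by default. The sketch below uses a made-up semicolon-delimited sample to show the effect and how passing `sep` may fix it:

```python
import io

import pandas as pd

# Hypothetical sample: a "CSV" that is really semicolon-delimited.
raw = "a;b;c;d\ne;f;g;h\n"

# With the default sep=',' the whole header is read as ONE column name.
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.to_dict(orient="records"))

# Telling pandas the real delimiter gives one key per column.
right = pd.read_csv(io.StringIO(raw), sep=";")
print(right.to_dict(orient="records"))
```

If that is what is going on, the same `sep` argument would go on the `pd.read_csv(csv_path)` call in the gist. When the delimiter is unknown, `sep=None` together with `engine="python"` asks pandas to try to detect it.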

@HackTestes

Awesome, thank you soo much!!!

@jeukengl

.remove() seems to be deprecated

@jordi-cluet

Both remove() and insert() methods are deprecated in mongosh. Check the Compatibility Changes with Legacy mongo Shell for alternative options.
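
The comment above is about mongosh, but the same story applies on the Python side: PyMongo 4 no longer has `Collection.remove()`, `insert()` or `count()`. A minimal sketch of the usual replacements follows; `refill_collection` is a hypothetical helper name (not part of the gist or of PyMongo), and `coll` is assumed to be a `pymongo` collection object:

```python
def refill_collection(coll, docs):
    """Replace everything in `coll` with `docs` and return the new count.

    `coll` is assumed to behave like a pymongo Collection.
    """
    coll.delete_many({})             # was: coll.remove()
    if docs:                         # PyMongo raises an error for an empty list
        coll.insert_many(docs)       # was: coll.insert(docs)
    return coll.count_documents({})  # was: coll.count()
```

In the gist's function this would stand in for the last three lines, for example `return refill_collection(coll, payload)`.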

@yunkgao

yunkgao commented Apr 16, 2024

import pandas as pd
from pymongo import MongoClient
import json

def mongoimport(csv_path, db_name, coll_name, db_url='localhost', db_port=27017):
    """Import the CSV file at csv_path into a MongoDB collection.

    Returns the count of the documents in the new collection.
    """
    client = MongoClient(db_url, db_port)
    db = client[db_name]
    coll = db[coll_name]
    data = pd.read_csv(csv_path)
    payload = json.loads(data.to_json(orient='records'))
    if coll_name in db.list_collection_names():
        coll.drop()

    coll.insert_many(payload)
    count = coll.count_documents({})
    
    client.close()
    return count
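
The `json.loads(data.to_json(orient='records'))` step in the function above can look redundant. A small sketch of what it does, using made-up sample data (an assumption, not from the gist): the round trip produces a list of plain Python dicts, one per row, and turns missing values into `None`.

```python
import io
import json

import pandas as pd

# Hypothetical sample with a missing score in the second row.
raw = "name,score\nalice,1.5\nbob,\n"
data = pd.read_csv(io.StringIO(raw))

# pandas stores the missing value as NaN; to_json writes it as null,
# and json.loads reads null back as None.
payload = json.loads(data.to_json(orient="records"))
print(payload)
# [{'name': 'alice', 'score': 1.5}, {'name': 'bob', 'score': None}]
```

A list like `payload` is the shape `insert_many` expects, with one document per CSV row.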
