@jacobian
Last active December 29, 2021 11:21
Benchmarking MongoDB's GridFS vs PostgreSQL's LargeObjects.
import io
import timeit

import gridfs
import psycopg2
import pymongo

# For fairness use the same chunk size - 512k.
CHUNK_SIZE = 1024 * 512

def bench_mongo(fs, size):
    fp = io.BytesIO(b'x' * size)
    fp.seek(0)
    fs.put(fp, chunk_size=CHUNK_SIZE)

def bench_pg(conn, size):
    fp = io.BytesIO(b'x' * size)
    fp.seek(0)
    lob = conn.lobject(mode='wb')
    while True:
        chunk = fp.read(CHUNK_SIZE)
        if not chunk:
            break
        lob.write(chunk)
    conn.commit()

mongo = pymongo.MongoClient()

for name, size, reps in (('1k', 1024, 10000), ('10k', 10240, 10000),
                         ('1M', 1024 * 1024, 1000), ('10M', 1024 * 1024 * 10, 100)):
    fs = gridfs.GridFS(mongo.lobs_test_database)
    t = timeit.timeit('bench_mongo(fs, %s)' % size,
                      'from __main__ import bench_mongo, fs', number=reps)
    print("mongo: %3s x %5s: %0.2fs (%0.4fs/op)" % (name, reps, t, t / reps))
    mongo.drop_database('lobs_test_database')

    conn = psycopg2.connect(dbname='template1')
    conn.autocommit = True
    conn.cursor().execute('CREATE DATABASE lobs_test_database')
    conn.close()
    conn = psycopg2.connect(dbname='lobs_test_database')
    t = timeit.timeit('bench_pg(conn, %s)' % size,
                      'from __main__ import bench_pg, conn', number=reps)
    print("pgsql: %3s x %5s: %0.2fs (%0.4fs/op)" % (name, reps, t, t / reps))
    conn.close()
    conn = psycopg2.connect(dbname='template1')
    conn.autocommit = True
    conn.cursor().execute('DROP DATABASE lobs_test_database')
    conn.close()

My results:

$ python benchmark-storage.py 
mongo:  1k x 10000: 21.73s (0.0022s/op)
pgsql:  1k x 10000: 23.52s (0.0024s/op)
mongo: 10k x 10000: 35.09s (0.0035s/op)
pgsql: 10k x 10000: 17.23s (0.0017s/op)
mongo:  1M x  1000: 49.10s (0.0491s/op)
pgsql:  1M x  1000: 38.71s (0.0387s/op)
mongo: 10M x   100: 91.30s (0.9130s/op)
pgsql: 10M x   100: 55.78s (0.5578s/op)

Looking good for lobjects (though the API's not quite as nice). But I should probably rework this to use ministat and check that the results are significant.
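For the ministat rework, one way to get samples would be timeit.repeat instead of timeit.timeit: it returns one total per run, and per-op times printed one per line are exactly what ministat reads from a file. A minimal sketch (the cheap stand-in workload is just for illustration; in the benchmark above the stmt would be e.g. 'bench_pg(conn, %s)' % size):

```python
import statistics
import timeit

def sample_runs(stmt, setup='pass', samples=10, number=100):
    """Time `stmt` `samples` times, `number` reps each; return secs/op samples."""
    totals = timeit.repeat(stmt, setup, repeat=samples, number=number)
    return [t / number for t in totals]

# Stand-in workload; swap in the real bench_* calls and setup string.
samples = sample_runs('sum(range(100))', samples=5, number=1000)
print('\n'.join('%0.6f' % s for s in samples))  # redirect to a file, feed to ministat
print('mean %0.6f stdev %0.6f' % (statistics.mean(samples),
                                  statistics.stdev(samples)))
```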


Also, ops/s would be a more logical thing to report, dunno why I did secs/op, that's silly.
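The conversion is just reps / t instead of t / reps; e.g. with the 10M pgsql numbers above:

```python
# 10M pgsql run from the results above: 100 reps in 55.78s total.
t, reps = 55.78, 100
print("pgsql: %0.2f ops/s" % (reps / t))  # prints "pgsql: 1.79 ops/s"
```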


See updated version at https://gist.github.com/5005548, with better stats and results.
