@vhata
Created June 19, 2012 21:46
Goodbye MongoDB

In reply to http://www.zopyx.de/blog/goodbye-mongodb

I'm answering the questions as if I'm answering the blogger, not you. :)

First order of business. http://facility9.com/2010/09/five-reasons-to-use-nosql/

MongoDB was designed for massive-scale data storage, and the architecture does that very well. A design decision is not a flaw just because you do not like how it behaves in your use case.

the current memory model of MongoDB based on memory-mapped files is brain-dead. Leaving memory management to the operating system is a nice idea - in reality it does not scale and does not play very well. There is no single way to control the memory usage using system tools except maintaining mongod instances on dedicated virtual machines without running further services. There are numerous complaints from people about this stupid architectural decision and 10gen is doing nothing to change this brain-dead memory model.

That is the way it is supposed to be. Massive dedicated cluster of boxes for storage. Do not complain when you cannot run other apps on the box as well. Run Crysis at home and keep data on the MongoCluster.

Locking: a global server lock for a scalable database solution is a no-go - especially since MongoDB supports only atomic operations. Now there is relief in the making with more granular locking or the temporary yielding of the lock during long-running write operations.

I agree with this.

Query engine: the query engine of MongoDB still can only use one index per query. How insane is this? There is no obvious reason why this limitation exists. The index model of MongoDB is very similar to relational databases - in fact: it borrows lots of ideas from relational databases. Having worked on indexes and search engines myself for more than a decade I can not recognize any particular reason why the query engine can not use multiple indexes per query - the query engine appears poorly implemented.

Did you read 5 reasons to use no-sql? If you want to use complicated queries instead of map reduce, you are using the wrong database. While I agree that querying on multiple indexes would be awesome, look at what MongoDB was designed for instead.

Query language: using JSON as a query language was a bad decision. The current JSON query language works for standard queries but the functionality of the operators is limited. It is still not possible to express arbitrary queries like in SQL using JSON. One would argue: not needed - but in reality there are always cases where you need more complex queries. The only way around is to implement something client-side or use the server-side JS code execution (single-threaded, slow). Having no option to perform an operation comparable to UPDATE table SET foo=bar WHERE.... (which is possibly a low-hanging fruit). There are various odds and ends with the query language and its implementation. E.g. why don't you get an error message when using the $and operator with a MongoDB version that does not support it? Why does MongoDB not complain here about an inappropriate usage of operators? Look at the mailing list and discover such flaws all day long in various postings. Silently discarding errors is a worse thing. If there is a problem then raise the issue and don't hide it under the carpet.

Yes. True. But it all still looks like you are trying to use a relational database. Go get MySQL or, if you are like me and like real databases, go get PostgreSQL.
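To make the point about silently ignored operators concrete, here is a toy matcher for JSON-style filters that fails loudly on anything it does not understand. It is only a sketch: this is not MongoDB's query engine, and the operator table, `matches`, and `update_where` are hypothetical names made up for illustration.

```python
# Toy matcher for JSON-style filters. NOT MongoDB's query engine: the
# operator set and the error behaviour are assumptions for illustration.
OPERATORS = {
    "$eq": lambda value, arg: value == arg,
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$in": lambda value, arg: value in arg,
}


def matches(doc, query):
    """Return True if doc satisfies every clause of the query dict."""
    for key, cond in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif key.startswith("$"):
            # Fail loudly instead of silently ignoring an unknown operator.
            raise ValueError(f"unknown top-level operator: {key}")
        elif isinstance(cond, dict) and any(k.startswith("$") for k in cond):
            for op, arg in cond.items():
                if op not in OPERATORS:
                    raise ValueError(f"unknown operator: {op}")
                if not OPERATORS[op](doc.get(key), arg):
                    return False
        elif doc.get(key) != cond:
            # Plain value: treat it as an equality test.
            return False
    return True


def update_where(docs, query, changes):
    """Rough analogue of UPDATE ... SET ... WHERE; returns the number of docs changed."""
    count = 0
    for doc in docs:
        if matches(doc, query):
            doc.update(changes)
            count += 1
    return count


people = [
    {"name": "a", "age": 30},
    {"name": "b", "age": 17},
    {"name": "c", "age": 45},
]
changed = update_where(people, {"age": {"$gt": 18}}, {"adult": True})
```

An unsupported operator such as `{"age": {"$near": 3}}` raises `ValueError` here rather than quietly matching nothing, which is the behaviour the blogger is asking for.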

Map-Reduce: Map-reduce in MongoDB feels like a useless appendix added at some point to MongoDB. Same problem as with server-side code execution: it blocks. Now instead of fixing a bad implementation or fixing the underlying architectural issues, 10gen seems to address the MR limitations by supporting Hadoop for the MR part - either they don't trust their own MR implementation or they won't/can't fix it. No, we do not need more tools for doing map-reduce - there are already too many moving parts in a setup for scalable applications. Either fix MR inside MongoDB or throw it out completely.

Yes. The developers need to make a decision on this. Cut your losses and restart the map reduce.
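For anyone who has not used it, the map-reduce pattern being argued about looks roughly like the following. This is a single-process sketch in plain Python, not MongoDB's implementation; `map_reduce`, `emit_city_total`, `sum_values`, and the `orders` data are all hypothetical.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Run map_fn over every doc, group the emitted pairs by key, reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def emit_city_total(doc):
    # Map step: emit one (key, value) pair per document.
    yield doc["city"], doc["amount"]


def sum_values(key, values):
    # Reduce step: collapse all values emitted for one key.
    return sum(values)


orders = [
    {"city": "Berlin", "amount": 10},
    {"city": "Cape Town", "amount": 5},
    {"city": "Berlin", "amount": 7},
]
totals = map_reduce(orders, emit_city_total, sum_values)
```

The map step is independent per document, so in principle it can be spread over many nodes. The complaint above is about how that work is executed inside the server, not about the pattern itself.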

Sharding: yet another misfeature of MongoDB. Going from a single server installation to a partitioned setup is huge. You need at least two replica sets for the shards, three config servers and the load balancers. That's like building a skyscraper beside a small town-house.

Not an issue. Intended use. You don't start a skyscraper by building a house first, then start complaining that you need to dig out the old foundation because you cannot scale up. MongoDB is massive distributed storage with hot fail-over of replica nodes and intelligent shard migration. You don't get that with two boxes.
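The basic idea behind partitioning data across shards can be sketched in a few lines. Note the assumption: this uses generic hash partitioning to pick a shard, which is a simplification and not a description of how MongoDB splits and migrates data; `shard_for` and `distribute` are hypothetical helpers.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a shard key to a shard index with a stable hash (generic technique)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(keys, shard_count):
    """Place every key on exactly one shard and return the per-shard lists."""
    shards = [[] for _ in range(shard_count)]
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


user_ids = [f"user-{n}" for n in range(1000)]
placement = distribute(user_ids, 4)
```

The routing itself is trivial. The expensive part of a real sharded setup is everything around it (replication per shard, metadata, rebalancing), which is why it only pays off at the scale it was built for.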

Data-center awareness: yet another feature that has been tinkered together. Replica sets only support one primary with multiple secondaries. Writes can only go to one primary. Running a replica set across multiple data-centers is doable but writes can only go to one primary in one data-center. Assuming you have a replica set with nodes in Europe, US and Asia with the current master being located in US: all writes from US and Asia need to be performed against the master in US and replicated back to the secondaries in Europe and Asia - insane and not scalable.

Are you seriously complaining about a method that will ensure data integrity? Write to primary, sync to secondaries seems like a logical way to handle this problem. Now please remember how old MongoDB is and how far they have come. All features cannot go into version 1. https://jira.mongodb.org/browse/SERVER-2545

The "safe" mode is off by default: who made this idiotic decision? Many reports from people about data loss have been seen - just for the reason that "safe" is off by default. Although this is documented here and there: does such a decision bring trust to MongoDB? Safe mode must be enabled by default - people should be able to turn it off for performance reasons and with the understanding that turning it off may lead to data loss unless they perform explicit error checking client-side.

Yes, it is a bad default. Just flip the switch to on and the problem goes away. You can even set the minimum number of nodes to write to before returning with a success. A bad default setting doesn't make the app useless.
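The "minimum number of nodes" idea can be modelled with a toy replica set. This is a simulation in plain Python, not driver code: `ToyReplicaSet` and its `w` parameter are hypothetical stand-ins, and real replication is asynchronous rather than a simple loop.

```python
class ToyReplicaSet:
    """One primary plus secondaries; each node that stores a write acknowledges it."""

    def __init__(self, secondary_up):
        # secondary_up: list of booleans, True if that secondary is reachable.
        self.primary = []
        self.secondaries = [[] if up else None for up in secondary_up]

    def write(self, doc, w=1):
        """Store doc on the primary, replicate it, and require at least w acknowledgements.

        w=0 models fire-and-forget: the caller never learns whether the write landed.
        """
        self.primary.append(doc)
        acks = 1  # the primary itself
        for log in self.secondaries:
            if log is not None:
                log.append(doc)
                acks += 1
        if acks < w:
            # In this toy the write stays on the nodes that took it; the caller
            # is only told that too few nodes confirmed it.
            raise RuntimeError(f"only {acks} of {w} required nodes acknowledged")
        return acks


# Two secondaries, one of them unreachable.
replica_set = ToyReplicaSet([True, False])
acks_for_two = replica_set.write({"_id": 1}, w=2)
```

Asking for `w=3` against this set raises `RuntimeError`, so the caller finds out immediately that the write is not as durable as requested instead of discovering it after a failover.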

Journaling: MongoDB pre-allocates 3 GB of data for journaling - independent of the actual database size(s) - insane for small installations.

Again: why are you running MongoDB on small installations?

Now talking to Jonathan.

It was a very interesting article. I agree with a few things, but look at some of his complaints: non-issues. Foursquare is still raving about MongoDB. Maybe they are using it the way the developers intended it to be used.

I still like MongoDB a lot, I've played with Mongo/Django apps, but I still cannot justify using it for one of my apps. It is made for massive data storage and that is just not something I need. Still cool to play with though.
