@brijesh-deb
Last active June 15, 2018 08:07
#MongoDB #NoSQL
  • MongoDB is a document database.

  • MongoDB is the most relational non-relational database

  • Parallels between relational databases and MongoDB

    • Database : Database
    • Table : Collection
    • Row: Document
    • Index : Index
    • Join : $lookup
    • Foreign Key : Reference or Embedded Document
    • Multi-table transaction : Single document transaction
  • Documents are multi-dimensional: they can be hierarchical, and each document in a collection can have a different structure.

  • The $lookup aggregation stage can be used to join two collections.

  • In MongoDB 3.6, atomicity is guaranteed for an update to a single document, but not for updates that span multiple documents in the same or different collections.

  • Installing MongoDB on Windows 7

    • Download the MSI (mongodb-win32-x86_64-2008plus-ssl-3.6.5-signed.msi) from mongodb.com
    • Run the MSI (don't select Compass)
    • By default MongoDB gets installed in C:\Program Files\MongoDB
  • Running MongoDB on Windows 7

    • bin/mongod.exe is the executable that starts the MongoDB server
    • bin/mongo.exe is the shell
    • Starting mongodb using config file
      • Add a config file (mongodb.conf) with following entries
        #This is an example config file for MongoDB
        dbpath = C:\Users\<user_name>\mongo\data
        port = 27017
        logpath = C:\Users\<user_name>\mongo\mongo.log
        
      • Go to folder C:\Program Files\MongoDB\Server\3.6\bin and run the following command
        mongod -f C:\Users\<user_name>\mongo\conf\mongodb.conf
        
      • Connect to MongoDB through the shell from C:\Program Files\MongoDB\Server\3.6\bin
        mongo --port 27017
        
  • Mongo shell

    • Is a separate client application that connects to the MongoDB server
    • Can run mongo commands or JavaScript
  • Mongo Shell Commands

    • show databases (or show dbs): lists all databases
    • db : shows the database you are currently in
    • use [database name]: switches to a specific database
    • show collections: lists all collections in the current database
    • db.[collection name].insert({"Hello":"World"}) : inserts a document into a collection
    • db.[collection name].findOne({"Hello":"World"}) : finds one matching document in a collection
    • db.[collection name].find({"Hello":"World"}) : finds all matching documents in a collection
    • rs.status() : shows replica status if replica set is in place
    • mongo --eval "db.help()" : runs the command in the shell and returns to the command prompt without opening an interactive shell. Any command can be used in place of db.help()
    • ObjectId(): generates a new ObjectId every time it is called
    • ObjectId().getTimestamp(): returns the timestamp embedded in the ObjectId
    • db.[collection name].find().pretty(): displays all documents in a collection in an indented, readable format
    • db.[collection name].insertOne([document in JSON format]) : inserts one document into a collection
    • db.[collection name].insertMany([array of documents in JSON format]) : inserts multiple documents into a collection
    • db.[collection name].updateOne(filter, update, options): updates one document in a collection
      • sample: db.movies.updateOne({title:"The Martian"},{$set:{poster:"sample"}})
    • db.[collection name].deleteOne(filter): deletes the first document that matches the filter
    • db.[collection name].deleteMany(filter): deletes all documents that match the filter
  • Install MongoDB Compass

    • Compass is a GUI schema visualization and query builder tool for MongoDB
    • Download the Compass exe (mongodb-compass-1.13.1-win32-x64.exe) from mongodb.com
    • Run the exe
  • By default MongoDB has a database called "local", which is used for internal purposes.

  • An ObjectId is unique and is generated by the client that inserts the document into the collection. The _id field that holds it is indexed automatically.

  • You can also set the _id yourself, provided the value is unique within the collection.
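As a rough illustration of what ObjectId().getTimestamp() reads, the sketch below decodes the creation time from an ObjectId's hex string in plain JavaScript. It assumes the standard 12-byte ObjectId layout, in which the first 4 bytes hold a Unix timestamp in seconds; the function name objectIdTimestamp and the sample id are made up for this example and are not part of the mongo shell.

```javascript
// Sketch: decode the creation time embedded in an ObjectId hex string.
// Assumption: the first 4 bytes (8 hex characters) are a big-endian
// Unix timestamp in seconds, as in the standard ObjectId layout.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Hypothetical id, used only for illustration.
const created = objectIdTimestamp("507f1f77bcf86cd799439011");
console.log(created.toISOString());
```

Because the timestamp comes first, ObjectIds generated later generally sort after earlier ones, at one-second granularity.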

  • Replica Set

    • A replica set is a group of mongod processes that provide redundancy and high availability.
    • Members: Primary, Secondary(s) and Arbiter
    • Data is stored on the primary and the secondaries. So in a replica set of 1 primary and 2 secondaries, there will be 3 copies of the same data. Currently a maximum of 50 members is supported in a replica set, so at most 50 copies of the data.
    • Primary node
      • The only member in the replica set that can receive write operations.
      • MongoDB applies write operations on the primary and then records them in the primary's oplog.
      • All data-bearing members of the replica set can accept read operations. But by default, an application directs its read operations to the primary member.
      • A replica set can have at most 1 primary.
    • Secondary(s)
      • A secondary maintains a copy of the primary’s data set.
      • To replicate data, a secondary applies operations from the primary’s oplog to its own data set in an asynchronous process.
      • A replica set can have one or more secondaries
      • Although clients cannot write data to secondaries, clients can read data from secondary members.
      • A secondary can become a primary. If the current primary becomes unavailable, the replica set holds an election to choose which of the secondaries becomes the new primary
    • Arbiter
      • An arbiter does not have a copy of data set and cannot become a primary.
      • Replica sets may have arbiters to add a vote in elections for primary.
      • Arbiters always have exactly 1 election vote, and thus allow replica sets to have an odd number of voting members without the overhead of an additional member that replicates data.
    • Each node is initially added to a named replica set. All nodes try to contact each other through heartbeats. Once the nodes are connected, an election is held to choose the primary. Each voting node has one vote, and the node that receives a majority of the votes becomes the primary. Once a primary is elected, the replica set is ready for use.
    • For a successful election, a majority of the votes must be available to select a primary. For a 3-node replica set, 2 votes must be available; for a 5-node replica set, 3 votes must be available.
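The majority rule above is simple arithmetic, sketched here in plain JavaScript. The function names votesNeeded and canElectPrimary are invented for this illustration; they are not MongoDB APIs, and the sketch assumes every member counted has exactly one vote.

```javascript
// Sketch: smallest number of votes that is a strict majority.
// Assumption: each voting member contributes exactly one vote.
function votesNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// True when enough voting members can reach each other to elect a primary.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= votesNeeded(votingMembers);
}

console.log(votesNeeded(3), votesNeeded(5));
console.log(canElectPrimary(3, 1));
```

This is also why an even number of voting members adds no fault tolerance: a 4-member set needs 3 votes, so it survives the loss of only one member, the same as a 3-member set.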
  • Set up a replica set locally

    • Create 3 separate folders to hold data
      • C:\Users\<user_name>\mongo\data
      • C:\Users\<user_name>\mongo\data1
      • C:\Users\<user_name>\mongo\data2
    • Start 3 mongod daemons on separate ports from C:\Program Files\MongoDB\Server\3.6\bin
        mongod --port 27017 --dbpath C:\Users\<user_name>\mongo\data --replSet demo_rs
        mongod --port 27018 --dbpath C:\Users\<user_name>\mongo\data1 --replSet demo_rs
        mongod --port 27019 --dbpath C:\Users\<user_name>\mongo\data2 --replSet demo_rs
      
    • Open a shell on the member that should become primary (port 27017)
      • mongo --port 27017
    • Create the config for the replica set and initiate it
          var democonfig=
          {
            "_id":"demo_rs",
            "members":[
              {
                "_id":0,
                "host":"localhost:27017",
                "priority":10
              },
              {
                "_id":1,
                "host":"localhost:27018"
              },
              {
                "_id":2,
                "host":"localhost:27019",
                "arbiterOnly":true
              }
            ]
          }
        // check the config
        democonfig 
        
        // initiate the replica set
        rs.initiate(democonfig)  
      
    • Verify that replication works
      • Once initiated, the command prompt will show
        • demo_rs:OTHER>
      • Press Enter and the prompt will show the primary member
        • demo_rs:PRIMARY>
      • Save some data in primary member
        • db.foo.save({_id:1,value:'hello'})
        • db.foo.find()
      • Connect to the shell of the secondary member
        • mongo --port 27018
      • Try to read data
        • db.foo.find()
        • Fails with the error "not master and slaveOk=false"
        • db.setSlaveOk()
        • db.foo.find()
        • The data is displayed now
    • Verify failover
      • Kill the primary member running on port 27017
      • Read still works on secondary member 27018
        • db.foo.find()
      • After some time the secondary member becomes primary
        • The prompt changes from demo_rs:SECONDARY to demo_rs:PRIMARY
      • If the daemon on 27017 is started again, it becomes primary again (it has the higher priority in the config), while 27018 goes back to being secondary
    • Check status of the replica set
      • From the shell of any member, run
        • rs.status()
  • Data Storage

    • The MMAPv1 storage engine uses memory-mapped files, a feature of all modern operating systems. A memory-mapped file contains the contents of a file in virtual memory. This mapping between a file and memory space enables an application, including multiple processes, to modify the file by reading and writing directly to the memory. (WiredTiger, the default storage engine since MongoDB 3.2, uses its own internal cache rather than memory-mapped files.)
    • The server cannot store all its information in memory, but it would like to think of information as just existing and being available to it at any given moment. So it creates a huge byte array and maps it using memory-mapped files. Whenever a portion of the array is needed, the OS takes care of loading it or saving it. This is how MMAPv1 relies on the memory-mapped file feature of the OS.
  • Data Format

    • How does application data (documents) that has no schema get saved? For this MongoDB uses BSON.
    • BSON
      • Stands for Binary JSON.
      • Binary-encoded serialization of JSON-like documents.
      • Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays.
      • BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.
    • Advantages of BSON
      • Little marshalling is needed to convert BSON elementary data types into C data types. That makes reading and writing very fast.
      • Easy to traverse
      • Easy to drill into a document and find a particular field.
  • A document must have an "_id" field. Its value can be, for example, an integer, a floating-point number, a date or even a complex object (an embedded document).

  • The maximum size of a document in MongoDB is 16 MB.

  • A collection is referenced by using [database name].[collection name]

  • CAP theorem vis-a-vis MongoDB

    • Consistency: MongoDB is strongly consistent when reads and writes go to the primary node of the replica set; this is the default setting. However, if reads are served from a secondary node, MongoDB becomes eventually consistent.
    • Availability: MongoDB achieves high availability through replica sets. As soon as the primary goes down or otherwise becomes unavailable, the secondaries elect a new primary so that the set becomes available again.
    • Partition Tolerance: Replica sets also give MongoDB partition tolerance: as long as more than half of the servers of a replica set are connected to each other, a new primary can be chosen. Why? To ensure that two separated networks cannot both choose a new primary. When not enough members are connected to each other, you can still read from them (but consistency is not ensured), but not write.
  • Why is MongoDB said to be CP in CAP?

    • When there is a partition, MongoDB chooses consistency over availability.
    • What happens on a partition when the primary goes down? The system becomes unavailable for writes while a new primary is selected. Basically, whenever a crisis happens and MongoDB needs to decide what to do, it will choose consistency over availability. It will stop accepting writes to the system until it believes that it can safely complete those writes.
  • How to Join Data?

    • The document model makes JOINs redundant in many cases.
    • Using the document model, embedded sub-documents and arrays effectively pre-JOIN data by aggregating related fields within a single data structure. Rows and columns that were traditionally normalized and distributed across separate tables can now be stored together in a single document, eliminating the need to JOIN separate tables when the application has to retrieve complete records.
  • Document model provides performance and scalability advantages

    • An aggregated document can be accessed with a single call to the database, rather than having to JOIN multiple tables to respond to a query. The MongoDB document is physically stored as a single object, requiring only a single read from memory or disk. On the other hand, RDBMS JOINs require multiple reads from multiple physical locations.
    • As documents are self-contained, distributing the database across multiple nodes (a process called sharding) becomes simpler and makes it possible to achieve massive horizontal scalability on commodity hardware. The DBA no longer needs to worry about the performance penalty of executing cross-node JOINs (should they even be possible in the existing RDBMS) to collect data from different tables.
  • Defining Document Schema

    • You should start the schema design process by considering the application’s query requirements. The data should be modeled in a way that takes advantage of the document model’s flexibility, taking into account:
      • The read/write ratio of database operations.
      • The types of queries and updates performed by the database.
      • The life-cycle of the data and growth rate of documents.
  • Modeling Relationship

    • Embedding:
      • Data with a 1:1 or 1:Many relationship (where the “many” objects always appear with, or are viewed in the context of their parent documents) is a natural candidate for embedding the referenced information within the parent document.
      • Embedded relationships in documents refer to storing related documents within an original document. The related data is part of the schema of the embedding document. In effect, the entire data is stored together within a single document, with related data stored as an array or sub-object. Data models based on embedding are also known as de-normalised data models.
      • Not all 1:1 or 1:Many relationships should be embedded in a single document. Instead, referencing between documents in different collections should be used when:
        • A document is frequently read, but contains an embedded document that is rarely accessed. Embedding the rarely accessed part only increases the in-memory requirements (the working set) of the collection.
        • One part of a document is frequently updated and constantly growing in size, while the remainder of the document is relatively static.
        • The document size exceeds MongoDB’s current 16MB document limit.
      • Sample 1-to-1 relationship with embedding
        {
          "_id": "john",
          "name": "john xxx",
          "placeOfBirth": {
            "street": "123 xyz Street",
            "city": "xyzcity",
            "state": "xyzState",
            "zip": "12345"
          }
        }
      
      • Sample 1-to-many relationship with embedding
      {
          "_id": "John",
          "name": "John xxx",
          "booksAuthored": [
            {
              "name": "Windows server 2016",
              "publisher": "Self Publishing",
              "year": "2016",
              "price": "30"
            },
            {
              "name": "Ubuntu Linux",
              "publisher": "Self Publishing",
              "year": "2017",
              "price": "40"
            }
          ]
      }
      
    • Referencing
      • Referencing enables data normalization, and can give more flexibility than embedding. But the application will issue follow-up queries to resolve the reference, requiring additional round-trips to the server.
      • References are usually implemented by saving the _id field of one document in the related document as a reference. A second query is then executed by the application to return the referenced data.
      • Data models based on References are also known as Normalised data models.
      • Sample 1-to-1 relationship using Referencing
        Address document contains a reference to the patron document
        {
          _id: "john",
          name: "John xxx"
        }
        {
          patron_id: "john",
          street: "123 Fake Street",
          city: "Faketon",
          state: "MA",
          zip: "12345"
        }
      
      • Sample 1-to-many relation using Referencing
      1-to-many relationship between publisher and books
      {
        _id: "oreilly",
        name: "O'Reilly Media",
        founded: 1980,
        location: "CA"
      }
      {
        _id: 123456789,
        title: "MongoDB: The Definitive Guide",
        author: [ "Kristina Chodorow", "Mike Dirolf" ],
        published_date: ISODate("2010-09-24"),
        pages: 216,
        language: "English",
        publisher_id: "oreilly"
      }
      {
        _id: 234567890,
        title: "50 Tips and Tricks for MongoDB Developer",
        author: "Kristina Chodorow",
        published_date: ISODate("2011-05-06"),
        pages: 68,
        language: "English",
        publisher_id: "oreilly"
      }
      
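As an alternative to a second query from the application, the publisher reference above could be resolved on the server with a $lookup stage. The sketch below shows such a pipeline as it might be passed to db.books.aggregate(...), assuming the collections are named books and publishers (the notes do not name them). So that the example runs without a server, it also includes a simplified in-memory emulation of that one stage; the lookup helper is hypothetical and handles only scalar equality matches.

```javascript
// Pipeline as it might be passed to db.books.aggregate(pipeline) in the shell.
// Assumption: collections are named "books" and "publishers".
const pipeline = [
  {
    $lookup: {
      from: "publishers",         // collection to join with
      localField: "publisher_id", // field in the books documents
      foreignField: "_id",        // field in the publishers documents
      as: "publisher"             // output array field added to each book
    }
  }
];

// Simplified in-memory emulation of a single $lookup stage (illustration only).
function lookup(localDocs, foreignDocs, spec) {
  return localDocs.map(doc => ({
    ...doc,
    [spec.as]: foreignDocs.filter(
      f => f[spec.foreignField] === doc[spec.localField]
    )
  }));
}

const publishers = [{ _id: "oreilly", name: "O'Reilly Media" }];
const books = [
  { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: 234567890, title: "50 Tips and Tricks for MongoDB Developer", publisher_id: "oreilly" }
];

const joined = lookup(books, publishers, pipeline[0].$lookup);
console.log(joined[0].title, "->", joined[0].publisher[0].name);
```

Each book comes back with a publisher array holding the matching publisher documents, which is empty when nothing matches.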
  • Difference between Save and Insert

    • If we save 2 documents with the same _id, the 2nd one overwrites the 1st one. Insert doesn't allow 2 documents with the same _id and throws a duplicate key error.
  • Operations

    • Update
      • Is atomic within a single document. If 2 clients issue update commands on the same document, they are executed one after the other, so there is no inconsistency
      • db.[collection].update([query],[update],[options])
      • Operators
        • $set : adds a new field, or changes the value of an existing one
        • $unset : removes a field
        • $rename : renames a field
        • $push : adds an item to an array
        • $pull : removes all instances of a matching item from an array
    • Find
      • db.[collection].find([query],[projection])
      • Projections: By default, MongoDB returns all fields in the result of a query. If we don't need all fields, we can limit them by using a projection. This reduces network overhead.
        • To include the title field and exclude the _id field: db.movies.find({genre:"Action"},{title:1,_id:0})
        • 1 means include, 0 means exclude. The _id field is included unless it is explicitly excluded.
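To make the include/exclude flags concrete, here is a small plain-JavaScript sketch of what an inclusion projection does to one document. It is a simplified model, not the server's implementation: the project helper is hypothetical, handles only top-level fields, and the sample movie document is made up.

```javascript
// Simplified emulation of an inclusion projection on top-level fields.
// Assumption: the projection lists fields with 1, plus optionally _id: 0.
function project(doc, projection) {
  const result = {};
  // _id is returned unless it is explicitly excluded with _id: 0.
  if (projection._id !== 0 && "_id" in doc) {
    result._id = doc._id;
  }
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag === 1 && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

// Hypothetical document, for illustration only.
const movie = { _id: 1, title: "The Martian", genre: "Action", year: 2015 };

console.log(project(movie, { title: 1, _id: 0 }));
console.log(project(movie, { title: 1 }));
```

The first call keeps only title, matching the find example above; the second shows _id coming along by default.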
  • Difference between save, insert and update

    • Save
      • If _id exists, save replaces the entire existing document with the new one
      • If _id doesn't exist, save performs an insert of a new document
    • Insert
      • if _id exists, throws duplicate key error
      • if _id doesn't exist, inserts a new document
    • Update
      • update modifies an existing document that matches the query params.
      • If there is no matching document, the upsert option decides what happens.
        • upsert : false : Nothing happens when no such document exists
        • upsert : true : A new document gets created with contents equal to the query params and update params
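The three behaviours above can be modelled with an in-memory map keyed by _id. This is a toy sketch, not how the server works: the Collection class is invented for illustration, it assumes every document already carries an _id, its update matches on _id only, and it treats the update params as $set-style field changes.

```javascript
// Toy in-memory model of save / insert / update semantics (illustration only).
class Collection {
  constructor() {
    this.docs = new Map(); // _id -> document
  }

  // insert: rejects a document whose _id already exists.
  insert(doc) {
    if (this.docs.has(doc._id)) {
      throw new Error("duplicate key error: _id " + doc._id);
    }
    this.docs.set(doc._id, { ...doc });
  }

  // save: inserts, or replaces the whole existing document.
  save(doc) {
    this.docs.set(doc._id, { ...doc });
  }

  // update: changes fields of a matching document; with upsert: true it
  // creates a document from the query and update params when none matches.
  update(query, fields, options = {}) {
    const existing = this.docs.get(query._id);
    if (existing) {
      Object.assign(existing, fields);
    } else if (options.upsert) {
      this.docs.set(query._id, { ...query, ...fields });
    }
  }
}

const foo = new Collection();
foo.insert({ _id: 1, value: "hello" });
foo.save({ _id: 1, greeting: "hi" });     // replaces the document, value is gone
foo.update({ _id: 2 }, { value: "new" }); // no match, upsert off: nothing happens
foo.update({ _id: 2 }, { value: "new" }, { upsert: true });
console.log([...foo.docs.values()]);
```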
  • MongoDB Driver

    • MongoDB provides drivers for many languages, including Java and Python.
    • The driver sits between the application code and the MongoDB replica set.
    • Usage
      • Converts between language objects (e.g. Java objects) and BSON
      • error handling and recovery
      • Topology management ( in replica set)
      • Connection pooling
      • Supports write concern
  • Write Concern

    • Governs what is meant by a successful write in a replica set
    • Various options that can be set
      • Unacknowledged (w: 0) : the driver sends the write to the primary and does not wait for an acknowledgement
      • Acknowledged (w: 1) - default; the primary acknowledges once the write is applied in its memory (not necessarily persisted to disk yet)
      • Majority (w: "majority") : the write is acknowledged only after it has been stored on the primary and on enough secondaries to form a majority of the replica set.
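For reference, the three levels above map to write concern documents like the ones sketched below. The object name writeConcerns and the 5000 ms wtimeout are choices made for this example; the shell call in the comment assumes the foo collection from the replica set walkthrough and needs a running replica set, so it is left commented out.

```javascript
// Write concern documents for the three levels described above.
// The wtimeout value is an arbitrary example, not a recommendation.
const writeConcerns = {
  unacknowledged: { w: 0 },
  acknowledged: { w: 1 },
  majority: { w: "majority", wtimeout: 5000 } // give up waiting after 5 s
};

// In the mongo shell, a write concern can be passed per operation, e.g.:
// db.foo.insertOne({ _id: 2, value: "world" },
//                  { writeConcern: writeConcerns.majority })

console.log(JSON.stringify(writeConcerns.majority));
```

With wtimeout set, the shell reports a write concern error if a majority has not acknowledged in time; the write itself is not rolled back.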
  • To Do

    • Cursors
    • Projections
    • Multi document transaction
    • Replica set write concern