Skip to content

Instantly share code, notes, and snippets.

@leommoore
Last active May 7, 2019 11:50
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 8 You must be signed in to fork a gist
  • Save leommoore/c7d7cc7de3b5b4544fb0 to your computer and use it in GitHub Desktop.
Save leommoore/c7d7cc7de3b5b4544fb0 to your computer and use it in GitHub Desktop.
MongoDB 3.2.x Sharding

#MongoDB 3.2.x Sharding Sharding is used when the database is too large to run on a single server and you need to spread the load across multiple servers. The name itself refers to the breaking (sharding) of the data into seperate groups of data which will reside on different servers.

##Configuration Server Start the server on your server (myserver0)

mongod --configsvr --dbpath /data

On myserver1 start the shard giving the configuration server as the --configdb option

mongos --configdb myserver0

On myserver2 start the shard giving the configuration server as the --configdb option

mongos --configdb myserver0

On the configuration sever (myserver0) start the mongo shell

mongo

The prompt should be

mongos>

At this point the mongos knows about the config database but does not know anything about the two other shards (myserver1 and myserver2). You can add in the shard specifying the port if it is not the default in the shell using:

sh.addShard('myserver1:port')
{ "shatdAdded" : "shard000", "ok" : 1}

The the next shard:

sh.addShard('myserver2')
{ "shatdAdded" : "shard001", "ok" : 1}

You can check the shard status using:

sh.status()

Unlike replica sets which by default assumes all databases and collections, sharding does not, so you have to tell mongo which databases and collections to shard. It is quite possible that only certain collections require sharding. First you need to tell which database to shard.

sh.enableSharding('mydatabase')
{ "ok": 1}

Then you need to tell which collections to shard. Notice you need to specify the full name of the collection.

sh.shardCollection(Collection,ShardingKey)
sh.shardCollection('mydatabase.mycollection',{country: 1, phone: 1})
{ "collectionsharded" : "mydatabase.mycollection", "ok": 1}

##Production Configuration In production you should have three config servers in case one goes down. A shard can be a single server but it can also be a replica set. This provides durability. You can add a replica set (ie rs1) as a shard using:

sh.addShard('rs1/myserver:27018')

Port 27018 is the replica sets member port. The rest of the replica set members will be discovered and added as well. You can then proceed and setup the database and the collection

sh.enableSharding('mydatabase')
sh.shardCollection(Collection,ShardKey)

##ShardKeys The ShardKey is an important choice. This is what is used to distrubute the data across the shards. Ideally you want a shard key that divides the data up as equally as possible. Bad shard keys include Timestamps, Created Dates etc as these are constantly increasing and it means that there will be a large number of writes to a single shard as all the dates will be very close together so the work is not distributed at all. A slightly less bad keys include countries, surnames (where similar surnames are common) etc. This will divide them up but if you are dividing by country for example, you USA shard will be very busy but your Irish one will not, simply because of relative population size. In effect they are bunched.

###Hashing of Keys You can use hashing to help seperate bunched documents. Hashing generates a key based on the data and the hash is more uniformly divided. You can use hashing in your shardkey using:

sh.shardCollection('mydatabase.mycollection','fieldname': 'hashed'})

With a hashed key you can onle select one field to be hashed.

##Tags Sometimes you may need to tag the shard so that similar documents are grouped. For example you may want to associate all the data in a geographical area together. For example all the Australian data would reside on a server in the Australian data centre rather than a European based server. Note shards can have more than one tag. To add a europe, australia and america tag:

sh.addShardTag('shard0000','europe')
sh.addShardTag('shard0001','australia')
sh.addShardTag('shard0002','americas')

Next tell mongo what Ranges are associated with Tags:

sh.addTagRange("mydatabase.mycollection",{startRange},{endRange},"Tag")
sh.addTagRange("mydatabase.mycollection",{countryPhonePrefix: "1"},{countryPhonePrefix: "2"}, "americas")
sh.addTagRange("mydatabase.mycollection",{countryPhonePrefix: "44"},{countryPhonePrefix: "45"}, "europe")
sh.addTagRange("mydatabase.mycollection",{countryPhonePrefix: "350"},{countryPhonePrefix: "360"}, "europe")

Note: the upperbound is not included, but it does include the lowerbound. In this case +1 is the USA, +350 Gibralter, +351 Portugal, +352 Luxemburg, +353 Ireland, +354 Iceland, +355 Albania, +356 Malta, +357 Cyprus, +358 Finland and Aland Islands and +359 Bulgaria.

Mongo also has the concept of a MaxKey and MinKey so you can specify an upper or lower bound range using these without having to give a value.

Then we need to start the sharding:

sh.enableSharding('mydatabase')

The shard the collection:

sh.shardCollection('mydatabase.mycollection', {countryPhonePrefix: 1})

##Moving Chunks You can explicitly move a chunk to a specific shard using:

sh.moveChunk('mydatabase.mycollection',{query},shard)
sh.moveChunk('mydatabase.mycollection',{country: "USA"},"shard0002")
@LamourBt
Copy link

LamourBt commented Aug 29, 2016

Great gist by the way. I have few questions to start

1 question: you are using 3 mongodb servers or shard instances (myserver0, myserver1, myserver2), in each of these instances you will configure the path of their data storage and their listening port also you would do replica configuration if needed;

It seems that you are adding the shard into your myserver0 instance, why ? isn't it where we should have used the a new instance as our config server and add all shards there?

Or are you using myserver1 and myserver2 as the shards and myserver0 as the config servers ?

@HaythemJ
Copy link

Great post!

I have a question, how to query data by accessing the shard directly?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment