MongoDB sharded cluster install
OS Setup
- Install VMs (base OS = Ubuntu 11.04)
- 6GB of disk is not enough; 20-40GB would be good. Mongo has a lot of compression and cleanup features coming.
- Create user to run MongoDB as
- Get DNS or Hosts file set up so all nodes in the cluster can talk to each other
- Generate SSH keys: ssh-keygen -t rsa
- Copy each node's ssh public key to node01, cat all the keys into authorized_keys, then copy that file back to every node so each node trusts the others. Test it out by sshing from node01 to 02, 03, 04; from node02 to 01, 03, 04; and so on. (A sketch follows below.)
- Create an initial architecture:
node01: master (replica set 1)
node02: slave (replica set 1)
node03: master (replica set 2)
node04: slave (replica set 2)
node05: mongos & config server
From here on, these assumptions are used.
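For the key exchange above, a minimal sketch (assuming the user is chris and password auth is still enabled for the first copy; ssh-copy-id works too, if installed):
chris@node02:~$ scp ~/.ssh/id_rsa.pub node01:node02.pub
(repeat from node03, node04, node05)
chris@node01:~$ cat ~/.ssh/id_rsa.pub ~/node0*.pub > ~/.ssh/authorized_keys
chris@node01:~$ for n in node02 node03 node04 node05; do scp ~/.ssh/authorized_keys $n:.ssh/; done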
- Install NFS server on node05 for storing central configs and scripts.
- Edit /etc/default/nfs-common (to fix ownership of user 4294967294 problem)
NEED_IDMAPD=yes
- Restart NFS:
/etc/init.d/nfs-kernel-server restart
- /etc/exports:
/mnt/nfs node01(rw,async) node02(rw,async) node03(rw,async) node04(rw,async)
I also have a shared directory that is VMware Fusion specific. It lets me edit scripts with TextMate on my Mac; there are also JavaScript files I edit faster on the Mac than by SCPing template files around and using vi. This is very specific to my environment, but eventually all config files end up in /mnt/nfs, where they are shared to all nodes.
root@node05:~# exportfs -a
root@node01:~# mkdir /mnt/nfs
root@node01:~# cat /etc/fstab
(add at the end)
node05:/mnt/nfs /mnt/nfs nfs nfsvers=3 0 0
(nfsvers=3 to fix ownership of user 4294967294 problem)
root@node01:~# mount /mnt/nfs
(repeat for each node)
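With ssh trust in place, the fstab edit and mount can be pushed to the remaining nodes in one shot (a sketch; assumes root ssh trust between the nodes, otherwise do it per node as above):
root@node01:~# for n in node02 node03 node04; do ssh $n "mkdir -p /mnt/nfs && echo 'node05:/mnt/nfs /mnt/nfs nfs nfsvers=3 0 0' >> /etc/fstab && mount /mnt/nfs"; done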
root@node05:/mnt/nfs# mkdir software
Download the MongoDB tarball here. I used 1.8.2.
On each node (1-5), extract the mongo tarball into that user's home directory as that user (to preserve ownership):
chris@node01:~$ tar xvzf /mnt/nfs/software/mongodb-linux-x86_64-1.8.2.tgz
On each node (1-5), add mongo binaries to PATH:
echo "export PATH=$PATH:~/mongodb-linux-x86_64-1.8.2/bin" >> .bash_profile
On each node, create a log dir for mongodb startup logs in the mongo user's home dir:
$ mkdir logs
On each node we'll create a named data directory for its mongod, since we could run multiple mongo servers on one physical server:
chris@node01:~$ mkdir -p data/mongo01
chris@node02:~$ mkdir -p data/mongo02
etc
Startup Scripts
Create a scripts directory on NFS:
(first, fix permission issue)
chris@node05:/mnt/nfs$ sudo chown chris .
chris@node05:/mnt/nfs$ mkdir scripts
chris@node05:/mnt/nfs$ cd scripts
Create some scripts to look like this. Note that smallfiles and oplogSize are only used for a small dev environment.
::::::::::::::
start_node01.sh
::::::::::::::
#!/bin/bash
mongod --rest --fork --replSet rs01 --port 27017 --dbpath ~/data/mongo01 --logpath ~/logs/mongo01.log --journal --smallfiles --oplogSize=64 --noprealloc
::::::::::::::
start_node02.sh
::::::::::::::
#!/bin/bash
mongod --rest --fork --replSet rs01 --port 27017 --dbpath ~/data/mongo02 --logpath ~/logs/mongo02.log --journal --smallfiles --oplogSize=64 --noprealloc
::::::::::::::
start_node03.sh
::::::::::::::
#!/bin/bash
mongod --rest --fork --replSet rs02 --port 27017 --dbpath ~/data/mongo03 --logpath ~/logs/mongo03.log --journal --smallfiles --oplogSize=64 --noprealloc
::::::::::::::
start_node04.sh
::::::::::::::
#!/bin/bash
mongod --rest --fork --replSet rs02 --port 27017 --dbpath ~/data/mongo04 --logpath ~/logs/mongo04.log --journal --smallfiles --oplogSize=64 --noprealloc
Create the config server:
chris@node05:~$ mkdir -p data/config01
chris@node05:~$ mkdir logs
chris@node05:~$ more /mnt/nfs/scripts/start_node05.sh
#!/bin/bash
# config server; --configsvr defaults the port to 27019
mongod --rest --fork --configsvr --dbpath ~/data/config01 --logpath ~/logs/config01.log
# give the config server a moment to come up before mongos points at it
sleep 5
# mongos itself listens on the default 27017
mongos --fork --configdb localhost:27019 --logpath ~/logs/mongos01.log
Make all scripts executable:
chris@node05:/mnt/nfs/scripts$ chmod u+x *.sh
Start each node:
chris@node01:~$ /mnt/nfs/scripts/start_node01.sh
chris@node02:~$ /mnt/nfs/scripts/start_node02.sh
chris@node03:~$ /mnt/nfs/scripts/start_node03.sh
chris@node04:~$ /mnt/nfs/scripts/start_node04.sh
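A quick sanity check that each mongod actually forked (log names per the scripts above):
chris@node01:~$ pgrep -l mongod
chris@node01:~$ tail -n 5 ~/logs/mongo01.log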
Disk space could be an issue here. Even with --smallfiles, --noprealloc and --oplogSize set, the data directory is huge with no data.
chris@node03:~$ du -sh data/mongo03/
3.1G data/mongo03/
Turning off prealloc fixed the disk usage, but I had to blow away the database. 1.9 has compact built in. DO NOT USE THESE THREE SETTINGS IN PRODUCTION.
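If you need to blow away a node's data the same way, the blunt approach is (this destroys all data on that node):
chris@node03:~$ pkill mongod   (or db.shutdownServer() from the mongo shell)
chris@node03:~$ rm -rf ~/data/mongo03/*
chris@node03:~$ /mnt/nfs/scripts/start_node03.sh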
Mongo Sharding and Replica Set Config
chris@node05:/mnt/nfs/scripts$ cat rs1_repl.js
// rs1_repl.js
// run with mongo localhost:27017 rs1_repl.js on node01
config = {_id: 'rs01', members: [
{_id: 0, host: 'node01:27017'},
{_id: 1, host: 'node02:27017'}]
}
rs.initiate(config);
chris@node05:~$ cat /mnt/nfs/scripts/rs2_repl.js
// rs2_repl.js
// run with mongo localhost:27017 rs2_repl.js on node03
config = {_id: 'rs02', members: [
{_id: 0, host: 'node03:27017'},
{_id: 1, host: 'node04:27017'}]
}
rs.initiate(config);
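Run each file against the right node and verify (assumes /mnt/nfs is mounted everywhere):
chris@node01:~$ mongo localhost:27017 /mnt/nfs/scripts/rs1_repl.js
chris@node03:~$ mongo localhost:27017 /mnt/nfs/scripts/rs2_repl.js
chris@node01:~$ mongo localhost:27017 --eval 'printjson(rs.status())'
(wait until each set shows one PRIMARY and one SECONDARY before adding shards)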
Start the node05 script (config server + mongos), then connect to mongos:
chris@node05:~$ /mnt/nfs/scripts/start_node05.sh
chris@node05:/mnt/nfs/scripts$ mongo node05:27017/admin
> db.runCommand( { addshard : "rs01/node01:27017" });
{ "shardAdded" : "rs01", "ok" : 1 }
> db.runCommand( { addshard : "rs02/node03:27017" });
{ "shardAdded" : "rs02", "ok" : 1 }
> use config;
switched to db config
> db.shards.find();
{ "_id" : "rs01", "host" : "rs01/node01:27017" }
{ "_id" : "rs02", "host" : "rs02/node03:27017" }
Data Time
> show dbs;
admin (empty)
config 0.1875GB
test (empty)
> use meow;
switched to db meow
> db.cats.insert( { name: "Fuzzball" } )
> db.cats.find()
{ "_id" : ObjectId("4e4eb5b475e9164396826bc8"), "name" : "Fuzzball" }
> db.cats.find( {name:"Fuzzball"} )
{ "_id" : ObjectId("4e4eb5b475e9164396826bc8"), "name" : "Fuzzball" }
> db.cats.update( { name:"Fuzzball" }, { color:"Gray" } );
> db.cats.find( {name:"Fuzzball"} )
# no result: the name field is gone! an update without $set replaces the whole document. we will fix this next.
> db.cats.find()
{ "_id" : ObjectId("4e4eb5b475e9164396826bc8"), "color" : "Gray" }
> db.cats.update( {color:"Gray"}, { name:"Fuzzball" } )
> db.cats.find()
{ "_id" : ObjectId("4e4eb5b475e9164396826bc8"), "name" : "Fuzzball" }
> db.cats.update( { name:"Fuzzball" }, { $set:{annoying:false} } )
> db.cats.find()
{ "_id" : ObjectId("4e4eb5b475e9164396826bc8"), "annoying" : false, "name" : "Fuzzball" }
> db.cats.update( { name:"Fuzzball" }, { $set:{color:"Gray"} } )
> db.cats.find()
{ "_id" : ObjectId("4e4eb5b475e9164396826bc8"), "annoying" : false, "color" : "Gray", "name" : "Fuzzball" }
> db.cats.find().size()
1
> db.cats.update( {_id:ObjectId("4e4eb5b475e9164396826bc8")}, { $set: {name:"Mr. Fuzzball"} } );
> db.cats.find();
{ "_id" : ObjectId("4e4eb5b475e9164396826bc8"), "annoying" : false, "color" : "Gray", "name" : "Mr. Fuzzball" }
> db.cats.update( {name: "Mr. Fuzzball"}, { $set: {toys:["Red Mouse", "Gray String"]} } )
> db.cats.find();
{ "_id" : ObjectId("4e4eb5b475e9164396826bc8"), "annoying" : false, "color" : "Gray", "name" : "Mr. Fuzzball", "toys" : [ "Red Mouse", "Gray String" ] }
(enablesharding and shardcollection are admin commands; the collection gets its full namespace and the shard key is a document)
> use admin
switched to db admin
> db.runCommand( { enablesharding : "meow" } );
> db.runCommand( { shardcollection : "meow.cats", key : { _id : 1 } } );
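To give the balancer something to chew on, you can load some throwaway data (a hypothetical loop; the volume is arbitrary):
> use meow
switched to db meow
> for (var i = 0; i < 100000; i++) { db.cats.insert({ name: "cat" + i, color: "Gray" }); }
> db.printShardingStatus();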
Sharding Failover
> use admin
switched to db admin
> db.runCommand( { listshards :1 })
{
    "shards" : [
        {
            "_id" : "rs01",
            "host" : "rs01/node01:27017"
        },
        {
            "_id" : "rs02",
            "host" : "rs02/node03:27017"
        }
    ],
    "ok" : 1
}
> db.runCommand( { removeshard : "rs01/node01:27017" });
{
    "msg" : "draining started successfully",
    "state" : "started",
    "shard" : "rs01",
    "ok" : 1
}
> db.runCommand( { removeshard : "rs01/node01:27017" });
{
    "msg" : "draining ongoing",
    "state" : "ongoing",
    "remaining" : {
        "chunks" : NumberLong(0),
        "dbs" : NumberLong(2)
    },
    "ok" : 1
}
> db.printShardingStatus();
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
      { "_id" : "rs02", "host" : "rs02/node03:27017" }
      { "_id" : "rs01", "draining" : true, "host" : "rs01/node01:27017" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "test", "partitioned" : false, "primary" : "rs01" }
      { "_id" : "meow", "partitioned" : false, "primary" : "rs01" }
> db.runCommand( { moveprimary: "test", to: "rs02"} );
{ "primary " : "rs02:rs02/node03:27017", "ok" : 1 }
> db.runCommand( { moveprimary: "meow", to: "rs02"} );
{ "primary " : "rs02:rs02/node03:27017", "ok" : 1 }
> db.runCommand( { removeshard : "rs01/node01:27017" });
{
    "msg" : "removeshard completed successfully",
    "state" : "completed",
    "shard" : "rs01",
    "ok" : 1
}
> db.printShardingStatus();
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
      { "_id" : "rs02", "host" : "rs02/node03:27017" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "test", "partitioned" : false, "primary" : "rs02" }
      { "_id" : "meow", "partitioned" : false, "primary" : "rs02" }
>
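If you want rs01 back in the cluster afterwards, just add it again with the same command as before:
> db.runCommand( { addshard : "rs01/node01:27017" });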
MISC NOTES
Didn't create arbiters in this config. A good practice is to have an odd number of voting members in your replica sets, so create some arbiters. The official docs have good info. TODO: This section is incomplete:
chris@node02:~/data/mongo02$ mkdir ~/data/arb01
chris@node04:~/data/mongo04$ mkdir ~/data/arb02
Start up scripts with shards:
Add --shardsvr to start_node01.sh and start_node03.sh
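For example, start_node01.sh becomes (only --shardsvr is new):
#!/bin/bash
mongod --rest --fork --shardsvr --replSet rs01 --port 27017 --dbpath ~/data/mongo01 --logpath ~/logs/mongo01.log --journal --smallfiles --oplogSize=64 --noprealloc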
Make 02 and 04 arbiter servers so that RS failover happens more automatically. (ran out of disk space)
Disk space is an issue. I tweaked the oplogSize and preallocation switches to get around this, and also expanded my VMs' root partitions (another story). I read 1.9 will have better compacting features; not sure if this would result in more free disk space. In any event, you can use the scripts below to reduce the footprint, or just create VMs with about 20GB.
start_node02.sh
mongod --rest --fork --replSet rs01 --port 27018 --dbpath ~/data/arb01 --logpath ~/logs/arb01.log --journal --smallfiles --oplogSize=64 --noprealloc
start_node04.sh
mongod --rest --fork --replSet rs02 --port 27018 --dbpath ~/data/arb02 --logpath ~/logs/arb02.log --journal --smallfiles --oplogSize=64 --noprealloc
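The arbiters still need to be registered with their sets (a sketch; run against each set's current primary, ports matching the arbiter scripts above):
chris@node01:~$ mongo localhost:27017 --eval 'rs.addArb("node02:27018")'
chris@node03:~$ mongo localhost:27017 --eval 'rs.addArb("node04:27018")'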