@artburkart
Created April 30, 2016 01:55
Mongo-Connector setup instructions

Mongo-Connector Tutorial

Mongo-Connector

I don't know anything about elasticsearch. All I know is this mongo-connector thing makes life easy. Here are some quick start instructions to follow for setup:

AWS ES

I've set up an Elasticsearch instance using the AWS ES service on the free tier. I chose AWS because it didn't put arbitrary limits on indices and whatnot.

mLab (DBaaS)

For mongo, I'm using an mLab instance, which is connected to a Heroku app. You will need to create an oplog user to read the mongo oplog. Instructions can be found in the mLab docs. You will add its credentials to your config.json.

AWS EC2

In addition to the two services, I am using a t2.micro instance to run mongo-connector. In order to allow the t2.micro to communicate with the ES instance, I altered the AWS access policy. It looks like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:ACCOUNT_ID:domain/name-of-es-instance/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [
            "internal.ip.of.t2micro"
          ]
        }
      }
    }
  ]
}
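If you'd rather generate that policy than hand-edit it, here is a minimal Python sketch. It only builds the same JSON shown above; the function name `build_access_policy`, its parameters, and the sample account ID and IP are made up for illustration and are not part of any AWS tooling.

```python
import json


def build_access_policy(account_id, domain_name, source_ips, region="us-east-1"):
    """Build an AWS ES access policy that allows the given source IPs.

    Hypothetical helper: it mirrors the hand-written policy above and
    does not call any AWS API.
    """
    resource = "arn:aws:es:{}:{}:domain/{}/*".format(region, account_id, domain_name)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:*",
                "Resource": resource,
                "Condition": {"IpAddress": {"aws:SourceIp": list(source_ips)}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and IP address; substitute your own values.
    policy = build_access_policy("123456789012", "name-of-es-instance", ["203.0.113.10"])
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the access policy editor for the ES domain.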

I wanted to use IAM roles, but it's not really clear to me how they'd be used. I attached a role to my IAM user, but I still couldn't access the instance, so I gave up on that thought.


I'm going to omit the step where I inserted data into the DB. If you don't have data, then this is a sort of pointless exercise anyhow.


Installation

Next, you need to install the mongo-connector software. Instructions can be found in the mongo-connector README.md, but they're pretty short, so I'll include them here too.

# Assuming you use an ubuntu instance, `sudo apt-get install python-pip`
# will install pip for you
pip install mongo-connector

Doc Managers

Mongo-connector has two elasticsearch doc managers available, but AWS ES only offers elasticsearch 1.x, so you'll need to install the one that matches it, elastic-doc-manager. It's also very straightforward, so I'll include its instructions as well.

pip install elastic-doc-manager

Running as a service

To start the mongo-connector, there's some bash command you can run, but I don't know any of the syntax. Instead, I use the sweet daemonized installation steps.

# You can install git with `sudo apt-get install git`
git clone git@github.com:mongodb-labs/mongo-connector.git
cd mongo-connector

# Edit the config.json to your liking
vim config.json

What is 'your liking'? I've included an example similar to the one I use, but you can learn what all the fields mean by reading the docs on the Configuration Options page of their wiki.
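Before installing the service, it can help to sanity-check the file. The sketch below assumes Python 3. It is not mongo-connector's own validation; `check_config` and the `EXPECTED_KEYS` list are hypothetical and only cover the fields this tutorial relies on.

```python
import json

# Keys this tutorial's setup relies on. This list is an assumption made
# for illustration, not something mongo-connector itself enforces.
EXPECTED_KEYS = ("mainAddress", "oplogFile", "namespaces", "docManagers")


def check_config(text):
    """Return a list of problems found in the text of a config.json file."""
    config = json.loads(text)  # raises ValueError if the JSON is malformed
    problems = []
    for key in EXPECTED_KEYS:
        if key not in config:
            problems.append("missing key: " + key)
    for namespace in config.get("namespaces", {}).get("include", []):
        # Namespaces are written as "database_name.collection_name".
        if "." not in namespace:
            problems.append("namespace lacks a database prefix: " + namespace)
    for manager in config.get("docManagers", []):
        if not manager.get("targetURL"):
            problems.append("doc manager has no targetURL")
    return problems
```

Calling `check_config(open("config.json").read())` returns an empty list when nothing looks off. The repeated `"__comment__"` keys are not a problem here because `json.loads` simply keeps the last one.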

Once your config is good, do the following:

# Installs as daemonized service
python setup.py install_service

# Starts service
service mongo-connector start

If you take a look in the mongo-connector logs, you should see a bunch of information getting spat out. This is a sign that it's syncing! To verify that the mongo collections have been sync'd to the elasticsearch instance, you can take a look at the indices tab for your AWS ES instance. It should have an index that is the same name as your mongo db and a caret that opens up and shows a list of mappings (your mongo collections).
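If you'd rather check from the EC2 instance than from the AWS console, something like the following should work. It is a sketch assuming Python 3 and that the instance's IP is allowed by the access policy; the helper names are made up, and `_cat/indices` is elasticsearch's plain-text index listing endpoint.

```python
import urllib.request


def cat_indices_url(host):
    """Build the URL of the _cat/indices endpoint for an ES host name."""
    return "https://{}/_cat/indices?v".format(host.rstrip("/"))


def list_indices(host, timeout=10):
    """Fetch the plain-text index listing. Needs network access to the ES instance."""
    with urllib.request.urlopen(cat_indices_url(host), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Printing `list_indices("name-of-aws-es-instance-and-some-gobbledy-gook.us-east-1.es.amazonaws.com")` should include a row for the index named after your mongo db once the sync has started.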

Performing some Queries

If you'd like to run some test queries, I recommend using Postman. Worth noting, elasticsearch is a little weird because most advanced queries are performed with a POST instead of a GET. Maybe it's more subjective than it is weird, but whatever. Here's what a query would look like for a car document that looks like this:

{
    "brand": "Tesla",
    "model": "Model 3",
    "perks": "safe, aerodynamic, electric"
}

name-of-aws-es-instance-and-some-gobbledy-gook.us-east-1.es.amazonaws.com/database_name/cars/_search/?size=15

and the payload would be:

{
  "query": {
    "multi_match": {
      "fields":  [ "brand", "perks"],
      "query":     "aerodinamic",
      "fuzziness": "AUTO"
    }
  }
}

Once you POST to the URL with the above payload, elasticsearch will perform a fuzzy search to get your results, so the misspelled "aerodinamic" still matches the "aerodynamic" in the Tesla's perks. It's pretty cool.
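If you don't have Postman handy, the same request can be sent from a script. This is a rough sketch assuming Python 3 and an access policy that lets your IP through; `build_fuzzy_query` and `search` are hypothetical helper names, and the host, index, and type come from the example URL above.

```python
import json
import urllib.request


def build_fuzzy_query(text, fields):
    """Build the multi_match payload shown above for the given text and fields."""
    return {
        "query": {
            "multi_match": {
                "fields": list(fields),
                "query": text,
                "fuzziness": "AUTO",
            }
        }
    }


def search(host, index, doc_type, text, fields, size=15, timeout=10):
    """POST the fuzzy query to the _search endpoint and return the parsed response."""
    url = "https://{}/{}/{}/_search?size={}".format(host, index, doc_type, size)
    body = json.dumps(build_fuzzy_query(text, fields)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For the car example, that would be `search(host, "database_name", "cars", "aerodinamic", ["brand", "perks"])`, with `host` set to your ES endpoint.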

That's all! 😄

{
    "__comment__": "Configuration options starting with '__' are disabled",
    "__comment__": "To enable them, remove the preceding '__'",

    "__comment__": "You need to create a user that can read the oplog",
    "mainAddress": "mongodb://db_url_without_username_password:db_port/admin",

    "__comment__": "NOTE(arthurb): Follow these steps to re-sync elasticsearch from scratch",
    "__comment__": "Delete this oplog.timestamp file",
    "oplogFile": "/var/log/mongo-connector/oplog.timestamp",
    "__comment__": "set 'noDump' to true",
    "__comment__": "and restart the service 'sudo service mongo-connector restart'",
    "__comment__": "When it's all sync'd up, turn it off again, set 'noDump' to false, and start it up",
    "noDump": false,
    "batchSize": -1,
    "verbosity": 3,
    "continueOnError": false,

    "logging": {
        "type": "file",
        "filename": "/var/log/mongo-connector/mongo-connector.log",
        "__format": "%(asctime)s [%(levelname)s] %(name)s:%(lineno)d - %(message)s",
        "__rotationWhen": "D",
        "__rotationInterval": 1,
        "__rotationBackups": 10,

        "__type": "syslog",
        "__host": "localhost:514"
    },

    "authentication": {
        "adminUsername": "oplog-reader-username",
        "password": "oplog-readers-password"
    },

    "__comment__": "For more information about SSL with MongoDB, please see http://docs.mongodb.org/manual/tutorial/configure-ssl-clients/",
    "ssl": {
        "sslCertificatePolicy": "required"
    },

    "__comment__": "The database name, followed by the collection name should go here in namespaces.",
    "namespaces": {
        "include": [
            "database_name.cars",
            "database_name.trucks",
            "database_name.boats",
            "database_name.zebras"
        ]
    },

    "docManagers": [
        {
            "docManager": "elastic_doc_manager",
            "targetURL": "https://name-of-aws-es-instance-and-some-gobbledy-gook.us-east-1.es.amazonaws.com",
            "__bulkSize": 1000,
            "__uniqueKey": "_id",
            "__autoCommitInterval": null
        }
    ]
}