mastry/es-on-centos7.md

#Installing Elasticsearch 2.0 on CentOS 7
##Overview
This gist describes how to install Elasticsearch 2.0 on CentOS 7. Some of this information is straight from the Elasticsearch site, some of it is specific to CentOS 7, and a lot of it is specific to the environment I was creating at the time for syslog capture. YMMV.
If you are using a different OS you can probably adapt these instructions, but I recommend going over the official documentation first.
##Assumptions and Background
Each node in the cluster was a clean install of CentOS 7. I used the "Compute Node" installation option with no additional packages.
Before installation all of the servers were configured according to these guidelines:

Initial Server Setup with CentOS 7
Additional Recommended Steps for New CentOS 7 Servers

Additional steps were taken, but they're probably not relevant to your setup.
Each node was a Hyper-V 2012 R2 virtual machine (Gen 2) with 8GB RAM, 4 CPUs, 1 NIC, and 2 fixed-size disks (one 30GB disk for the OS and one 500GB disk for Elasticsearch data and logs). The Hyper-V virtual switch was connected to a 10Gb NIC.
I configured each of the cluster nodes with the instructions below, with only minor differences (the network.host setting in elasticsearch.yml, the local IP address, and the host name).
##Memory Configuration
Elasticsearch handles memory optimization on its own, so we need to reduce the OS tendency to swap.
There are other ways to handle this, such as disabling swap entirely or setting bootstrap.mlockall: true in elasticsearch.yml.
sysctl -w vm.swappiness=0
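
The `sysctl -w` command above only changes the running kernel, so the value reverts at the next reboot. A minimal sketch of one way to persist it, assuming a drop-in file under /etc/sysctl.d is acceptable in your environment. The function name and the file name `90-elasticsearch.conf` are hypothetical choices, not anything Elasticsearch requires.

```shell
# Write a sysctl drop-in that keeps vm.swappiness at the chosen value across reboots.
# The function name and the file name (90-elasticsearch.conf) are hypothetical.
write_swappiness_conf() {
  local dir="${1:-/etc/sysctl.d}"   # target directory (the default needs root)
  local value="${2:-0}"             # same value used with sysctl -w above
  printf 'vm.swappiness = %s\n' "$value" > "$dir/90-elasticsearch.conf"
}

# Typical use, as root:
#   write_swappiness_conf
#   sysctl --system    # reload all sysctl configuration files
```

Passing a directory as the first argument lets you try the function somewhere harmless before pointing it at /etc/sysctl.d.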

##Firewall Configuration
Assuming the firewall is enabled, we need to open two ports for Elasticsearch: 9200 for HTTP (REST) traffic and 9300 for node-to-node transport.
firewall-cmd --permanent --add-port=9200/tcp
firewall-cmd --permanent --add-port=9300/tcp
firewall-cmd --reload   
You probably need to consider other firewall rules for your environment. I had other means to protect my traffic. If you do not, then I recommend limiting port access to only the hosts that need it (at a minimum).
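
Limiting port access to specific hosts can be done with firewalld rich rules instead of the blanket `--add-port` lines. The sketch below only prints the commands so they can be reviewed first. The function name `es_rich_rules` is hypothetical, and the addresses are the same placeholders used elsewhere in this gist.

```shell
# Print (do not run) firewalld rich rules that allow the given source hosts
# to reach one Elasticsearch port. The function name is hypothetical.
es_rich_rules() {
  local port="$1"; shift
  local host
  for host in "$@"; do
    printf "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"%s\" port port=\"%s\" protocol=\"tcp\" accept'\n" "$host" "$port"
  done
}

# Example: allow only the two cluster nodes to use the transport port.
es_rich_rules 9300 1.1.1.1 1.1.1.2
```

After reviewing the output, run the printed commands as root and finish with `firewall-cmd --reload`. If the ports were already opened with `--add-port`, remove those entries with `--remove-port` or the rich rules will have no restricting effect.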
I'm still evaluating the Jetty plugin which adds SSL to the mix. Nice, but I don't know enough to recommend it yet. I have seen some information that alludes to SSL being a bad idea on an Elasticsearch cluster, however.
An Nginx reverse proxy is another option to consider.
##Install Java
Find the latest version of Java in the repo.
yum search java | grep java-

Version 8 of the OpenJDK runtime was the latest I found. You only need the Java runtime. Don't bother installing the development environment.
yum install java-1.8.0-openjdk.x86_64

Now check the Java version just to be sure.
java -version
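
If you want to script this check across several nodes, note that `java -version` writes to standard error, so it has to be redirected before parsing. A small sketch, assuming the version appears in double quotes on the first line (for example a line shaped like `openjdk version "1.8.0_65"`). The function name `java_major` is hypothetical.

```shell
# Extract the major Java version from a "java -version" banner line.
# Handles both the old "1.8.0_65" scheme and the newer "11.0.2" scheme.
java_major() {
  local v="$1"
  v="${v#*\"}"     # drop everything up to the first double quote
  v="${v%%\"*}"    # drop everything from the next double quote onward
  case "$v" in
    1.*) v="${v#1.}"; echo "${v%%.*}" ;;   # 1.8.0_65 -> 8
    *)   echo "${v%%[._-]*}" ;;            # 11.0.2   -> 11
  esac
}

# Usage on a machine with Java installed:
#   java_major "$(java -version 2>&1 | head -n 1)"
```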

##Install Elasticsearch
First we need a place to store the Elasticsearch data and log files.
If you plan to use the default location (not recommended), you can skip this.
mkdir /home/elasticsearch
mkdir /home/elasticsearch/data
mkdir /home/elasticsearch/log

The locations above reflect how CentOS configured the disks during the initial installation. The larger disk was mounted as /home. There's nothing magical about these paths as far as Elasticsearch is concerned. Feel free to get creative with your partitioning.
I didn't see anything in the Elasticsearch documentation about the permissions needed for the data and log folders, but through trial and error I found this is the only thing that seems to work. Be aware that mode 777 makes these directories world-writable.
chmod 777 /home/elasticsearch
chmod 777 /home/elasticsearch/data
chmod 777 /home/elasticsearch/log

Now we're finally ready to download and install Elasticsearch (get the latest download URL from the Elasticsearch site and change below if needed).
wget https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.0.0/elasticsearch-2.0.0.rpm
rpm -Uvh ./elasticsearch-2.0.0.rpm
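
As a tighter alternative to the chmod 777 step above, the directories can be handed to the account that runs the service once the package is installed. This is only a sketch, assuming the RPM created an `elasticsearch` user and group (check with `id elasticsearch`). The function name `restrict_es_dirs` is hypothetical.

```shell
# Give one owner exclusive write access to the Elasticsearch directories
# instead of making them world-writable. The function name is hypothetical.
restrict_es_dirs() {
  local owner="$1"; shift            # e.g. elasticsearch:elasticsearch
  chown -R "$owner" "$@" || return 1
  chmod 750 "$@"                     # owner: rwx, group: r-x, others: none
}

# Typical use, as root, after the RPM is installed:
#   restrict_es_dirs elasticsearch:elasticsearch \
#       /home/elasticsearch /home/elasticsearch/data /home/elasticsearch/log
```

If the service then fails to start, check the log for permission errors before falling back to looser modes.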

##Configure Elasticsearch
We need to configure Elasticsearch to run as a daemon. Don't start the service yet!
systemctl daemon-reload
systemctl enable elasticsearch.service

Edit /etc/elasticsearch/elasticsearch.yml with your favorite editor.
Remove the # comment character from the beginning of the lines with these settings.
Set the values as indicated, or to whatever makes sense in your environment (see notes below).
cluster.name: mycluster
node.name: node1
path.data: /home/elasticsearch/data
path.logs: /home/elasticsearch/log
network.host: 1.1.1.1
http.port: 9200
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["1.1.1.1", "1.1.1.2"]
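
When several nodes need the same edits, uncommenting and setting these values can be scripted. A rough sketch, assuming GNU sed and that each setting appears in the file at most once (commented out or active). `set_es_setting` is a hypothetical helper, and values containing `|`, `&`, or backslashes would need escaping before being passed in.

```shell
# Uncomment and set one "key: value" line in a YAML-style config file,
# or append the line if the key is not present. Hypothetical helper.
set_es_setting() {
  local file="$1" key="$2" value="$3" key_re
  # Escape the dots in the key so they match literally in the regex.
  key_re="$(printf '%s' "$key" | sed 's/\./\\./g')"
  if grep -Eq "^[#[:space:]]*${key_re}:" "$file"; then
    # Strip any leading "#" and whitespace, then replace the value.
    sed -i -E "s|^[#[:space:]]*${key_re}:.*|${key}: ${value}|" "$file"
  else
    # Setting not present at all: append it.
    printf '%s: %s\n' "$key" "$value" >> "$file"
  fi
}

# Typical use, as root:
#   set_es_setting /etc/elasticsearch/elasticsearch.yml cluster.name mycluster
#   set_es_setting /etc/elasticsearch/elasticsearch.yml node.name "$(hostname -s)"
```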


cluster.name must be identical on each node. If not, they won't talk to each other.
node.name I just set this to the host name on each node.
path.data This is the path to where you want Elasticsearch to store data. I used a separate disk.
path.logs This is the path to where you want Elasticsearch to store logs. I used a separate disk.
network.host This is the IP address where you want Elasticsearch to bind.
http.port This is the port that Elasticsearch will listen on. If you change it, make sure you change the firewall rules to match. Note that the official docs describe this as a range of ports, but that's not exactly right.
discovery.zen.ping.multicast.enabled Setting this to false forces the cluster to use unicast for node discovery (recommended for production).
discovery.zen.minimum_master_nodes This needs to be a quorum of your servers: (n/2) + 1, where n is the number of master-eligible nodes in your cluster (by default, every node) and n/2 is rounded down. If you don't understand why having only two servers is a bad idea, you definitely need to read this.
discovery.zen.ping.unicast.hosts You can probably just put in the IP address of every node in your cluster. If you have more complicated requirements (such as dedicated masters) you should refer to the official documentation.
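
The quorum formula for discovery.zen.minimum_master_nodes is easy to get wrong by one, so here is a small sketch that works it out for a few cluster sizes. The function name `quorum` is hypothetical.

```shell
# Quorum for n master-eligible nodes: (n / 2) + 1.
# Shell arithmetic truncates n / 2, which is the rounding the formula needs.
quorum() {
  local n="$1"
  echo $(( n / 2 + 1 ))
}

for n in 1 2 3 4 5; do
  echo "nodes=$n minimum_master_nodes=$(quorum "$n")"
done
```

With two nodes the quorum is also 2, so losing either node leaves the cluster unable to elect a master. Three nodes is the smallest size that tolerates one failure.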

Now we can start the node.
systemctl start elasticsearch.service

To see if everything is working, open a browser and navigate to http://1.1.1.1:9200/_cluster/health?pretty (replace the address with a valid one for your cluster).
{
  "cluster_name" : "mycluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
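
The same check works from the command line with curl, which is handy on a server with no browser. A minimal sketch that pulls just the status value out of the response. The function name `health_status` is hypothetical, and the sed pattern assumes the `"status" : "..."` shape shown in the output above.

```shell
# Extract the "status" value from a _cluster/health JSON response on stdin.
# Hypothetical helper; a real JSON parser is more robust if one is available.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Typical use against a running node:
#   curl -s 'http://1.1.1.1:9200/_cluster/health?pretty' | health_status
```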

As you add more nodes, you should see the number_of_nodes and number_of_data_nodes values increase. If you have any problems, check the log file:
tail /home/elasticsearch/log/mycluster.log

Your logs may be in a different location (depending on how you configured path.logs), but the file name will match the name of your cluster (the cluster.name setting).
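
To skip the routine entries and look only at likely problems, the log can be filtered by level. A small sketch, assuming the log lines contain the level names WARN and ERROR. The function name `es_log_problems` is hypothetical.

```shell
# Show only WARN and ERROR lines from the main cluster log.
# Arguments: the path.logs directory and the cluster name.
es_log_problems() {
  local log_dir="$1" cluster="$2"
  grep -E 'WARN|ERROR' "${log_dir}/${cluster}.log"
}

# Typical use:
#   es_log_problems /home/elasticsearch/log mycluster
```

grep exits with a non-zero status when nothing matches, so an empty result with exit code 1 means no warnings or errors were found.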