@dixitm20
Created June 17, 2022 19:30
Opensearch local docker for Spark Apps | Setup Instructions

Ref Url

Docker Compose

Use the below steps to run the OpenSearch containers

  • Save the below content into a file named docker-compose.yml in a new, empty directory. SSL security is turned off in this docker-compose file to keep the local development setup simple. The image tag is pinned to 2.0.1 because the image with the latest tag did NOT work with the fix for the issue described below: version number is incompatible with existing ES clients.
version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:2.0.1
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
      - "DISABLE_INSTALL_DEMO_CONFIG=true" # disables execution of install_demo_configuration.sh bundled with security plugin, which installs demo certificates and security configurations to OpenSearch
      - "DISABLE_SECURITY_PLUGIN=true" # disables security plugin entirely in OpenSearch by setting plugins.security.disabled: true in opensearch.yml
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - opensearch-net
  opensearch-node2:
    image: opensearchproject/opensearch:2.0.1
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      - "DISABLE_INSTALL_DEMO_CONFIG=true" # disables execution of install_demo_configuration.sh bundled with security plugin, which installs demo certificates and security configurations to OpenSearch
      - "DISABLE_SECURITY_PLUGIN=true" # disables security plugin entirely in OpenSearch by setting plugins.security.disabled: true in opensearch.yml
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net
  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2.0.1
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      - 'OPENSEARCH_HOSTS=["http://opensearch-node1:9200","http://opensearch-node2:9200"]' # plain http for both nodes, since the security plugin (and with it TLS) is disabled
      - "DISABLE_SECURITY_DASHBOARDS_PLUGIN=true" # disables security dashboards plugin in OpenSearch Dashboards
    networks:
      - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:
  • Run the below command to start these containers
$ docker-compose up -d
  • You can now access OpenSearch Dashboards and the Dev Tools console at: http://localhost:5601/
  • You can also query OpenSearch directly with curl, as follows:
$ curl -XGET "http://localhost:9200/_cat/aliases?v"
$ curl -XGET "http://localhost:9200/_cat/indices?v"
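
The containers take a little while to come up after docker-compose up -d, so the curl commands above can fail on the first try. A minimal sketch of a polling helper follows; the function name wait_for_opensearch and its default attempt count and delay are assumptions for illustration, not part of the original setup.

```shell
# Hypothetical helper: poll the cluster health endpoint until OpenSearch
# answers or the attempts run out.
wait_for_opensearch() {
  local base_url="${1:-http://localhost:9200}"  # port published by the compose file
  local max_attempts="${2:-30}"                 # assumed default
  local delay_seconds="${3:-2}"                 # assumed default
  local attempt=1
  while [ "${attempt}" -le "${max_attempts}" ]; do
    # -s hides progress output, -f makes curl fail on HTTP error codes
    if curl -sf "${base_url}/_cluster/health" > /dev/null; then
      echo "OpenSearch is reachable at ${base_url} (attempt ${attempt})"
      return 0
    fi
    sleep "${delay_seconds}"
    attempt=$((attempt + 1))
  done
  echo "OpenSearch did not respond at ${base_url} after ${max_attempts} attempts" >&2
  return 1
}

# Usage (once the containers have been started):
# wait_for_opensearch "http://localhost:9200" && curl -XGET "http://localhost:9200/_cat/indices?v"
```

If the helper times out, docker ps -a and docker logs opensearch-node1 are the next things to check, as described in the Issues section.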

Issues

Issue Details: OpenSearch Dashboards server is not ready yet

This usually happens when the opensearch containers fail to start (typically because of memory-related limits) while the opensearch-dashboards container itself is running fine. In that scenario, listing the containers with the below command shows only the opensearch-dashboards container running, while the opensearch containers are in status: Exited.

$ docker ps -a
CONTAINER ID   IMAGE                                            COMMAND                  CREATED          STATUS                      PORTS                                                                                  NAMES
68b4737e3b38   opensearchproject/opensearch:latest              "./opensearch-docker…"   11 minutes ago   Exited (0) 10 minutes ago                                                                                          opensearch-node1
b6232cdcba22   opensearchproject/opensearch-dashboards:latest   "./opensearch-dashbo…"   11 minutes ago   Up 11 minutes               0.0.0.0:5601->5601/tcp, :::5601->5601/tcp                                              opensearch-dashboards
556e09085d10   opensearchproject/opensearch:latest              "./opensearch-docker…"   11 minutes ago   Exited (0) 10 minutes ago                                                                                          opensearch-node2

Check the docker logs of an opensearch container using the below command and you will see an ERROR like the one shown below

$ docker logs opensearch-node1
Enabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** https://github.com/opensearch-project/security/issues/1755           **
**************************************************************************
.
.
.
ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
ERROR: OpenSearch did not exit normally - check the logs at /usr/share/opensearch/logs/opensearch-cluster.log
.
.
.

Fix: OpenSearch Dashboards server is not ready yet

Apply below config changes to fix this issue

  • Set the Docker memory limit to at least 4 GB
  • Set vm.max_map_count to at least 262144 on the machine (or VM) that runs Docker

vm.max_map_count can be set on Linux using the below commands

# Persist the setting: add the line vm.max_map_count=262144 to /etc/sysctl.conf
$ sudo vi /etc/sysctl.conf

# Alternatively, apply it to the running kernel only (not persisted across reboots):
$ sudo sysctl -w vm.max_map_count=262144

# restart machine or execute below command to load sysctl changes
$ sudo sysctl --system
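
To confirm that the change took effect before restarting the containers, the current value can be compared with the minimum named in the bootstrap error. This is a minimal sketch; the helper name check_max_map_count is an assumption, and it reads /proc/sys/vm/max_map_count, which exists on Linux only (pass the value as an argument elsewhere).

```shell
# Minimum value demanded by the bootstrap check in the log above.
REQUIRED_MAX_MAP_COUNT=262144

# Hypothetical helper: succeed when the given (or current) vm.max_map_count
# is at least REQUIRED_MAX_MAP_COUNT.
check_max_map_count() {
  local current="${1:-}"
  if [ -z "${current}" ]; then
    # Linux exposes the live kernel value in this file.
    current="$(cat /proc/sys/vm/max_map_count)"
  fi
  if [ "${current}" -ge "${REQUIRED_MAX_MAP_COUNT}" ]; then
    echo "vm.max_map_count=${current} is sufficient"
    return 0
  fi
  echo "vm.max_map_count=${current} is too low, need at least ${REQUIRED_MAX_MAP_COUNT}" >&2
  return 1
}

# Usage:
# check_max_map_count          # checks the running kernel
# check_max_map_count 65530    # checks an explicit value, here the failing default
```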

Issue Details: version number is incompatible with existing ES clients

OpenSearch uses its own version numbers, so you may run into issues when using ElasticSearch clients with it, because those clients check the ElasticSearch version reported by the server. Although OpenSearch was forked from ElasticSearch 7.10.2, it does not report an ElasticSearch version number by default. One issue that I faced while using OpenSearch 2.0.1 with a Spark application is shown below. Even though OpenSearch 2.0.1 descends from ElasticSearch 7.10.2, the client throws an error saying that the version is 6 or below. This happens because OpenSearch returns its own version, 2.0.1, by default, and the client interprets it as an ElasticSearch version.

.
.
.
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No type found; Types are required when writing in ES versions 6 and below. Expected [index]/[type], but got [current_index]
	at org.elasticsearch.hadoop.rest.Resource.<init>(Resource.java:110)
	at org.elasticsearch.hadoop.rest.RestRepository$Resources.getResourceWrite(RestRepository.java:107)
	at org.elasticsearch.hadoop.rest.RestRepository.<init>(RestRepository.java:127)
	at org.elasticsearch.hadoop.rest.InitializationUtils.checkIndexExistence(InitializationUtils.java:373)
	at org.elasticsearch.spark.rdd.EsSpark$.doSaveToEs(EsSpark.scala:106)
	at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:79)
	at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:76)
	at org.elasticsearch.spark.rdd.EsSpark$.saveJsonToEs(EsSpark.scala:114)
	at org.elasticsearch.spark.package$SparkJsonRDDFunctions.saveJsonToEs(package.scala:65)
.
.
.
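
The error is easier to follow once you see what a major-version comparison does with the string 2.0.1. The sketch below is an illustration only: it is not the elasticsearch-hadoop source, and the function names major_version and client_requires_type are made up for this example.

```shell
# Illustration only, NOT the elasticsearch-hadoop code: a client that looks
# at the major version alone sees OpenSearch 2.0.1 as "ES 2.x".
major_version() {
  # "2.0.1" -> "2", "7.10.2" -> "7"
  echo "${1%%.*}"
}

client_requires_type() {
  # Succeeds (exit status 0) when the reported version falls under "6 and below".
  [ "$(major_version "$1")" -le 6 ]
}

for reported in "2.0.1" "7.10.2"; do
  if client_requires_type "${reported}"; then
    echo "${reported}: treated as ES 6 or older, [index]/[type] expected"
  else
    echo "${reported}: treated as ES 7+, an index name alone is accepted"
  fi
done
```

Under that reading, the version string returned by the server decides which branch the client takes, which is why the fix changes the reported number rather than the Spark code.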

Fix: version number is incompatible with existing ES clients

Run the below command to override main response version

# check version number before enabling override
$ curl -XGET "http://localhost:9200/"
# {
#   "name" : "opensearch-node1",
#   "cluster_name" : "opensearch-cluster",
#   "cluster_uuid" : "UgZ0mNJcSEmYZ-phKVoqIw",
#   "version" : {
#     "distribution" : "opensearch",
#     "number" : "2.0.1",
#     "build_type" : "tar",
#     "build_hash" : "6462a546240f6d7a158519499729bce12dc1058b",
#     "build_date" : "2022-06-15T08:47:42.243126494Z",
#     "build_snapshot" : false,
#     "lucene_version" : "9.1.0",
#     "minimum_wire_compatibility_version" : "7.10.0",
#     "minimum_index_compatibility_version" : "7.0.0"
#   },
#   "tagline" : "The OpenSearch Project: https://opensearch.org/"
# }


$ curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "compatibility.override_main_response_version" : true
  }
}'

# {"acknowledged":true,"persistent":{"compatibility":{"override_main_response_version":"true"}},"transient":{}}

# check version number after enabling override
$ curl -XGET "http://localhost:9200/"
# {
#   "name" : "opensearch-node1",
#   "cluster_name" : "opensearch-cluster",
#   "cluster_uuid" : "UgZ0mNJcSEmYZ-phKVoqIw",
#   "version" : {
#     "number" : "7.10.2",
#     "build_type" : "tar",
#     "build_hash" : "6462a546240f6d7a158519499729bce12dc1058b",
#     "build_date" : "2022-06-15T08:47:42.243126494Z",
#     "build_snapshot" : false,
#     "lucene_version" : "9.1.0",
#     "minimum_wire_compatibility_version" : "7.10.0",
#     "minimum_index_compatibility_version" : "7.0.0"
#   },
#   "tagline" : "The OpenSearch Project: https://opensearch.org/"
# }
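
Comparing the two responses by eye works, but a script that starts a Spark job may want to check the reported number first. A minimal sketch, assuming the helper name get_version_number and relying on the key "number" appearing only once in the response, as it does in the output above:

```shell
# Hypothetical helper: read the JSON returned by GET / on stdin and print
# the value of version.number.
get_version_number() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage against the local cluster (once the containers are up):
# curl -s "http://localhost:9200/" | get_version_number
```

With the override enabled, this should print 7.10.2 for the cluster shown above; without it, 2.0.1.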

Access Opensearch Dashboard & Dev UI Console

If you are using the default configuration with the security plugin (ssl) enabled, you need the below username and password. With the compose file defined above, no username or password is needed.

http://localhost:5601/
Username: admin
Password: admin

Create Index using mapping file

To create an index and set up an alias for it, run the below commands in a terminal.

  • Copy the below mapping into a file named sample-mappings.json at the path: "~/opensearch/sample-mappings.json". The same file is used below to create a sample index.
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "emp_id": {
        "type": "long"
      },
      "dept_id": {
        "type": "long"
      },
      "salary": {
        "type": "keyword"
      }
    }
  }
}
  • Use the below commands when ssl is NOT enabled (as with the compose file above).
$ mapping_file_path="[Absolute path to the mapping file]"
# e.g. mapping_file_path="$HOME/opensearch/sample-mappings.json" (a quoted ~ is not expanded by the shell, so use $HOME)

$ index_name="[name of the index]"
$ index_alias_name="[name of the index alias]"
# e.g. index_name="sample_index"
# e.g. index_alias_name="sample_index_alias"

$ curl -X PUT "http://localhost:9200/${index_name}?pretty" -H 'Content-Type: application/json' -d @"${mapping_file_path}"

# verify created index schema
$ curl -XGET "http://localhost:9200/${index_name}/_mapping?pretty"
# {
#   "sample_index" : {
#     "mappings" : {
#       "dynamic" : "strict",
#       "properties" : {
#         "dept_id" : {
#           "type" : "long"
#         },
#         "emp_id" : {
#           "type" : "long"
#         },
#         "salary" : {
#           "type" : "keyword"
#         }
#       }
#     }
#   }
# }

# List all indices
$ curl -XGET "http://localhost:9200/_cat/indices?v"

# Create alias
$ curl -X PUT "http://localhost:9200/${index_name}/_alias/${index_alias_name}?pretty"

# verify alias creation
$ curl -XGET "http://localhost:9200/_cat/aliases?v"

# Get record counts
$ curl -XGET "http://localhost:9200/${index_name}/_count"

# Delete the index (only when it is no longer needed)
$ curl -XDELETE "http://localhost:9200/${index_name}"
  • Use the below commands when ssl is enabled.
$ mapping_file_path="[Absolute path to the mapping file]"
# e.g. mapping_file_path="$HOME/opensearch/sample-mappings.json" (a quoted ~ is not expanded by the shell, so use $HOME)

$ index_name="[name of the index]"
$ index_alias_name="[name of the index alias]"
# e.g. index_name="sample_index"
# e.g. index_alias_name="sample_index_alias"

$ curl -X PUT "https://localhost:9200/${index_name}?pretty" -H 'Content-Type: application/json' -d @"${mapping_file_path}" -u admin:admin --insecure

# verify created index schema
$ curl -XGET "https://localhost:9200/${index_name}/_mapping?pretty" -u admin:admin --insecure
# {
#   "sample_index" : {
#     "mappings" : {
#       "dynamic" : "strict",
#       "properties" : {
#         "dept_id" : {
#           "type" : "long"
#         },
#         "emp_id" : {
#           "type" : "long"
#         },
#         "salary" : {
#           "type" : "keyword"
#         }
#       }
#     }
#   }
# }

# List all indices
$ curl -XGET "https://localhost:9200/_cat/indices?v" -u admin:admin --insecure

# Create alias
$ curl -X PUT "https://localhost:9200/${index_name}/_alias/${index_alias_name}?pretty" -u admin:admin --insecure

# verify alias creation
$ curl -XGET "https://localhost:9200/_cat/aliases?v" -u admin:admin --insecure

# Get record counts
$ curl -XGET "https://localhost:9200/${index_name}/_count" -u admin:admin --insecure

# Delete the index (only when it is no longer needed)
$ curl -XDELETE "https://localhost:9200/${index_name}" -u admin:admin --insecure

Note: You can open and use the dev tools console from url: http://localhost:5601/app/dev_tools#/console
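
The two command sets above differ only in the scheme and in the -u admin:admin --insecure options, so a small wrapper can cover both. This is a sketch under assumptions: os_curl and the OPENSEARCH_SECURE variable are names invented here, not OpenSearch settings.

```shell
# Hypothetical wrapper: set OPENSEARCH_SECURE=true for the default secured
# setup; any other value targets the compose file above (security disabled).
os_curl() {
  local method="$1"
  local path="$2"
  shift 2
  if [ "${OPENSEARCH_SECURE:-false}" = "true" ]; then
    # Demo credentials and a self-signed certificate, hence -u and --insecure
    curl -s -X "${method}" "https://localhost:9200${path}" -u admin:admin --insecure "$@"
  else
    curl -s -X "${method}" "http://localhost:9200${path}" "$@"
  fi
}

# Examples (run once the cluster is up):
# os_curl GET "/_cat/indices?v"
# os_curl PUT "/sample_index?pretty" -H 'Content-Type: application/json' -d @"$HOME/opensearch/sample-mappings.json"
```

Extra arguments after the path are passed through to curl unchanged, so headers and request bodies work the same way as in the commands above.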

Spark App For Loading In OpenSearch Index

Below is the sample spark scala app code for loading data into the sample_index created above

# Paste code here

Ref Urls for Spark to Elastic search loading code
