@nickwallen
Last active April 28, 2017 15:00
  1. Set up SSH

    for host in y113 y114 y115 y134 y135 y136; do
      ssh root@$host bash -c "'
        ssh-keygen -t rsa
      '"
    done
    
    for host in y113 y114 y115 y134 y135 y136; do
      cat ~/.ssh/id_rsa.pub | ssh root@$host 'cat >> ~/.ssh/authorized_keys'
    done
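
    To confirm passwordless SSH works before moving on, a quick sketch over the same host list (BatchMode makes any remaining password prompt fail instead of hang):

    for host in y113 y114 y115 y134 y135 y136; do
      ssh -o BatchMode=yes root@$host hostname
    done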
    
  2. Turn off firewall

    for host in y113 y114 y115 y134 y135 y136; do
      ssh root@$host bash -c "'
        echo "$host";
        systemctl disable firewalld;
        systemctl stop firewalld;
        systemctl mask firewalld
        systemctl is-active firewalld;
        systemctl is-enabled firewalld;
      '"
    done
    
  3. Disable SELinux

    for host in y113 y114 y115 y134 y135 y136; do
      ssh root@$host bash -c "'
        setenforce 0;
        sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config;
      '"
    done
    
  4. Reboot hosts for the SELinux change to take effect

    for host in y113 y114 y115 y134 y135 y136; do
      ssh root@$host bash -c "'
        reboot
      '"
    done
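
    Once the hosts come back from the reboot, a quick sanity check that SELinux is really off (getenforce should report Disabled after the config change above):

    for host in y113 y114 y115 y134 y135 y136; do
      ssh root@$host "hostname; getenforce"
    done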
    
  5. Define Ambari repo

    $ cat ambari.repo
    [Updates-ambari-2.4.2.0]
    name=ambari-2.4.2.0 - Updates
    baseurl=http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.4.2.0
    gpgcheck=1
    gpgkey=http://public-repo-1.hortonworks.com/ambari/centos7/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
    enabled=1
    priority=1
    
  6. Push the Ambari repo definition

    for host in y113 y114 y115 y134 y135 y136; do
      scp ambari.repo root@$host:/etc/yum.repos.d/
    done
    
  7. Install Ambari agent

    for host in y113 y114 y115 y134 y135 y136; do
      ssh root@$host "yum -y install ambari-agent"
    done
    
  8. Install Ambari server

    ssh root@y113
    yum -y install ambari-server
    ambari-server setup
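
    Note that ambari-server setup only configures the server; it still has to be started before the agents can register (run on y113):

    ambari-server start
    ambari-server status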
    
  9. Reset the Ambari agents; the server was not able to talk to many of the agents.

    for host in y113 y114 y115 y134 y135 y136; do
      ssh root@$host bash -c "'
        ambari-agent stop
        ambari-agent reset y113.l42scl.hortonworks.com
        ambari-agent start
      '"
    done
    
  10. Use the Ambari wizard to deploy the core Hadoop services. List of FQDN hostnames:

    y113.l42scl.hortonworks.com
    y114.l42scl.hortonworks.com
    y115.l42scl.hortonworks.com
    y134.l42scl.hortonworks.com
    y135.l42scl.hortonworks.com
    y136.l42scl.hortonworks.com
    y137.l42scl.hortonworks.com
    y138.l42scl.hortonworks.com
    
  11. Ran into issues with some of the client installs not working well from Ambari. Had to do the following fixups.

    for host in y113 y114 y115 y134 y135 y136 y137 y138; do
      ssh root@$host bash -c "'
        mkdir -p /etc/spark/conf
        mkdir -p /etc/zeppelin/conf
      '"
    done
    
    ssh root@y137 "rm -rf /metron*/hadoop/hdfs/data"
    
  12. Build Metron.

    nallen@HW12402 : ~/Development/incubator-metron
    $ git log -1
    commit 8fe52c9a126bbfbb82472f48e1f162f128e35694
    Author: nickwallen <nick@nickallen.org>
    Date:   Mon Apr 24 11:36:58 2017 -0400
    METRON-866 Quickdev and Fulldev readme misses Docker as prerequisite  (nickwallen) closes apache/incubator-metron#544
    
    cd incubator-metron
    mvn clean install package -DskipTests
    
  13. Build Metron RPMs.

    cd metron-deployment
    mvn clean package -Pbuild-rpms -DskipTests
    
  14. Push the Metron RPMs to the host on which Metron will be installed. In this case it's y136.

    ssh root@y136 bash -c "'
      mkdir -p /localrepo
    '"
    
    scp packaging/docker/rpm-docker/RPMS/noarch/*.rpm root@y136:/localrepo/
    
  15. Build Metron MPack and copy it to the host running Ambari Server (y113).

    cd metron-deployment
    mvn clean package -DskipTests
    scp packaging/ambari/metron-mpack/target/metron_mpack-0.4.0.0.tar.gz root@y113:~/
    
  16. Create a local Metron RPM repo. Only the host where Metron will be installed needs RPMs in this repo, but all other hosts still need the repository created with these commands, even though it will be empty.

    for host in y113 y114 y115 y134 y135 y136 y137 y138; do
      ssh root@$host bash -c "'
        yum  --disablerepo METRON-0.4.0 -y install createrepo
        mkdir -p /localrepo
        createrepo /localrepo
        ls -l /localrepo
      '"
    done
    
  17. Install the Mpack on y113.

    ssh root@y113 bash -c "'
      ambari-server install-mpack --mpack=metron_mpack-0.4.0.0.tar.gz --verbose
      ambari-server restart
    '"
    
  18. With Ambari, install the new services: Metron, Kibana, and Elasticsearch.

    • y113.l42scl.hortonworks.com as the Elasticsearch master
    • y113.l42scl.hortonworks.com as the Kibana master
    • y134, y135 as Elasticsearch data nodes
    • admin/admin@METRONEXAMPLE.COM as the KDC admin
  19. Elasticsearch fails to start on both the master and data nodes.

    Apr 26 16:45:41 y113 systemd: Starting Elasticsearch...
    Apr 26 16:45:41 y113 systemd: Started Elasticsearch.
    Apr 26 16:45:41 y113 elasticsearch: Exception in thread "main" SettingsException[Failed to load settings from [elasticsearch.yml]]; nested: ParserException[while parsing a block mapping
    Apr 26 16:45:41 y113 elasticsearch: in 'reader', line 2, column 1:
    Apr 26 16:45:41 y113 elasticsearch: cluster:
    Apr 26 16:45:41 y113 elasticsearch: ^
    Apr 26 16:45:41 y113 elasticsearch: expected <block end>, but found FlowEntry
    Apr 26 16:45:41 y113 elasticsearch: in 'reader', line 67, column 26:
    Apr 26 16:45:41 y113 elasticsearch: network.host: "_lo:ipv4_","_eth0:ipv4_"
    Apr 26 16:45:41 y113 elasticsearch: ^
    Apr 26 16:45:41 y113 elasticsearch: ];
    Apr 26 16:45:41 y113 elasticsearch: Likely root cause: while parsing a block mapping
    Apr 26 16:45:41 y113 elasticsearch: in 'reader', line 2, column 1:
    Apr 26 16:45:41 y113 elasticsearch: cluster:
    Apr 26 16:45:41 y113 elasticsearch: ^
    Apr 26 16:45:41 y113 elasticsearch: expected <block end>, but found FlowEntry
    Apr 26 16:45:41 y113 elasticsearch: in 'reader', line 67, column 26:
    Apr 26 16:45:41 y113 elasticsearch: network.host: "_lo:ipv4_","_eth0:ipv4_"
    Apr 26 16:45:41 y113 elasticsearch: ^
    Apr 26 16:45:41 y113 elasticsearch: at com.fasterxml.jackson.dataformat.yaml.snakeyaml.parser.ParserImpl$ParseBlockMappingKey.produce(ParserImpl.java:570)
    Apr 26 16:45:41 y113 elasticsearch: at com.fasterxml.jackson.dataformat.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:158)
    Apr 26 16:45:41 y113 elasticsearch: at com.fasterxml.jackson.dataformat.yaml.snakeyaml.parser.ParserImpl.getEvent(ParserImpl.java:168)
    Apr 26 16:45:41 y113 elasticsearch: at com.fasterxml.jackson.dataformat.yaml.YAMLParser.nextToken(YAMLParser.java:342)
    Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:53)
    Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.loader.XContentSettingsLoader.serializeObject(XContentSettingsLoader.java:99)
    Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.loader.XContentSettingsLoader.load(XContentSettingsLoader.java:67)
    Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.loader.XContentSettingsLoader.load(XContentSettingsLoader.java:45)
    Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.loader.YamlSettingsLoader.load(YamlSettingsLoader.java:46)
    Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.Settings$Builder.loadFromStream(Settings.java:1080)
    Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.Settings$Builder.loadFromPath(Settings.java:1067)
    Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:88)
    Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.bootstrap.Bootstrap.initialSettings(Bootstrap.java:202)
    Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:241)
    Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
    Apr 26 16:45:41 y113 elasticsearch: Refer to the log for complete error details.
    Apr 26 16:45:41 y113 systemd: elasticsearch.service: main process exited, code=exited, status=1/FAILURE
    Apr 26 16:45:41 y113 systemd: Unit elasticsearch.service entered failed state.
    Apr 26 16:45:41 y113 systemd: elasticsearch.service failed.
    

    FIX: In Ambari, changed the setting to add brackets: network.host: ["_lo:ipv4_","_bond0.582:ipv4_"]. The help tooltip did not mention the brackets.
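
    To sanity-check that Ambari pushed the corrected value, one option is to grep the rendered config on the master (a sketch; it assumes the default /etc/elasticsearch/elasticsearch.yml location):

    ssh root@y113 "grep network.host /etc/elasticsearch/elasticsearch.yml"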

  20. Elasticsearch seems to start, but Kibana cannot use it. Elasticsearch shows RED status.

    GET _cluster/health
    
    {
       "cluster_name": "metron",
       "status": "red",
       "timed_out": false,
       "number_of_nodes": 3,
       "number_of_data_nodes": 2,
       "active_primary_shards": 0,
       "active_shards": 0,
       "relocating_shards": 0,
       "initializing_shards": 0,
       "unassigned_shards": 0,
       "delayed_unassigned_shards": 0,
       "number_of_pending_tasks": 0,
       "number_of_in_flight_fetch": 0,
       "task_max_waiting_in_queue_millis": 0,
       "active_shards_percent_as_number": "NaN"
    }
    
    GET _cluster/nodes/stats
    
    {
       "error": {
          "root_cause": [
             {
                "type": "cluster_block_exception",
                "reason": "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"
             }
          ],
          "type": "cluster_block_exception",
          "reason": "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"
       },
       "status": 503
    }
    

    The master node is suggesting a change to /etc/security/limits.conf.

    Apr 26 17:38:58 y113 systemd: Started Elasticsearch.
    Apr 26 17:38:59 y113 elasticsearch: [2017-04-26 17:38:59,062][WARN ][bootstrap                ] Unable to lock JVM Memory: error=12,reason=Cannot allocate memory
    Apr 26 17:38:59 y113 elasticsearch: [2017-04-26 17:38:59,062][WARN ][bootstrap                ] This can result in part of the JVM being swapped out.
    Apr 26 17:38:59 y113 elasticsearch: [2017-04-26 17:38:59,062][WARN ][bootstrap                ] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
    Apr 26 17:38:59 y113 elasticsearch: [2017-04-26 17:38:59,062][WARN ][bootstrap                ] These can be adjusted by modifying /etc/security/limits.conf, for example:
    Apr 26 17:38:59 y113 elasticsearch: # allow user 'elasticsearch' mlockall
    Apr 26 17:38:59 y113 elasticsearch: elasticsearch soft memlock unlimited
    Apr 26 17:38:59 y113 elasticsearch: elasticsearch hard memlock unlimited
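
    The memlock warning by itself is not what kept the cluster red, but if you want to apply the suggested limits, a sketch in the same loop style used above (host list assumed to be the Elasticsearch nodes):

    for host in y113 y134 y135; do
      ssh root@$host bash -c "'
        echo elasticsearch soft memlock unlimited >> /etc/security/limits.conf
        echo elasticsearch hard memlock unlimited >> /etc/security/limits.conf
      '"
    done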
    

    FIX: Changed gateway_recover_after_data_nodes from 3 to 2, since there are only 2 data nodes.

  21. Elasticsearch cluster status now yellow.

    GET _cluster/health
    {
       "cluster_name": "metron",
       "status": "yellow",
       "timed_out": false,
       "number_of_nodes": 3,
       "number_of_data_nodes": 2,
       "active_primary_shards": 1,
       "active_shards": 2,
       "relocating_shards": 0,
       "initializing_shards": 0,
       "unassigned_shards": 1,
       "delayed_unassigned_shards": 0,
       "number_of_pending_tasks": 0,
       "number_of_in_flight_fetch": 0,
       "task_max_waiting_in_queue_millis": 0,
       "active_shards_percent_as_number": 66.66666666666666
    }
    
    GET _cluster/nodes/stats
    
    {
       "error": {
          "root_cause": [
             {
                "type": "index_not_found_exception",
                "reason": "no such index",
                "resource.type": "index_expression",
                "resource.id": "_cluster",
                "index": "_cluster"
             }
          ],
          "type": "index_not_found_exception",
          "reason": "no such index",
          "resource.type": "index_expression",
          "resource.id": "_cluster",
          "index": "_cluster"
       },
       "status": 404
    }
    

    FIX?: Change index_number_of_replicas from 2 to 1. FIX?: ES was in yellow status and the .kibana index was unassigned, so I wiped that index and restarted Kibana; it recreated the index and the cluster is now green. Then Kibana > Service Actions > Load Template to restore what was lost by deleting the .kibana index.
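
    For reference, the index wipe and health check can also be done from the command line (a sketch; assumes Elasticsearch is listening on its default port 9200 on the master):

    curl -XDELETE 'http://y113.l42scl.hortonworks.com:9200/.kibana'
    # restart Kibana from Ambari; it recreates the .kibana index on startup
    curl 'http://y113.l42scl.hortonworks.com:9200/_cluster/health?pretty'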

  22. Install the Kerberos dependencies.

        ssh root@y113 bash -c "'
          yum -y install krb5-server krb5-libs krb5-workstation
        '"
    
  23. Define the host, y113, as the KDC.

        [root@y113 ~]# cat /etc/krb5.conf
        # Configuration snippets may be placed in this directory as well
        includedir /etc/krb5.conf.d/
    
        [logging]
         default = FILE:/var/log/krb5libs.log
         kdc = FILE:/var/log/krb5kdc.log
         admin_server = FILE:/var/log/kadmind.log
    
        [libdefaults]
         dns_lookup_realm = false
         ticket_lifetime = 24h
         renew_lifetime = 7d
         forwardable = true
         rdns = false
         default_realm = EXAMPLE.COM
         default_ccache_name = KEYRING:persistent:%{uid}
    
        [realms]
        EXAMPLE.COM = {
         kdc = y113.l42scl.hortonworks.com
         admin_server = y113.l42scl.hortonworks.com
        }
    
        [domain_realm]
        # .example.com = EXAMPLE.COM
        # example.com = EXAMPLE.COM
    cp -f /etc/krb5.conf /var/lib/ambari-server/resources/scripts
    
  24. Do not copy/paste this full set of commands, as the kdb5_util command will not run as expected; run the commands individually to ensure they all execute. This step takes a moment. It creates the Kerberos database.

    kdb5_util create -s
    
    service krb5kdc start
    chkconfig krb5kdc on
    
    service kadmin start
    chkconfig kadmin on
    
  25. Set up the admin principal. You'll kinit as the metron principal when running topologies. Make sure to remember the passwords.

    kadmin.local -q "addprinc admin/admin"
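
    A quick way to confirm the principal works (uses the realm defined above):

    kinit admin/admin@EXAMPLE.COM
    klist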
    
  26. Kerberize the cluster using the Ambari wizard.

  27. Error: Unable to start Metron topologies.

     2772 [main] INFO  o.a.s.s.a.AuthUtils - Got AutoCreds [org.apache.storm.security.auth.kerberos.AutoTGT@748a654a]
     2773 [main] INFO  o.a.s.StormSubmitter - Running org.apache.storm.security.auth.kerberos.AutoTGT@748a654a
     Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The TGT found is not renewable
     at org.apache.storm.security.auth.kerberos.AutoTGT.populateCredentials(AutoTGT.java:103)
     at org.apache.storm.StormSubmitter.populateCredentials(StormSubmitter.java:94)
     at org.apache.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:214)
     at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:310)
     at org.apache.storm.flux.Flux.runCli(Flux.java:171)
     at org.apache.storm.flux.Flux.main(Flux.java:98)
     Caused by: java.lang.RuntimeException: The TGT found is not renewable
     at org.apache.storm.security.auth.kerberos.AutoTGT.populateCredentials(AutoTGT.java:94)
     ... 5 more
    
     FIX: On CentOS 7 the principals were not renewable by default.
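
     A typical way to address this on CentOS 7 (a hedged sketch, not necessarily the exact change made here) is to allow renewable tickets in the KDC and on the relevant principals, then kdestroy and kinit again:

     # in /var/kerberos/krb5kdc/kdc.conf, under [realms] for EXAMPLE.COM, set e.g.
     #   max_renewable_life = 7d
     systemctl restart krb5kdc kadmin
     kadmin.local -q "modprinc -maxrenewlife 7days krbtgt/EXAMPLE.COM@EXAMPLE.COM"
     # repeat modprinc for the headless principal used to submit topologies, then kdestroy/kinit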
    
    Parser topologies fail after starting successfully:
    java.lang.IllegalStateException: Unable to initialize message writer
    	at org.apache.metron.parsers.bolt.WriterHandler.init(WriterHandler.java:70) ~[stormjar.jar:?]
    	at org.apache.metron.parsers.bolt.WriterBolt.prepare(WriterBolt.java:58) ~[stormjar.jar:?]
    	at org.apache.storm.daemon.executor$fn__6573$fn__6586.invoke(executor.clj:798) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:482) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
    Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka producer
    	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:335) ~[stormjar.jar:?]
    	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:163) ~[stormjar.jar:?]
    	at org.apache.metron.writer.kafka.KafkaWriter.init(KafkaWriter.java:173) ~[stormjar.jar:?]
    	at org.apache.metron.writer.WriterToBulkWriter.init(WriterToBulkWriter.java:44) ~[stormjar.jar:?]
    	at org.apache.metron.parsers.bolt.WriterHandler.init(WriterHandler.java:68) ~[stormjar.jar:?]
    	... 5 more
    Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.kafka.common.protocol.SecurityProtocol.PLAINTEXTSASL
    	at java.lang.Enum.valueOf(Enum.java:238) ~[?:1.8.0_77]
    	at org.apache.kafka.common.protocol.SecurityProtocol.valueOf(SecurityProtocol.java:28) ~[stormjar.jar:?]
    	at org.apache.kafka.common.protocol.SecurityProtocol.forName(SecurityProtocol.java:89) ~[stormjar.jar:?]
    	at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:79) ~[stormjar.jar:?]
    	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:277) ~[stormjar.jar:?]
    	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:163) ~[stormjar.jar:?]
    	at org.apache.metron.writer.kafka.KafkaWriter.init(KafkaWriter.java:173) ~[stormjar.jar:?]
    	at org.apache.metron.writer.WriterToBulkWriter.init(WriterToBulkWriter.java:44) ~[stormjar.jar:?]
    	at org.apache.metron.parsers.bolt.WriterHandler.init(WriterHandler.java:68) ~[stormjar.jar:?]
    	... 5 more
    

    As a side note, the setting for Kafka security.inter.broker.protocol is actually PLAINTEXTSASL in Ambari.

    FIX: Altered /var/lib/ambari-server/resources/common-services/METRON/0.4.0/package/scripts/params/params_linux.py and set kafka_security_protocol = 'SASL_PLAINTEXT'. This fixed the parser and indexing topologies.

    [root@y113 ~]# cat /var/lib/ambari-server/resources/common-services/METRON/0.4.0/package/scripts/params/params_linux.py | grep PLAINTEXT
    #kafka_security_protocol = config['configurations']['kafka-broker'].get('security.inter.broker.protocol', 'PLAINTEXT')
    kafka_security_protocol = 'SASL_PLAINTEXT'
    

    FIX: Updated the Elasticsearch templates value in Ambari to fix indexing.

    FIX: To fix the enrichment topology, I edited /usr/metron/0.4.0/config/enrichment.properties.

    [root@y136 0.4.0]# cat /usr/metron/0.4.0/config/enrichment.properties | grep PLAINTEXT
    kafka.security.protocol=SASL_PLAINTEXT
    
  28. Bro data is now hitting Elasticsearch, but Kibana is not able to read it.

    Error: [unsupported_operation_exception] custom format isn't supported
        at respond (http://y113.l42scl.hortonworks.com:5000/bundles/kibana.bundle.js?v=9892:76378:16)
        at checkRespForFailure (http://y113.l42scl.hortonworks.com:5000/bundles/kibana.bundle.js?v=9892:76341:8)
        at http://y113.l42scl.hortonworks.com:5000/bundles/kibana.bundle.js?v=9892:74959:8
        at processQueue (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:42360:29)
        at http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:42376:28
        at Scope.$eval (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:43604:29)
        at Scope.$digest (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:43415:32)
        at Scope.$apply (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:43712:25)
        at done (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:38161:48)
        at completeRequest (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:38359:8)
    

    FIX: Use ELASTICSEARCH_TEMPLATE_DELETE then ELASTICSEARCH_TEMPLATE_INSTALL.
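
    To confirm the templates are present after running ELASTICSEARCH_TEMPLATE_INSTALL, a hedged check against the Elasticsearch templates API (the host, port, and bro_index template name are assumptions):

    curl 'http://y113.l42scl.hortonworks.com:9200/_template/bro_index?pretty'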

  29. The Storm worker doesn't seem to start for the indexing topology.

    2017-04-27 19:57:06.845 o.a.s.d.supervisor [INFO] Launching worker with command: '/usr/jdk64/jdk1.8.0_77/bin/java' '-cp' '/usr/hdp/2.5.3.0-37/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.5.3.0-37/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-slf4j-impl-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/storm-rename-hack-1.0.1.2.5.3.0-37.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-api-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-core-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/asm-5.0.3.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.5.3.0-37/storm/lib/slf4j-api-1.7.7.jar:/usr/hdp/2.5.3.0-37/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.5.3.0-37/storm/lib/zookeeper.jar:/usr/hdp/2.5.3.0-37/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.5.3.0-37/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.5.3.0-37/storm/lib/storm-core-1.0.1.2.5.3.0-37.jar:/usr/hdp/2.5.3.0-37/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/objenesis-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/ambari-metrics-storm-sink.jar:/usr/hdp/current/storm-supervisor/conf:/metron4/hadoop/storm/supervisor/stormdist/indexing-5-1493322512/stormjar.jar:/etc/hbase/conf:/etc/hadoop/conf' '-Xmx64m' '-Dlogfile.name=worker.log' '-Dstorm.home=/usr/hdp/2.5.3.0-37/storm' '-Dworkers.artifacts=/var/log/storm/workers-artifacts' '-Dstorm.id=indexing-5-1493322512' '-Dworker.id=803c9e51-b6f9-4ee5-8265-b1ff2160fddf' '-Dworker.port=6704' '-Dstorm.log.dir=/var/log/storm' '-Dlog4j.configurationFile=/usr/hdp/2.5.3.0-37/storm/log4j2/worker.xml' '-DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector' 'org.apache.storm.LogWriter' '/usr/jdk64/jdk1.8.0_77/bin/java' '-server' '-Xmx768m' '-Djava.security.auth.login.config=/usr/hdp/current/storm-supervisor/conf/storm_jaas.conf' '-javaagent:/usr/hdp/current/storm-client/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar=host=localhost,port=8650,wireformat31x=true,mode=multicast,config=/usr/hdp/current/storm-client/contrib/storm-jmxetric/conf/jmxetric-conf.xml,process=Worker_6704_JVM' '-Djava.security.auth.login.config=/usr/metron/0.4.0/client_jaas.conf' '-Djava.library.path=/metron4/hadoop/storm/supervisor/stormdist/indexing-5-1493322512/resources/Linux-amd64:/metron4/hadoop/storm/supervisor/stormdist/indexing-5-1493322512/resources:/usr/local/lib:/opt/local/lib:/usr/lib:/usr/hdp/current/storm-client/lib' '-Dlogfile.name=worker.log' '-Dstorm.home=/usr/hdp/2.5.3.0-37/storm' '-Dworkers.artifacts=/var/log/storm/workers-artifacts' '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/var/log/storm' '-Djava.io.tmpdir=/metron4/hadoop/storm/workers/803c9e51-b6f9-4ee5-8265-b1ff2160fddf/tmp' '-Dlogging.sensitivity=S3' '-Dlog4j.configurationFile=/usr/hdp/2.5.3.0-37/storm/log4j2/worker.xml' '-DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector' '-Dstorm.id=indexing-5-1493322512' '-Dworker.id=803c9e51-b6f9-4ee5-8265-b1ff2160fddf' '-Dworker.port=6704' '-cp' 
'/usr/hdp/2.5.3.0-37/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.5.3.0-37/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-slf4j-impl-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/storm-rename-hack-1.0.1.2.5.3.0-37.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-api-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-core-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/asm-5.0.3.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.5.3.0-37/storm/lib/slf4j-api-1.7.7.jar:/usr/hdp/2.5.3.0-37/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.5.3.0-37/storm/lib/zookeeper.jar:/usr/hdp/2.5.3.0-37/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.5.3.0-37/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.5.3.0-37/storm/lib/storm-core-1.0.1.2.5.3.0-37.jar:/usr/hdp/2.5.3.0-37/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/objenesis-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/ambari-metrics-storm-sink.jar:/usr/hdp/current/storm-supervisor/conf:/metron4/hadoop/storm/supervisor/stormdist/indexing-5-1493322512/stormjar.jar:/etc/hbase/conf:/etc/hadoop/conf' 'org.apache.storm.daemon.worker' 'indexing-5-1493322512' 'b8bd9356-c463-4182-b629-584429417b96' '6704' '803c9e51-b6f9-4ee5-8265-b1ff2160fddf'
    2017-04-27 19:57:06.848 o.a.s.config [INFO] SET worker-user 803c9e51-b6f9-4ee5-8265-b1ff2160fddf metron
    2017-04-27 19:57:06.848 o.a.s.d.supervisor [INFO] Creating symlinks for worker-id: 803c9e51-b6f9-4ee5-8265-b1ff2160fddf storm-id: indexing-5-1493322512 to its port artifacts directory
    2017-04-27 19:57:06.850 o.a.s.d.supervisor [INFO] Creating symlinks for worker-id: 803c9e51-b6f9-4ee5-8265-b1ff2160fddf storm-id: indexing-5-1493322512 for files(1): ("resources")
    2017-04-27 19:57:06.852 o.a.s.d.supervisor [INFO] Running as user:metron command:("/usr/hdp/2.5.3.0-37/storm/bin/worker-launcher" "metron" "worker" "/metron4/hadoop/storm/workers/803c9e51-b6f9-4ee5-8265-b1ff2160fddf" "/metron4/hadoop/storm/workers/803c9e51-b6f9-4ee5-8265-b1ff2160fddf/storm-worker-script.sh")
    2017-04-27 19:57:06.853 o.a.s.util [WARN] Worker Process 803c9e51-b6f9-4ee5-8265-b1ff2160fddf:Invalid permissions on worker-launcher binary.
    2017-04-27 19:57:06.853 o.a.s.util [WARN] Worker Process 803c9e51-b6f9-4ee5-8265-b1ff2160fddf:The configured nodemanager group 1009 is different from the group of the executable 0
    2017-04-27 19:57:06.853 o.a.s.d.supervisor [INFO] Worker Process 803c9e51-b6f9-4ee5-8265-b1ff2160fddf exited with code: 22
    2017-04-27 19:57:06.854 o.a.s.d.supervisor [INFO] 803c9e51-b6f9-4ee5-8265-b1ff2160fddf still hasn't started
    2017-04-27 19:57:07.354 o.a.s.d.supervisor [INFO] 803c9e51-b6f9-4ee5-8265-b1ff2160fddf still hasn't started
    2017-04-27 19:57:07.855 o.a.s.d.supervisor [INFO] 803c9e51-b6f9-4ee5-8265-b1ff2160fddf still hasn't started
    
    nallen@HW12402 : ~
    $ for host in y134 y135 y136 y137; do
    >   ssh root@$host "ls -l /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
    > done
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -rwxr-xr-x 1 root root 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    
    nallen@HW12402 : ~
    $ ssh root@y136 "chmod 6550 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
    
    nallen@HW12402 : ~
    $ for host in y134 y135 y136 y137; do   ssh root@$host "ls -l /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"; done
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root root 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    
    nallen@HW12402 : ~
    $ ssh root@y136 "chgrp hadoop /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
    
    nallen@HW12402 : ~
    $ for host in y134 y135 y136 y137; do   ssh root@$host "ls -l /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"; done
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-xr-x--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    
    nallen@HW12402 : ~
    $ ssh root@y136 "chmod u+s /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
    
    nallen@HW12402 : ~
    $ for host in y134 y135 y136 y137; do   ssh root@$host "ls -l /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"; done
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-x--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    
    nallen@HW12402 : ~
    $ ssh root@y136 "chmod g+s /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
    
    nallen@HW12402 : ~
    $ for host in y134 y135 y136 y137; do   ssh root@$host "ls -l /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"; done
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
    -r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
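
    The same end state (root:hadoop, mode 6550 with setuid/setgid) can be reached in one pass instead of the incremental fixes above (chown first, since it can clear the setuid/setgid bits):

    ssh root@y136 "chown root:hadoop /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher && chmod 6550 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
    ssh root@y136 "ls -l /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"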
    
  30. Indexing topology bombed with an out-of-memory error.

    2017-04-27 21:21:17.467 o.a.s.d.worker [ERROR] Error when processing event
    java.lang.OutOfMemoryError: GC overhead limit exceeded
    	at java.util.Arrays.copyOf(Arrays.java:3236) ~[?:1.8.0_77]
    	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118) ~[?:1.8.0_77]
    	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) ~[?:1.8.0_77]
    	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) ~[?:1.8.0_77]
    	at org.apache.storm.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.thrift.protocol.TBinaryProtocol.writeByte(TBinaryProtocol.java:162) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.thrift.protocol.TBinaryProtocol.writeFieldBegin(TBinaryProtocol.java:123) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.generated.BoltStats$BoltStatsStandardScheme.write(BoltStats.java:1060) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.generated.BoltStats$BoltStatsStandardScheme.write(BoltStats.java:869) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.generated.BoltStats.write(BoltStats.java:771) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.generated.ExecutorSpecificStats.standardSchemeWriteValue(ExecutorSpecificStats.java:216) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.thrift.TUnion.write(TUnion.java:152) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.generated.ExecutorStats$ExecutorStatsStandardScheme.write(ExecutorStats.java:798) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.generated.ExecutorStats$ExecutorStatsStandardScheme.write(ExecutorStats.java:648) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.generated.ExecutorStats.write(ExecutorStats.java:564) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.generated.ClusterWorkerHeartbeat$ClusterWorkerHeartbeatStandardScheme.write(ClusterWorkerHeartbeat.java:698) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.generated.ClusterWorkerHeartbeat$ClusterWorkerHeartbeatStandardScheme.write(ClusterWorkerHeartbeat.java:615) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.generated.ClusterWorkerHeartbeat.write(ClusterWorkerHeartbeat.java:535) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.thrift.TSerializer.serialize(TSerializer.java:79) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.serialization.GzipThriftSerializationDelegate.serialize(GzipThriftSerializationDelegate.java:40) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.utils.Utils.serialize(Utils.java:146) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.cluster$mk_storm_cluster_state$reify__5922.worker_heartbeat_BANG_(cluster.clj:465) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_77]
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_77]
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_77]
    	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_77]
    	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
    	at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.7.0.jar:?]
    	at org.apache.storm.daemon.worker$do_executor_heartbeats.doInvoke(worker.clj:76) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at clojure.lang.RestFn.invoke(RestFn.java:439) ~[clojure-1.7.0.jar:?]
    2017-04-27 21:21:22.809 STDIO [ERROR] Halting due to Out Of Memory Error...Thread-16-kafkaSpout-executor[5 5]
    

    FIX: Increased the max heap space available to Storm workers. In Ambari, worker.childopts = -Xmx2048m....

  31. Large number of unassigned shards in Elasticsearch. The Elasticsearch path_data is "/opt/lmm/es_data"; it needs to use the data mounts /metron1, /metron2, /metron3, and /metron4.

    FIX: Increased heap_size in the Ambari Elasticsearch settings from 128m to 4096m. It probably needs to be quite a bit larger still.

    FIX: Created an Elasticsearch data directory on each data mount and gave ownership to the elasticsearch user:

    for host in y135 y136 y137; do
      for mnt in /metron1 /metron2 /metron3 /metron4; do
        ssh root@$host bash -c "'
          mkdir -p $mnt/elasticsearch
          chown -R elasticsearch:elasticsearch $mnt/elasticsearch
        '"
      done
    done
    

    Delete all indices:

    DELETE /.kibana
    DELETE /bro*
    DELETE /yaf*
    

    Then change the path_data setting in Ambari to the following (the brackets are required):

    path_data = ["/metron1/elasticsearch","/metron2/elasticsearch","/metron3/elasticsearch","/metron4/elasticsearch"]
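
    After changing path_data and restarting Elasticsearch, shard allocation can be re-checked to confirm nothing stays unassigned (default port assumed):

    curl 'http://y113.l42scl.hortonworks.com:9200/_cat/shards?v'
    curl 'http://y113.l42scl.hortonworks.com:9200/_cluster/health?pretty'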
    