- Setup SSH

```
for host in y113 y114 y115 y134 y135 y136; do
  ssh root@$host bash -c "'
    ssh-keygen -t rsa
  '"
done

for host in y113 y114 y115 y134 y135 y136; do
  cat ~/.ssh/id_rsa.pub | ssh root@$host 'cat >> ~/.ssh/authorized_keys'
done
```
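A variant that avoids the interactive key prompts (a sketch; assumes the default key path, an empty passphrase, and `ssh-copy-id` in place of the manual append):

```
# Generate one local key pair, non-interactively, then push the public key
# to every host; ssh-copy-id appends to authorized_keys and fixes permissions.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for host in y113 y114 y115 y134 y135 y136; do
  ssh-copy-id root@$host
done
```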
- Turn off firewall

```
for host in y113 y114 y115 y134 y135 y136; do
  ssh root@$host bash -c "'
    echo "$host"
    systemctl disable firewalld
    systemctl stop firewalld
    systemctl mask firewalld
    systemctl is-active firewalld
    systemctl is-enabled firewalld
  '"
done
```
- Disable SELinux

```
for host in y113 y114 y115 y134 y135 y136; do
  ssh root@$host bash -c "'
    setenforce 0
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
  '"
done
```
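A quick sanity check (a sketch) that the change took, run before and again after the reboot below:

```
for host in y113 y114 y115 y134 y135 y136; do
  echo "== $host"
  # Expect "Permissive" right after setenforce 0, "Disabled" after the reboot
  ssh root@$host "getenforce; grep '^SELINUX=' /etc/selinux/config"
done
```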
- Reboot hosts for the SELinux change to take effect

```
for host in y113 y114 y115 y134 y135 y136; do
  ssh root@$host bash -c "'
    reboot
  '"
done
```
- Define Ambari repo

```
$ cat ambari.repo
[Updates-ambari-2.4.2.0]
name=ambari-2.4.2.0 - Updates
baseurl=http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.4.2.0
gpgcheck=1
gpgkey=http://public-repo-1.hortonworks.com/ambari/centos7/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
```
- Push the Ambari repo definition

```
for host in y113 y114 y115 y134 y135 y136; do
  scp ambari.repo root@$host:/etc/yum.repos.d/
done
```
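To confirm each host can actually see the repo (a sketch; the repo id comes from the `[Updates-ambari-2.4.2.0]` section above):

```
for host in y113 y114 y115 y134 y135 y136; do
  # Flush cached metadata, then list repos and look for the Ambari entry
  ssh root@$host "yum clean all > /dev/null; yum repolist | grep -i ambari"
done
```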
- Install Ambari agent

```
for host in y113 y114 y115 y134 y135 y136; do
  ssh root@$host "yum -y install ambari-agent"
done
```
- Install Ambari server

```
ssh root@y113
yum -y install ambari-server
ambari-server setup
```
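`ambari-server setup` walks through JDK and database configuration interactively. Once it completes, the server still has to be started; a sketch:

```
# Start the Ambari server and confirm it is up; the web UI defaults to port 8080
ssh root@y113 "ambari-server start && ambari-server status"
```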
- Reset Ambari agents; the server was not able to talk to many of the agents.

```
for host in y113 y114 y115 y134 y135 y136; do
  ssh root@$host bash -c "'
    ambari-agent stop
    ambari-agent reset y113.l42scl.hortonworks.com
    ambari-agent start
  '"
done
```
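`ambari-agent reset` rewrites the server hostname in the agent's config. A quick check (a sketch) that every agent now points at y113 and is running:

```
for host in y113 y114 y115 y134 y135 y136; do
  # The [server] hostname= entry should now be y113.l42scl.hortonworks.com
  ssh root@$host "grep '^hostname=' /etc/ambari-agent/conf/ambari-agent.ini; ambari-agent status"
done
```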
- Use the Ambari wizard to deploy the core Hadoop services. List of FQDN hostnames:

```
y113.l42scl.hortonworks.com
y114.l42scl.hortonworks.com
y115.l42scl.hortonworks.com
y134.l42scl.hortonworks.com
y135.l42scl.hortonworks.com
y136.l42scl.hortonworks.com
y137.l42scl.hortonworks.com
y138.l42scl.hortonworks.com
```
- Ran into issues with some of the client installs not working well from Ambari. Had to do the following fixups.

```
for host in y113 y114 y115 y134 y135 y136 y137 y138; do
  ssh root@$host bash -c "'
    mkdir -p /etc/spark/conf
    mkdir -p /etc/zeppelin/conf
  '"
done

ssh root@y137 "rm -rf /metron*/hadoop/hdfs/data"
```
- Build Metron.

```
nallen@HW12402 : ~/Development/incubator-metron $ git log -1
commit 8fe52c9a126bbfbb82472f48e1f162f128e35694
Author: nickwallen <nick@nickallen.org>
Date:   Mon Apr 24 11:36:58 2017 -0400

    METRON-866 Quickdev and Fulldev readme misses Docker as prerequisite (nickwallen) closes apache/incubator-metron#544
```

```
cd incubator-metron
mvn clean install package -DskipTests
```
- Build Metron RPMs.

```
cd metron-deployment
mvn clean package -Pbuild-rpms -DskipTests
```
- Push the Metron RPMs to the host on which Metron will be installed. In this case it's y136.

```
ssh root@y136 bash -c "'
  mkdir -p /localrepo
'"
scp packaging/docker/rpm-docker/RPMS/noarch/*.rpm root@y136:/localrepo/
```
- Build Metron MPack and copy it to the host running Ambari Server (y113).

```
cd metron-deployment
mvn clean package -DskipTests
scp packaging/ambari/metron-mpack/target/metron_mpack-0.4.0.0.tar.gz root@y113:~/
```
- Create local Metron RPM repo. Only the host where Metron will be installed needs RPMs in this repo. All other hosts still need the repository created with these commands, even though the repo will be empty.

```
for host in y113 y114 y115 y134 y135 y136 y137 y138; do
  ssh root@$host bash -c "'
    yum --disablerepo METRON-0.4.0 -y install createrepo
    mkdir -p /localrepo
    createrepo /localrepo
    ls -l /localrepo
  '"
done
```
- Install the MPack on y113.

```
ssh root@y113 bash -c "'
  ambari-server install-mpack --mpack=metron_mpack-0.4.0.0.tar.gz --verbose
  ambari-server restart
'"
```
- With Ambari, install the new services: Metron, Kibana, and Elasticsearch.
  - y113.l42scl.hortonworks.com as Elasticsearch master
  - y113.l42scl.hortonworks.com as Kibana master
  - y134, y135 as Elasticsearch data nodes
  - KDC admin as admin/admin@METRONEXAMPLE.COM
- Elasticsearch failing on start on both master and data nodes.

```
Apr 26 16:45:41 y113 systemd: Starting Elasticsearch...
Apr 26 16:45:41 y113 systemd: Started Elasticsearch.
Apr 26 16:45:41 y113 elasticsearch: Exception in thread "main" SettingsException[Failed to load settings from [elasticsearch.yml]]; nested: ParserException[while parsing a block mapping
Apr 26 16:45:41 y113 elasticsearch: in 'reader', line 2, column 1:
Apr 26 16:45:41 y113 elasticsearch: cluster:
Apr 26 16:45:41 y113 elasticsearch: ^
Apr 26 16:45:41 y113 elasticsearch: expected <block end>, but found FlowEntry
Apr 26 16:45:41 y113 elasticsearch: in 'reader', line 67, column 26:
Apr 26 16:45:41 y113 elasticsearch: network.host: "_lo:ipv4_","_eth0:ipv4_"
Apr 26 16:45:41 y113 elasticsearch: ^
Apr 26 16:45:41 y113 elasticsearch: ];
Apr 26 16:45:41 y113 elasticsearch: Likely root cause: while parsing a block mapping
Apr 26 16:45:41 y113 elasticsearch: in 'reader', line 2, column 1:
Apr 26 16:45:41 y113 elasticsearch: cluster:
Apr 26 16:45:41 y113 elasticsearch: ^
Apr 26 16:45:41 y113 elasticsearch: expected <block end>, but found FlowEntry
Apr 26 16:45:41 y113 elasticsearch: in 'reader', line 67, column 26:
Apr 26 16:45:41 y113 elasticsearch: network.host: "_lo:ipv4_","_eth0:ipv4_"
Apr 26 16:45:41 y113 elasticsearch: ^
Apr 26 16:45:41 y113 elasticsearch: at com.fasterxml.jackson.dataformat.yaml.snakeyaml.parser.ParserImpl$ParseBlockMappingKey.produce(ParserImpl.java:570)
Apr 26 16:45:41 y113 elasticsearch: at com.fasterxml.jackson.dataformat.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:158)
Apr 26 16:45:41 y113 elasticsearch: at com.fasterxml.jackson.dataformat.yaml.snakeyaml.parser.ParserImpl.getEvent(ParserImpl.java:168)
Apr 26 16:45:41 y113 elasticsearch: at com.fasterxml.jackson.dataformat.yaml.YAMLParser.nextToken(YAMLParser.java:342)
Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:53)
Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.loader.XContentSettingsLoader.serializeObject(XContentSettingsLoader.java:99)
Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.loader.XContentSettingsLoader.load(XContentSettingsLoader.java:67)
Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.loader.XContentSettingsLoader.load(XContentSettingsLoader.java:45)
Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.loader.YamlSettingsLoader.load(YamlSettingsLoader.java:46)
Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.Settings$Builder.loadFromStream(Settings.java:1080)
Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.common.settings.Settings$Builder.loadFromPath(Settings.java:1067)
Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:88)
Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.bootstrap.Bootstrap.initialSettings(Bootstrap.java:202)
Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:241)
Apr 26 16:45:41 y113 elasticsearch: at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
Apr 26 16:45:41 y113 elasticsearch: Refer to the log for complete error details.
Apr 26 16:45:41 y113 systemd: elasticsearch.service: main process exited, code=exited, status=1/FAILURE
Apr 26 16:45:41 y113 systemd: Unit elasticsearch.service entered failed state.
Apr 26 16:45:41 y113 systemd: elasticsearch.service failed.
```
FIX: In Ambari, changed the setting to add brackets: `network.host: ["_lo:ipv4_","_bond0.582:ipv4_"]`. The help tooltip did not mention the brackets.
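With the YAML fixed, a quick check (a sketch, assuming the default HTTP port 9200) that Elasticsearch is answering:

```
# Should return cluster health JSON rather than a connection error
curl -XGET 'http://y113.l42scl.hortonworks.com:9200/_cluster/health?pretty'
```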
- Elasticsearch seems to start, but Kibana cannot use it. Elasticsearch shows RED status.
```
GET _cluster/health
{
  "cluster_name": "metron",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 2,
  "active_primary_shards": 0,
  "active_shards": 0,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": "NaN"
}
```
```
GET _cluster/nodes/stats
{
  "error": {
    "root_cause": [
      {
        "type": "cluster_block_exception",
        "reason": "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"
      }
    ],
    "type": "cluster_block_exception",
    "reason": "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"
  },
  "status": 503
}
```
The master node is suggesting a change to `/etc/security/limits.conf`:

```
Apr 26 17:38:58 y113 systemd: Started Elasticsearch.
Apr 26 17:38:59 y113 elasticsearch: [2017-04-26 17:38:59,062][WARN ][bootstrap ] Unable to lock JVM Memory: error=12,reason=Cannot allocate memory
Apr 26 17:38:59 y113 elasticsearch: [2017-04-26 17:38:59,062][WARN ][bootstrap ] This can result in part of the JVM being swapped out.
Apr 26 17:38:59 y113 elasticsearch: [2017-04-26 17:38:59,062][WARN ][bootstrap ] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
Apr 26 17:38:59 y113 elasticsearch: [2017-04-26 17:38:59,062][WARN ][bootstrap ] These can be adjusted by modifying /etc/security/limits.conf, for example:
Apr 26 17:38:59 y113 elasticsearch: # allow user 'elasticsearch' mlockall
Apr 26 17:38:59 y113 elasticsearch: elasticsearch soft memlock unlimited
Apr 26 17:38:59 y113 elasticsearch: elasticsearch hard memlock unlimited
```
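Applying exactly the lines the log suggests across the Elasticsearch hosts would look like this (a sketch; the host list is assumed from the master/data layout above):

```
for host in y113 y134 y135; do
  # Append the memlock limits the Elasticsearch bootstrap warning asks for
  ssh root@$host "printf '%s\n' \
    'elasticsearch soft memlock unlimited' \
    'elasticsearch hard memlock unlimited' >> /etc/security/limits.conf"
done
```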
FIX: Changed `gateway_recover_after_data_nodes` from 3 to 2, since there are only 2 data nodes.
- Elasticsearch cluster status now yellow.
```
GET _cluster/health
{
  "cluster_name": "metron",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 2,
  "active_primary_shards": 1,
  "active_shards": 2,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 1,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 66.66666666666666
}
```
```
GET _cluster/nodes/stats
{
  "error": {
    "root_cause": [
      {
        "type": "index_not_found_exception",
        "reason": "no such index",
        "resource.type": "index_expression",
        "resource.id": "_cluster",
        "index": "_cluster"
      }
    ],
    "type": "index_not_found_exception",
    "reason": "no such index",
    "resource.type": "index_expression",
    "resource.id": "_cluster",
    "index": "_cluster"
  },
  "status": 404
}
```
FIX?: Changed `index_number_of_replicas` from 2 to 1.

FIX?: ES was in yellow status and the `.kibana` index was "unassigned", so I just wiped that index. Restarted Kibana and it recreated the index; the cluster is now green. Then used Kibana > Service Actions > Load Template to restore what was lost in deleting the `.kibana` index.
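The same cleanup via the REST API instead of the console (a sketch, assuming Elasticsearch on y113:9200):

```
# Drop the unassigned .kibana index, then re-check cluster health
curl -XDELETE 'http://y113.l42scl.hortonworks.com:9200/.kibana'
curl -XGET 'http://y113.l42scl.hortonworks.com:9200/_cluster/health?pretty'
```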
- Install Kerberos dependencies.

```
ssh root@y113 bash -c "'
  yum -y install krb5-server krb5-libs krb5-workstation
'"
```
- Define the host, y113, as the KDC.

```
[root@y113 ~]# cat /etc/krb5.conf
# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 dns_lookup_realm = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 rdns = false
 default_realm = EXAMPLE.COM
 default_ccache_name = KEYRING:persistent:%{uid}

[realms]
 EXAMPLE.COM = {
  kdc = y113.l42scl.hortonworks.com
  admin_server = y113.l42scl.hortonworks.com
 }

[domain_realm]
# .example.com = EXAMPLE.COM
# example.com = EXAMPLE.COM
```

```
cp -f /etc/krb5.conf /var/lib/ambari-server/resources/scripts
```
- Do not copy/paste this full set of commands, as the `kdb5_util` command will not run as expected. Run the commands individually to ensure they all execute. This step takes a moment; it creates the Kerberos database.

```
kdb5_util create -s
service krb5kdc start
chkconfig krb5kdc on
service kadmin start
chkconfig kadmin on
```
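A quick check (a sketch) that the database exists and both daemons came up:

```
service krb5kdc status
service kadmin status
# The new database should already contain the default principals (krbtgt, kadmin, ...)
kadmin.local -q "listprincs"
```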
- Setup the `admin` principals. You'll `kinit` as the `metron` principal when running topologies. Make sure to remember the passwords.

```
kadmin.local -q "addprinc admin/admin"
```
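The `metron` principal referenced above would be created the same way (a sketch; the bare principal name is an assumption based on the note about running topologies):

```
kadmin.local -q "addprinc metron"
```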
- Kerberize the cluster using the Ambari wizard.
- Error: Unable to start Metron topologies.

```
2772 [main] INFO o.a.s.s.a.AuthUtils - Got AutoCreds [org.apache.storm.security.auth.kerberos.AutoTGT@748a654a]
2773 [main] INFO o.a.s.StormSubmitter - Running org.apache.storm.security.auth.kerberos.AutoTGT@748a654a
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The TGT found is not renewable
	at org.apache.storm.security.auth.kerberos.AutoTGT.populateCredentials(AutoTGT.java:103)
	at org.apache.storm.StormSubmitter.populateCredentials(StormSubmitter.java:94)
	at org.apache.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:214)
	at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:310)
	at org.apache.storm.flux.Flux.runCli(Flux.java:171)
	at org.apache.storm.flux.Flux.main(Flux.java:98)
Caused by: java.lang.RuntimeException: The TGT found is not renewable
	at org.apache.storm.security.auth.kerberos.AutoTGT.populateCredentials(AutoTGT.java:94)
	... 5 more
```

FIX: On CentOS 7 the principals were not renewable by default; see the sketch below for making them renewable.
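Making the principals renewable looks roughly like this (a sketch of the usual CentOS 7 fix; the config path and lifetimes are assumptions):

```
# /var/kerberos/krb5kdc/kdc.conf: under the realm definition, set e.g.
#   max_renewable_life = 7d
# then mark the relevant principals renewable
kadmin.local -q "modprinc -maxrenewlife 7days krbtgt/EXAMPLE.COM"
kadmin.local -q "modprinc -maxrenewlife 7days metron"
service krb5kdc restart

# Request a renewable TGT before resubmitting the topologies
kinit -r 7d metron
```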
- Parser topologies fail after starting successfully.

```
java.lang.IllegalStateException: Unable to initialize message writer
	at org.apache.metron.parsers.bolt.WriterHandler.init(WriterHandler.java:70) ~[stormjar.jar:?]
	at org.apache.metron.parsers.bolt.WriterBolt.prepare(WriterBolt.java:58) ~[stormjar.jar:?]
	at org.apache.storm.daemon.executor$fn__6573$fn__6586.invoke(executor.clj:798) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:482) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka producer
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:335) ~[stormjar.jar:?]
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:163) ~[stormjar.jar:?]
	at org.apache.metron.writer.kafka.KafkaWriter.init(KafkaWriter.java:173) ~[stormjar.jar:?]
	at org.apache.metron.writer.WriterToBulkWriter.init(WriterToBulkWriter.java:44) ~[stormjar.jar:?]
	at org.apache.metron.parsers.bolt.WriterHandler.init(WriterHandler.java:68) ~[stormjar.jar:?]
	... 5 more
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.kafka.common.protocol.SecurityProtocol.PLAINTEXTSASL
	at java.lang.Enum.valueOf(Enum.java:238) ~[?:1.8.0_77]
	at org.apache.kafka.common.protocol.SecurityProtocol.valueOf(SecurityProtocol.java:28) ~[stormjar.jar:?]
	at org.apache.kafka.common.protocol.SecurityProtocol.forName(SecurityProtocol.java:89) ~[stormjar.jar:?]
	at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:79) ~[stormjar.jar:?]
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:277) ~[stormjar.jar:?]
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:163) ~[stormjar.jar:?]
	at org.apache.metron.writer.kafka.KafkaWriter.init(KafkaWriter.java:173) ~[stormjar.jar:?]
	at org.apache.metron.writer.WriterToBulkWriter.init(WriterToBulkWriter.java:44) ~[stormjar.jar:?]
	at org.apache.metron.parsers.bolt.WriterHandler.init(WriterHandler.java:68) ~[stormjar.jar:?]
	... 5 more
```
As a side note, the Kafka setting `security.inter.broker.protocol` is actually `PLAINTEXTSASL` in Ambari.

FIX: Altered `/var/lib/ambari-server/resources/common-services/METRON/0.4.0/package/scripts/params/params_linux.py` and set `kafka_security_protocol = 'SASL_PLAINTEXT'`. This fixed the parser and indexing topologies.

```
[root@y113 ~]# cat /var/lib/ambari-server/resources/common-services/METRON/0.4.0/package/scripts/params/params_linux.py | grep PLAINTEXT
#kafka_security_protocol = config['configurations']['kafka-broker'].get('security.inter.broker.protocol', 'PLAINTEXT')
kafka_security_protocol = 'SASL_PLAINTEXT'
```
FIX: Updated the Elasticsearch templates value in Ambari to fix indexing.
FIX: To fix the enrichment topology, I edited `/usr/metron/0.4.0/config/enrichment.properties`:

```
[root@y136 0.4.0]# cat /usr/metron/0.4.0/config/enrichment.properties | grep PLAINTEXT
kafka.security.protocol=SASL_PLAINTEXT
```
- Bro data is now hitting Elasticsearch, but Kibana is not able to read it.

```
Error: [unsupported_operation_exception] custom format isn't supported
    at respond (http://y113.l42scl.hortonworks.com:5000/bundles/kibana.bundle.js?v=9892:76378:16)
    at checkRespForFailure (http://y113.l42scl.hortonworks.com:5000/bundles/kibana.bundle.js?v=9892:76341:8)
    at http://y113.l42scl.hortonworks.com:5000/bundles/kibana.bundle.js?v=9892:74959:8
    at processQueue (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:42360:29)
    at http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:42376:28
    at Scope.$eval (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:43604:29)
    at Scope.$digest (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:43415:32)
    at Scope.$apply (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:43712:25)
    at done (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:38161:48)
    at completeRequest (http://y113.l42scl.hortonworks.com:5000/bundles/commons.bundle.js?v=9892:38359:8)
```
FIX: Ran ELASTICSEARCH_TEMPLATE_DELETE, then ELASTICSEARCH_TEMPLATE_INSTALL.
- Storm worker doesn't seem to start for the Indexing topology.

```
2017-04-27 19:57:06.845 o.a.s.d.supervisor [INFO] Launching worker with command: '/usr/jdk64/jdk1.8.0_77/bin/java' '-cp' '/usr/hdp/2.5.3.0-37/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.5.3.0-37/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-slf4j-impl-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/storm-rename-hack-1.0.1.2.5.3.0-37.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-api-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-core-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/asm-5.0.3.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.5.3.0-37/storm/lib/slf4j-api-1.7.7.jar:/usr/hdp/2.5.3.0-37/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.5.3.0-37/storm/lib/zookeeper.jar:/usr/hdp/2.5.3.0-37/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.5.3.0-37/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.5.3.0-37/storm/lib/storm-core-1.0.1.2.5.3.0-37.jar:/usr/hdp/2.5.3.0-37/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/objenesis-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/ambari-metrics-storm-sink.jar:/usr/hdp/current/storm-supervisor/conf:/metron4/hadoop/storm/supervisor/stormdist/indexing-5-1493322512/stormjar.jar:/etc/hbase/conf:/etc/hadoop/conf' '-Xmx64m' '-Dlogfile.name=worker.log' '-Dstorm.home=/usr/hdp/2.5.3.0-37/storm' '-Dworkers.artifacts=/var/log/storm/workers-artifacts' '-Dstorm.id=indexing-5-1493322512' '-Dworker.id=803c9e51-b6f9-4ee5-8265-b1ff2160fddf' '-Dworker.port=6704' '-Dstorm.log.dir=/var/log/storm' '-Dlog4j.configurationFile=/usr/hdp/2.5.3.0-37/storm/log4j2/worker.xml' '-DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector' 'org.apache.storm.LogWriter' '/usr/jdk64/jdk1.8.0_77/bin/java' '-server' '-Xmx768m' '-Djava.security.auth.login.config=/usr/hdp/current/storm-supervisor/conf/storm_jaas.conf' '-javaagent:/usr/hdp/current/storm-client/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar=host=localhost,port=8650,wireformat31x=true,mode=multicast,config=/usr/hdp/current/storm-client/contrib/storm-jmxetric/conf/jmxetric-conf.xml,process=Worker_6704_JVM' '-Djava.security.auth.login.config=/usr/metron/0.4.0/client_jaas.conf' '-Djava.library.path=/metron4/hadoop/storm/supervisor/stormdist/indexing-5-1493322512/resources/Linux-amd64:/metron4/hadoop/storm/supervisor/stormdist/indexing-5-1493322512/resources:/usr/local/lib:/opt/local/lib:/usr/lib:/usr/hdp/current/storm-client/lib' '-Dlogfile.name=worker.log' '-Dstorm.home=/usr/hdp/2.5.3.0-37/storm' '-Dworkers.artifacts=/var/log/storm/workers-artifacts' '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/var/log/storm' '-Djava.io.tmpdir=/metron4/hadoop/storm/workers/803c9e51-b6f9-4ee5-8265-b1ff2160fddf/tmp' '-Dlogging.sensitivity=S3' '-Dlog4j.configurationFile=/usr/hdp/2.5.3.0-37/storm/log4j2/worker.xml' '-DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector' '-Dstorm.id=indexing-5-1493322512' '-Dworker.id=803c9e51-b6f9-4ee5-8265-b1ff2160fddf' '-Dworker.port=6704' '-cp' '/usr/hdp/2.5.3.0-37/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.5.3.0-37/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-slf4j-impl-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/storm-rename-hack-1.0.1.2.5.3.0-37.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-api-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-core-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/asm-5.0.3.jar:/usr/hdp/2.5.3.0-37/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.5.3.0-37/storm/lib/slf4j-api-1.7.7.jar:/usr/hdp/2.5.3.0-37/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.5.3.0-37/storm/lib/zookeeper.jar:/usr/hdp/2.5.3.0-37/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.5.3.0-37/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.5.3.0-37/storm/lib/storm-core-1.0.1.2.5.3.0-37.jar:/usr/hdp/2.5.3.0-37/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/objenesis-2.1.jar:/usr/hdp/2.5.3.0-37/storm/lib/ambari-metrics-storm-sink.jar:/usr/hdp/current/storm-supervisor/conf:/metron4/hadoop/storm/supervisor/stormdist/indexing-5-1493322512/stormjar.jar:/etc/hbase/conf:/etc/hadoop/conf' 'org.apache.storm.daemon.worker' 'indexing-5-1493322512' 'b8bd9356-c463-4182-b629-584429417b96' '6704' '803c9e51-b6f9-4ee5-8265-b1ff2160fddf'
2017-04-27 19:57:06.848 o.a.s.config [INFO] SET worker-user 803c9e51-b6f9-4ee5-8265-b1ff2160fddf metron
2017-04-27 19:57:06.848 o.a.s.d.supervisor [INFO] Creating symlinks for worker-id: 803c9e51-b6f9-4ee5-8265-b1ff2160fddf storm-id: indexing-5-1493322512 to its port artifacts directory
2017-04-27 19:57:06.850 o.a.s.d.supervisor [INFO] Creating symlinks for worker-id: 803c9e51-b6f9-4ee5-8265-b1ff2160fddf storm-id: indexing-5-1493322512 for files(1): ("resources")
2017-04-27 19:57:06.852 o.a.s.d.supervisor [INFO] Running as user:metron command:("/usr/hdp/2.5.3.0-37/storm/bin/worker-launcher" "metron" "worker" "/metron4/hadoop/storm/workers/803c9e51-b6f9-4ee5-8265-b1ff2160fddf" "/metron4/hadoop/storm/workers/803c9e51-b6f9-4ee5-8265-b1ff2160fddf/storm-worker-script.sh")
2017-04-27 19:57:06.853 o.a.s.util [WARN] Worker Process 803c9e51-b6f9-4ee5-8265-b1ff2160fddf:Invalid permissions on worker-launcher binary.
2017-04-27 19:57:06.853 o.a.s.util [WARN] Worker Process 803c9e51-b6f9-4ee5-8265-b1ff2160fddf:The configured nodemanager group 1009 is different from the group of the executable 0
2017-04-27 19:57:06.853 o.a.s.d.supervisor [INFO] Worker Process 803c9e51-b6f9-4ee5-8265-b1ff2160fddf exited with code: 22
2017-04-27 19:57:06.854 o.a.s.d.supervisor [INFO] 803c9e51-b6f9-4ee5-8265-b1ff2160fddf still hasn't started
2017-04-27 19:57:07.354 o.a.s.d.supervisor [INFO] 803c9e51-b6f9-4ee5-8265-b1ff2160fddf still hasn't started
2017-04-27 19:57:07.855 o.a.s.d.supervisor [INFO] 803c9e51-b6f9-4ee5-8265-b1ff2160fddf still hasn't started
```

The `worker-launcher` binary on y136 had the wrong permissions and group:

```
nallen@HW12402 : ~ $ for host in y134 y135 y136 y137; do
> ssh root@$host "ls -l /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
> done
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-rwxr-xr-x 1 root root 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher

nallen@HW12402 : ~ $ ssh root@y136 "chmod 6550 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
nallen@HW12402 : ~ $ for host in y134 y135 y136 y137; do ssh root@$host "ls -l /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"; done
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root root 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher

nallen@HW12402 : ~ $ ssh root@y136 "chgrp hadoop /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
nallen@HW12402 : ~ $ for host in y134 y135 y136 y137; do ssh root@$host "ls -l /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"; done
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-xr-x--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher

nallen@HW12402 : ~ $ ssh root@y136 "chmod u+s /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
nallen@HW12402 : ~ $ for host in y134 y135 y136 y137; do ssh root@$host "ls -l /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"; done
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-x--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher

nallen@HW12402 : ~ $ ssh root@y136 "chmod g+s /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
nallen@HW12402 : ~ $ for host in y134 y135 y136 y137; do ssh root@$host "ls -l /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"; done
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
-r-sr-s--- 1 root hadoop 56800 Nov 30 02:58 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher
```
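The whole repair on y136 condenses to two commands; order matters because `chgrp` clears the setuid/setgid bits, as seen above (a sketch):

```
# Fix the group first, then restore mode 6550 (-r-sr-s--- root:hadoop)
ssh root@y136 "chgrp hadoop /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher && \
  chmod 6550 /usr/hdp/2.5.3.0-37/storm/bin/worker-launcher"
```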
- Indexing topology bombed with out of memory.

```
2017-04-27 21:21:17.467 o.a.s.d.worker [ERROR] Error when processing event
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOf(Arrays.java:3236) ~[?:1.8.0_77]
	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118) ~[?:1.8.0_77]
	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) ~[?:1.8.0_77]
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) ~[?:1.8.0_77]
	at org.apache.storm.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.thrift.protocol.TBinaryProtocol.writeByte(TBinaryProtocol.java:162) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.thrift.protocol.TBinaryProtocol.writeFieldBegin(TBinaryProtocol.java:123) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.generated.BoltStats$BoltStatsStandardScheme.write(BoltStats.java:1060) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.generated.BoltStats$BoltStatsStandardScheme.write(BoltStats.java:869) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.generated.BoltStats.write(BoltStats.java:771) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.generated.ExecutorSpecificStats.standardSchemeWriteValue(ExecutorSpecificStats.java:216) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.thrift.TUnion.write(TUnion.java:152) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.generated.ExecutorStats$ExecutorStatsStandardScheme.write(ExecutorStats.java:798) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.generated.ExecutorStats$ExecutorStatsStandardScheme.write(ExecutorStats.java:648) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.generated.ExecutorStats.write(ExecutorStats.java:564) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.generated.ClusterWorkerHeartbeat$ClusterWorkerHeartbeatStandardScheme.write(ClusterWorkerHeartbeat.java:698) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.generated.ClusterWorkerHeartbeat$ClusterWorkerHeartbeatStandardScheme.write(ClusterWorkerHeartbeat.java:615) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.generated.ClusterWorkerHeartbeat.write(ClusterWorkerHeartbeat.java:535) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.thrift.TSerializer.serialize(TSerializer.java:79) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.serialization.GzipThriftSerializationDelegate.serialize(GzipThriftSerializationDelegate.java:40) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.utils.Utils.serialize(Utils.java:146) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at org.apache.storm.cluster$mk_storm_cluster_state$reify__5922.worker_heartbeat_BANG_(cluster.clj:465) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_77]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_77]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_77]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_77]
	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
	at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.7.0.jar:?]
	at org.apache.storm.daemon.worker$do_executor_heartbeats.doInvoke(worker.clj:76) ~[storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
	at clojure.lang.RestFn.invoke(RestFn.java:439) ~[clojure-1.7.0.jar:?]
2017-04-27 21:21:22.809 STDIO [ERROR] Halting due to Out Of Memory Error...Thread-16-kafkaSpout-executor[5 5]
```
FIX: Increased the max heap space available to Storm workers. In Ambari, `worker.childopts = -Xmx2048m...`.
- Large number of unassigned shards in Elasticsearch. Elasticsearch has `path_data = "/opt/lmm/es_data"`; this needs to use the data mounts at /metron1, /metron2, /metron3, and /metron4.

FIX: Increased `heap_size` in the Ambari Elasticsearch settings from 128m to 4096m. It likely needs to be quite a bit larger even than that.
FIX:

```
for host in y135 y136 y137; do
  for mnt in /metron1 /metron2 /metron3 /metron4; do
    ssh root@$host bash -c "'
      mkdir -p $mnt/elasticsearch
      chown -R elasticsearch:elasticsearch $mnt/elasticsearch
    '"
  done
done
```
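A quick check (a sketch) that the directories landed with the right ownership on every node:

```
for host in y135 y136 y137; do
  echo "== $host"
  ssh root@$host "ls -ld /metron*/elasticsearch"
done
```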
Delete all indices:

```
DELETE /.kibana
DELETE /bro*
DELETE /yaf*
```
Then change the mount setting in Ambari to the following. Note that the brackets are required.

```
path_data = ["/metron1/elasticsearch","/metron2/elasticsearch","/metron3/elasticsearch","/metron4/elasticsearch"]
```