We need a source of PCAP data to ingest. In a production environment there would likely be one or more hosts configured with span ports that receive raw packet data from a packet aggregator device. To simulate this, we will use Metron's Pcap Replay service.
service pcap-replay start
Validate that the packet data is being replayed correctly.
[root@ip-10-0-0-189 ~]# sudo yum install -y tcpdump
...
[root@ip-10-0-0-189 ~]# tcpdump -i tap0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:56:12.203412 IP insconsulting.ru.http > ip-192-168-138-158.us-west-2.compute.internal.49206: Flags [P.], seq 3107587296:3107588651, ack 2819468325, win 64240, length 1355
17:56:12.218129 IP insconsulting.ru.http > ip-192-168-138-158.us-west-2.compute.internal.49206: Flags [P.], seq 1355:2710, ack 1, win 64240, length 1355
17:56:12.218476 IP insconsulting.ru.http > ip-192-168-138-158.us-west-2.compute.internal.49206: Flags [.], seq 2710:4170, ack 1, win 64240, length 1460
17:56:12.218679 IP insconsulting.ru.http > ip-192-168-138-158.us-west-2.compute.internal.49206: Flags [P.], seq 4170:4766, ack 1, win 64240, length 596
...
On the hosts with the dedicated span ports, we will use one of Metron's packet capture programs: Fastcapa or Pycapa. These programs capture the raw packet data off the wire and send it to Kafka, where it can be ingested by Metron.
In this example, we are just going to use Pycapa, which is intended for low-volume testing only.
service pycapa start
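Conceptually, Pycapa's job is to wrap each captured packet in a Kafka message: the capture timestamp travels in the message key and the untouched packet bytes in the value, so Metron's downstream topology can reconstruct when each packet was seen. The following sketch illustrates that framing; the 8-byte big-endian key layout and the `frame_packet` helper are illustrative assumptions, not Pycapa's actual code.

```python
import struct
import time

def frame_packet(packet_bytes, ts_nanos=None):
    """Frame a raw packet as a (key, value) pair for the pcap Kafka topic.

    The key carries the capture timestamp packed as an 8-byte big-endian
    unsigned integer (an assumption about the exact wire format); the
    value is the raw packet bytes, untouched.
    """
    if ts_nanos is None:
        ts_nanos = time.time_ns()
    key = struct.pack('>Q', ts_nanos)
    return key, packet_bytes

# A hypothetical captured packet, framed for publishing:
key, value = frame_packet(b'\x45\x00\x00\x3c\xab\xcd', ts_nanos=1479924972000000000)
# The key round-trips back to the original timestamp:
assert struct.unpack('>Q', key)[0] == 1479924972000000000
```

The actual producer side (handing `key` and `value` to a Kafka client) is omitted since it needs a live broker.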
If everything is working correctly, the raw packet data should be landing in a Kafka topic called pcap. The data is binary, so it will look like a hot mess.
[root@ip-10-0-0-189 ~]# /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper <zookeeper-host:2181> --topic pcap
...
E)���>K������P�"ssLQlJ
P��0
E( �@��x����>K���"PQlJ
ssLPF�
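The garbage above is just the console consumer printing raw packet bytes as if they were text. If you want a more readable view while spot-checking the topic, a small hexdump helper goes a long way; this is a standalone sketch (no Kafka dependency; feed it message payloads from whatever consumer you use):

```python
def hexdump(data, width=16):
    """Render raw bytes (e.g. a packet payload from the pcap topic)
    as offset / hex / ASCII columns, similar to `hexdump -C`."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = ' '.join(f'{b:02x}' for b in chunk)
        asciipart = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        lines.append(f'{off:08x}  {hexpart:<{width * 3}} {asciipart}')
    return '\n'.join(lines)

# A few bytes resembling the start of an IPv4 packet:
print(hexdump(b'\x45\x00\x00\x3c\xab\xcd\x40\x00\x40\x06GET / HTTP/1.1'))
```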
The next step is to have Metron process the pcap data and store it in HDFS. Start the PCAP topology to begin this process. A Storm topology called 'pcap' will be launched that consumes the raw pcap data from the Kafka topic and writes this data into sequence files in HDFS.
[centos@ip-10-0-0-53 ~]$ cd /usr/metron/0.3.0/
[centos@ip-10-0-0-53 0.3.0]$ pwd
/usr/metron/0.3.0
[centos@ip-10-0-0-53 0.3.0]$ bin/start_pcap_topology.sh
Running: /usr/jdk64/jdk1.8.0_77/bin/java -server -Ddaemon.name= -Dstorm.options= -Dstorm.home=/usr/hdp/2.5.0.0-1245/storm -Dstorm.log.dir=/var/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /usr/hdp/2.5.0.0-1245/storm/lib/log4j-core-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/storm-core-1.0.1.2.5.0.0-1245.jar:/usr/hdp/2.5.0.0-1245/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.5.0.0-1245/storm/lib/objenesis-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.5.0.0-1245/storm/lib/storm-rename-hack-1.0.1.2.5.0.0-1245.jar:/usr/hdp/2.5.0.0-1245/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.5.0.0-1245/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.5.0.0-1245/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-slf4j-impl-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-api-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.5.0.0-1245/storm/lib/zookeeper.jar:/usr/hdp/2.5.0.0-1245/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.5.0.0-1245/storm/lib/slf4j-api-1.7.7.jar:/usr/hdp/2.5.0.0-1245/storm/lib/asm-5.0.3.jar org.apache.storm.daemon.ClientJarTransformerRunner org.apache.storm.hack.StormShadeTransformer /usr/metron/0.3.0/lib/metron-pcap-backend-0.3.0.jar /tmp/d5f844e8b1a611e6a6d10a0a570e5f4d.jar
Running: /usr/jdk64/jdk1.8.0_77/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/usr/hdp/2.5.0.0-1245/storm -Dstorm.log.dir=/var/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/hdp/current/storm-client/lib -Dstorm.conf.file= -cp /usr/hdp/2.5.0.0-1245/storm/lib/log4j-core-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/storm-core-1.0.1.2.5.0.0-1245.jar:/usr/hdp/2.5.0.0-1245/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.5.0.0-1245/storm/lib/objenesis-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.5.0.0-1245/storm/lib/storm-rename-hack-1.0.1.2.5.0.0-1245.jar:/usr/hdp/2.5.0.0-1245/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.5.0.0-1245/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.5.0.0-1245/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-slf4j-impl-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-api-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.5.0.0-1245/storm/lib/zookeeper.jar:/usr/hdp/2.5.0.0-1245/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.5.0.0-1245/storm/lib/slf4j-api-1.7.7.jar:/usr/hdp/2.5.0.0-1245/storm/lib/asm-5.0.3.jar:/tmp/d5f844e8b1a611e6a6d10a0a570e5f4d.jar:/usr/hdp/current/storm-supervisor/conf:/usr/hdp/2.5.0.0-1245/storm/bin -Dstorm.jar=/tmp/d5f844e8b1a611e6a6d10a0a570e5f4d.jar org.apache.storm.flux.Flux --remote /usr/metron/0.3.0/flux/pcap/remote.yaml --filter /usr/metron/0.3.0/config/pcap.properties
███████╗██╗ ██╗ ██╗██╗ ██╗
██╔════╝██║ ██║ ██║╚██╗██╔╝
█████╗ ██║ ██║ ██║ ╚███╔╝
██╔══╝ ██║ ██║ ██║ ██╔██╗
██║ ███████╗╚██████╔╝██╔╝ ██╗
╚═╝ ╚══════╝ ╚═════╝ ╚═╝ ╚═╝
+- Apache Storm -+
+- data FLow User eXperience -+
Version: 1.0.1
Parsing file: /usr/metron/0.3.0/flux/pcap/remote.yaml
636 [main] INFO o.a.s.f.p.FluxParser - loading YAML from input stream...
638 [main] INFO o.a.s.f.p.FluxParser - Performing property substitution.
639 [main] INFO o.a.s.f.p.FluxParser - Not performing environment variable substitution.
907 [main] WARN o.a.s.f.FluxBuilder - Found multiple invokable methods for class class org.apache.metron.spout.pcap.SpoutConfig, method from, given arguments [END]. Using the last one found.
976 [main] INFO o.a.s.f.FluxBuilder - Detected DSL topology...
---------- TOPOLOGY DETAILS ----------
Topology Name: pcap
--------------- SPOUTS ---------------
kafkaSpout [1] (org.apache.metron.spout.pcap.KafkaToHDFSSpout)
---------------- BOLTS ---------------
--------------- STREAMS ---------------
--------------------------------------
1157 [main] INFO o.a.s.f.Flux - Running remotely...
1157 [main] INFO o.a.s.f.Flux - Deploying topology in an ACTIVE state...
1194 [main] INFO o.a.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -8340121339010421700:-4824301672672404920
1268 [main] INFO o.a.s.s.a.AuthUtils - Got AutoCreds []
1343 [main] INFO o.a.s.StormSubmitter - Uploading topology jar /tmp/d5f844e8b1a611e6a6d10a0a570e5f4d.jar to assigned location: /data1/hadoop/storm/nimbus/inbox/stormjar-49aedc3d-a259-409d-a96b-4b615ce07076.jar
1810 [main] INFO o.a.s.StormSubmitter - Successfully uploaded topology jar to assigned location: /data1/hadoop/storm/nimbus/inbox/stormjar-49aedc3d-a259-409d-a96b-4b615ce07076.jar
1820 [main] INFO o.a.s.StormSubmitter - Submitting topology pcap in distributed mode with conf {"topology.workers":1,"storm.zookeeper.topology.auth.scheme":"digest","storm.zookeeper.topology.auth.payload":"-8340121339010421700:-4824301672672404920"}
2004 [main] INFO o.a.s.StormSubmitter - Finished submitting topology: pcap
Now that the data is being stored in HDFS, we can use the PCAP Query utility to extract just the pcap data that is of interest to us.
[root@ip-10-0-0-53 0.3.0]# bin/pcap_query.sh query -q "ip_src_addr == '192.168.138.158'" -st 500
16/11/23 18:05:26 INFO impl.TimelineClientImpl: Timeline service address: http://ec2-35-164-205-21.us-west-2.compute.amazonaws.com:8188/ws/v1/timeline/
16/11/23 18:05:26 INFO client.RMProxy: Connecting to ResourceManager at ec2-35-164-205-105.us-west-2.compute.amazonaws.com/10.0.0.216:8050
16/11/23 18:05:27 INFO client.AHSProxy: Connecting to Application History server at ec2-35-164-205-21.us-west-2.compute.amazonaws.com/10.0.0.215:10200
16/11/23 18:05:31 INFO input.FileInputFormat: Total input paths to process : 1
16/11/23 18:05:31 INFO mapreduce.JobSubmitter: number of splits:1
16/11/23 18:05:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479919136662_0002
16/11/23 18:05:32 INFO impl.YarnClientImpl: Submitted application application_1479919136662_0002
16/11/23 18:05:33 INFO mapreduce.Job: The url to track the job: http://ec2-35-164-205-105.us-west-2.compute.amazonaws.com:8088/proxy/application_1479919136662_0002/
16/11/23 18:05:33 INFO mapreduce.Job: Running job: job_1479919136662_0002
16/11/23 18:05:40 INFO mapreduce.Job: Job job_1479919136662_0002 running in uber mode : false
16/11/23 18:05:40 INFO mapreduce.Job: map 0% reduce 0%
16/11/23 18:05:51 INFO mapreduce.Job: map 100% reduce 0%
16/11/23 18:05:57 INFO mapreduce.Job: map 100% reduce 10%
16/11/23 18:05:58 INFO mapreduce.Job: map 100% reduce 20%
16/11/23 18:05:59 INFO mapreduce.Job: map 100% reduce 30%
16/11/23 18:06:00 INFO mapreduce.Job: map 100% reduce 50%
16/11/23 18:06:02 INFO mapreduce.Job: map 100% reduce 70%
16/11/23 18:06:03 INFO mapreduce.Job: map 100% reduce 80%
16/11/23 18:06:04 INFO mapreduce.Job: map 100% reduce 90%
16/11/23 18:06:05 INFO mapreduce.Job: map 100% reduce 100%
16/11/23 18:06:05 INFO mapreduce.Job: Job job_1479919136662_0002 completed successfully
16/11/23 18:06:05 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=61461
FILE: Number of bytes written=1687976
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=589683
HDFS: Number of bytes written=64840
HDFS: Number of read operations=34
HDFS: Number of large read operations=0
HDFS: Number of write operations=20
Job Counters
Launched map tasks=1
Launched reduce tasks=10
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=23994
Total time spent by all reduces in occupied slots (ms)=187785
Total time spent by all map tasks (ms)=7998
Total time spent by all reduce tasks (ms)=62595
Total vcore-milliseconds taken by all map tasks=7998
Total vcore-milliseconds taken by all reduce tasks=62595
Total megabyte-milliseconds taken by all map tasks=9829542
Total megabyte-milliseconds taken by all reduce tasks=76929255
Map-Reduce Framework
Map input records=851
Map output records=337
Map output bytes=60614
Map output materialized bytes=61461
Input split bytes=196
Combine input records=0
Combine output records=0
Reduce input groups=337
Reduce shuffle bytes=61461
Reduce input records=337
Reduce output records=337
Spilled Records=674
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=1327
CPU time spent (ms)=17780
Physical memory (bytes) snapshot=2675085312
Virtual memory (bytes) snapshot=29835042816
Total committed heap usage (bytes)=1673527296
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=589487
File Output Format Counters
Bytes Written=64840
The utility created a libpcap-compliant pcap file in the current working directory.
[root@ip-10-0-0-53 0.3.0]# ls -l
total 72
drwxr-xr-x. 2 livy games 4096 Nov 22 22:36 bin
drwxr-xr-x. 3 livy games 4096 Nov 23 17:10 config
drwxr-xr-x. 2 livy games 4096 Sep 29 17:44 ddl
drwxr-xr-x. 6 livy games 4096 Aug 22 14:54 flux
drwxr-xr-x. 2 root root 4096 Nov 23 17:07 lib
drwxr-xr-x. 2 livy games 4096 Nov 22 22:36 patterns
-rw-r--r--. 1 root root 48506 Nov 23 18:06 pcap-data-20161123180607184+0000.pcap
[root@ip-10-0-0-53 0.3.0]# file pcap-data-20161123180607184+0000.pcap
pcap-data-20161123180607184+0000.pcap: tcpdump capture file (little-endian) - version 2.4 (Ethernet, capture length 65535)
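What `file` is reporting can be verified by hand: a libpcap file begins with a 24-byte global header containing a magic number (0xa1b2c3d4, stored in the file's byte order), the format version, the snapshot length, and the link type. The sketch below builds a synthetic header with the same values `file` printed above and parses it back; to check the real output file, read its first 24 bytes and pass them to the same function.

```python
import struct

PCAP_MAGIC = 0xa1b2c3d4  # standard libpcap magic number

def parse_global_header(buf):
    """Parse a 24-byte little-endian libpcap global header, returning
    (version, snaplen, linktype) -- the facts `file` reports."""
    magic = struct.unpack('<I', buf[:4])[0]
    if magic != PCAP_MAGIC:
        raise ValueError('not a little-endian libpcap file')
    major, minor, _thiszone, _sigfigs, snaplen, linktype = \
        struct.unpack('<HHiIII', buf[4:24])
    return (major, minor), snaplen, linktype

# Synthetic header: version 2.4, snaplen 65535, linktype 1 (Ethernet),
# i.e. exactly what `file` printed for the query output above.
hdr = struct.pack('<IHHiIII', PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
print(parse_global_header(hdr))  # → ((2, 4), 65535, 1)
```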
This file can be opened with any third-party tool that supports libpcap-compliant pcap files. For example, open Wireshark and choose File > Open, and Wireshark will load the pcap file.
[Screenshot: the extracted pcap file opened in Wireshark]