Generate PCAP

We need a source of PCAP data to ingest. In a production environment, one or more hosts would likely be configured with span ports that receive raw packet data from a packet aggregator device. To simulate this, we will use Metron's Pcap Replay service.

service pcap-replay start
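
To confirm that the replay process is actually running before going further, a quick process check is enough (a simple sketch; the exact process name may vary depending on how the service was installed):

ps -ef | grep [p]cap-replay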

Validate that the packet data is being replayed correctly.

[root@ip-10-0-0-189 ~]# sudo yum install -y tcpdump
...
[root@ip-10-0-0-189 ~]# tcpdump -i tap0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:56:12.203412 IP insconsulting.ru.http > ip-192-168-138-158.us-west-2.compute.internal.49206: Flags [P.], seq 3107587296:3107588651, ack 2819468325, win 64240, length 1355
17:56:12.218129 IP insconsulting.ru.http > ip-192-168-138-158.us-west-2.compute.internal.49206: Flags [P.], seq 1355:2710, ack 1, win 64240, length 1355
17:56:12.218476 IP insconsulting.ru.http > ip-192-168-138-158.us-west-2.compute.internal.49206: Flags [.], seq 2710:4170, ack 1, win 64240, length 1460
17:56:12.218679 IP insconsulting.ru.http > ip-192-168-138-158.us-west-2.compute.internal.49206: Flags [P.], seq 4170:4766, ack 1, win 64240, length 596
...
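
If you want a rough sense of the replay rate rather than a raw stream of packets, counting packet lines over a short window is a simple sketch (tap0 is the interface shown above; the 10-second window is arbitrary):

timeout 10 tcpdump -l -nn -i tap0 2>/dev/null | wc -l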

Capture PCAP

On the hosts with the dedicated span ports, we will use one of Metron's packet capture programs: Fastcapa or Pycapa. These programs capture the raw packet data off the wire and send it to Kafka, where it can be ingested by Metron.

In this example, we are just going to use Pycapa, which is intended for low-volume testing only.

service pycapa start
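
It can also be worth confirming that the pcap topic exists before trying to consume from it. This is a sketch using the topic listing tool shipped with the HDP Kafka broker (the install path matches the consumer command below; substitute your own ZooKeeper quorum):

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper <zookeeper-host:2181> --list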

If everything is working correctly, the raw packet data should be landing in a Kafka topic called pcap. The data is binary, so it will look like a hot mess in the console.

[root@ip-10-0-0-189 ~]# /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper <zookeeper-host:2181> --topic pcap
...
E)���>K������P�"ssLQlJ
                      P��0
E(	�@��x����>K���"PQlJ
                           ssLPF�
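
The consumer output above only shows that something is arriving. For a rough message count, the GetOffsetShell tool prints the latest offset for each partition of the topic (a sketch; the broker host and port 6667 are HDP defaults and are assumptions here):

/usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list <kafka-broker:6667> --topic pcap --time -1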

Process PCAP

The next step is to have Metron process the pcap data and store it in HDFS. Start the PCAP topology to begin this process. This launches a Storm topology called 'pcap' that consumes the raw packet data from the Kafka topic and writes it to sequence files in HDFS.

[centos@ip-10-0-0-53 ~]$ cd /usr/metron/0.3.0/

[centos@ip-10-0-0-53 0.3.0]$ pwd
/usr/metron/0.3.0

[centos@ip-10-0-0-53 0.3.0]$ bin/start_pcap_topology.sh
Running: /usr/jdk64/jdk1.8.0_77/bin/java -server -Ddaemon.name= -Dstorm.options= -Dstorm.home=/usr/hdp/2.5.0.0-1245/storm -Dstorm.log.dir=/var/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /usr/hdp/2.5.0.0-1245/storm/lib/log4j-core-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/storm-core-1.0.1.2.5.0.0-1245.jar:/usr/hdp/2.5.0.0-1245/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.5.0.0-1245/storm/lib/objenesis-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.5.0.0-1245/storm/lib/storm-rename-hack-1.0.1.2.5.0.0-1245.jar:/usr/hdp/2.5.0.0-1245/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.5.0.0-1245/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.5.0.0-1245/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-slf4j-impl-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-api-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.5.0.0-1245/storm/lib/zookeeper.jar:/usr/hdp/2.5.0.0-1245/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.5.0.0-1245/storm/lib/slf4j-api-1.7.7.jar:/usr/hdp/2.5.0.0-1245/storm/lib/asm-5.0.3.jar org.apache.storm.daemon.ClientJarTransformerRunner org.apache.storm.hack.StormShadeTransformer /usr/metron/0.3.0/lib/metron-pcap-backend-0.3.0.jar /tmp/d5f844e8b1a611e6a6d10a0a570e5f4d.jar
Running: /usr/jdk64/jdk1.8.0_77/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/usr/hdp/2.5.0.0-1245/storm -Dstorm.log.dir=/var/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/hdp/current/storm-client/lib -Dstorm.conf.file= -cp /usr/hdp/2.5.0.0-1245/storm/lib/log4j-core-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/storm-core-1.0.1.2.5.0.0-1245.jar:/usr/hdp/2.5.0.0-1245/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.5.0.0-1245/storm/lib/objenesis-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.5.0.0-1245/storm/lib/storm-rename-hack-1.0.1.2.5.0.0-1245.jar:/usr/hdp/2.5.0.0-1245/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.5.0.0-1245/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.5.0.0-1245/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-slf4j-impl-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/log4j-api-2.1.jar:/usr/hdp/2.5.0.0-1245/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.5.0.0-1245/storm/lib/zookeeper.jar:/usr/hdp/2.5.0.0-1245/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.5.0.0-1245/storm/lib/slf4j-api-1.7.7.jar:/usr/hdp/2.5.0.0-1245/storm/lib/asm-5.0.3.jar:/tmp/d5f844e8b1a611e6a6d10a0a570e5f4d.jar:/usr/hdp/current/storm-supervisor/conf:/usr/hdp/2.5.0.0-1245/storm/bin -Dstorm.jar=/tmp/d5f844e8b1a611e6a6d10a0a570e5f4d.jar org.apache.storm.flux.Flux --remote /usr/metron/0.3.0/flux/pcap/remote.yaml --filter /usr/metron/0.3.0/config/pcap.properties
███████╗██╗     ██╗   ██╗██╗  ██╗
██╔════╝██║     ██║   ██║╚██╗██╔╝
█████╗  ██║     ██║   ██║ ╚███╔╝
██╔══╝  ██║     ██║   ██║ ██╔██╗
██║     ███████╗╚██████╔╝██╔╝ ██╗
╚═╝     ╚══════╝ ╚═════╝ ╚═╝  ╚═╝
+-         Apache Storm        -+
+-  data FLow User eXperience  -+
Version: 1.0.1
Parsing file: /usr/metron/0.3.0/flux/pcap/remote.yaml
636  [main] INFO  o.a.s.f.p.FluxParser - loading YAML from input stream...
638  [main] INFO  o.a.s.f.p.FluxParser - Performing property substitution.
639  [main] INFO  o.a.s.f.p.FluxParser - Not performing environment variable substitution.
907  [main] WARN  o.a.s.f.FluxBuilder - Found multiple invokable methods for class class org.apache.metron.spout.pcap.SpoutConfig, method from, given arguments [END]. Using the last one found.
976  [main] INFO  o.a.s.f.FluxBuilder - Detected DSL topology...
---------- TOPOLOGY DETAILS ----------
Topology Name: pcap
--------------- SPOUTS ---------------
kafkaSpout [1] (org.apache.metron.spout.pcap.KafkaToHDFSSpout)
---------------- BOLTS ---------------
--------------- STREAMS ---------------
--------------------------------------
1157 [main] INFO  o.a.s.f.Flux - Running remotely...
1157 [main] INFO  o.a.s.f.Flux - Deploying topology in an ACTIVE state...
1194 [main] INFO  o.a.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -8340121339010421700:-4824301672672404920
1268 [main] INFO  o.a.s.s.a.AuthUtils - Got AutoCreds []
1343 [main] INFO  o.a.s.StormSubmitter - Uploading topology jar /tmp/d5f844e8b1a611e6a6d10a0a570e5f4d.jar to assigned location: /data1/hadoop/storm/nimbus/inbox/stormjar-49aedc3d-a259-409d-a96b-4b615ce07076.jar
1810 [main] INFO  o.a.s.StormSubmitter - Successfully uploaded topology jar to assigned location: /data1/hadoop/storm/nimbus/inbox/stormjar-49aedc3d-a259-409d-a96b-4b615ce07076.jar
1820 [main] INFO  o.a.s.StormSubmitter - Submitting topology pcap in distributed mode with conf {"topology.workers":1,"storm.zookeeper.topology.auth.scheme":"digest","storm.zookeeper.topology.auth.payload":"-8340121339010421700:-4824301672672404920"}
2004 [main] INFO  o.a.s.StormSubmitter - Finished submitting topology: pcap
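
Once the topology has been running for a minute or two, sequence files should start appearing in HDFS. A quick way to check (a sketch; /apps/metron/pcap is the default output path in pcap.properties, so adjust it if yours differs):

hdfs dfs -ls /apps/metron/pcap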

Query PCAP

Now that the data is stored in HDFS, we can use the PCAP Query utility to extract just the packets that are of interest to us.

[root@ip-10-0-0-53 0.3.0]# bin/pcap_query.sh query -q "ip_src_addr == '192.168.138.158'" -st 500
16/11/23 18:05:26 INFO impl.TimelineClientImpl: Timeline service address: http://ec2-35-164-205-21.us-west-2.compute.amazonaws.com:8188/ws/v1/timeline/
16/11/23 18:05:26 INFO client.RMProxy: Connecting to ResourceManager at ec2-35-164-205-105.us-west-2.compute.amazonaws.com/10.0.0.216:8050
16/11/23 18:05:27 INFO client.AHSProxy: Connecting to Application History server at ec2-35-164-205-21.us-west-2.compute.amazonaws.com/10.0.0.215:10200
16/11/23 18:05:31 INFO input.FileInputFormat: Total input paths to process : 1
16/11/23 18:05:31 INFO mapreduce.JobSubmitter: number of splits:1
16/11/23 18:05:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479919136662_0002
16/11/23 18:05:32 INFO impl.YarnClientImpl: Submitted application application_1479919136662_0002
16/11/23 18:05:33 INFO mapreduce.Job: The url to track the job: http://ec2-35-164-205-105.us-west-2.compute.amazonaws.com:8088/proxy/application_1479919136662_0002/
16/11/23 18:05:33 INFO mapreduce.Job: Running job: job_1479919136662_0002
16/11/23 18:05:40 INFO mapreduce.Job: Job job_1479919136662_0002 running in uber mode : false
16/11/23 18:05:40 INFO mapreduce.Job:  map 0% reduce 0%
16/11/23 18:05:51 INFO mapreduce.Job:  map 100% reduce 0%
16/11/23 18:05:57 INFO mapreduce.Job:  map 100% reduce 10%
16/11/23 18:05:58 INFO mapreduce.Job:  map 100% reduce 20%
16/11/23 18:05:59 INFO mapreduce.Job:  map 100% reduce 30%
16/11/23 18:06:00 INFO mapreduce.Job:  map 100% reduce 50%
16/11/23 18:06:02 INFO mapreduce.Job:  map 100% reduce 70%
16/11/23 18:06:03 INFO mapreduce.Job:  map 100% reduce 80%
16/11/23 18:06:04 INFO mapreduce.Job:  map 100% reduce 90%
16/11/23 18:06:05 INFO mapreduce.Job:  map 100% reduce 100%
16/11/23 18:06:05 INFO mapreduce.Job: Job job_1479919136662_0002 completed successfully
16/11/23 18:06:05 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=61461
		FILE: Number of bytes written=1687976
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=589683
		HDFS: Number of bytes written=64840
		HDFS: Number of read operations=34
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=20
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=10
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=23994
		Total time spent by all reduces in occupied slots (ms)=187785
		Total time spent by all map tasks (ms)=7998
		Total time spent by all reduce tasks (ms)=62595
		Total vcore-milliseconds taken by all map tasks=7998
		Total vcore-milliseconds taken by all reduce tasks=62595
		Total megabyte-milliseconds taken by all map tasks=9829542
		Total megabyte-milliseconds taken by all reduce tasks=76929255
	Map-Reduce Framework
		Map input records=851
		Map output records=337
		Map output bytes=60614
		Map output materialized bytes=61461
		Input split bytes=196
		Combine input records=0
		Combine output records=0
		Reduce input groups=337
		Reduce shuffle bytes=61461
		Reduce input records=337
		Reduce output records=337
		Spilled Records=674
		Shuffled Maps =10
		Failed Shuffles=0
		Merged Map outputs=10
		GC time elapsed (ms)=1327
		CPU time spent (ms)=17780
		Physical memory (bytes) snapshot=2675085312
		Virtual memory (bytes) snapshot=29835042816
		Total committed heap usage (bytes)=1673527296
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=589487
	File Output Format Counters
		Bytes Written=64840

The utility created a libpcap-compliant pcap file in the current working directory.

[root@ip-10-0-0-53 0.3.0]# ls -l
total 72
drwxr-xr-x. 2 livy games  4096 Nov 22 22:36 bin
drwxr-xr-x. 3 livy games  4096 Nov 23 17:10 config
drwxr-xr-x. 2 livy games  4096 Sep 29 17:44 ddl
drwxr-xr-x. 6 livy games  4096 Aug 22 14:54 flux
drwxr-xr-x. 2 root root   4096 Nov 23 17:07 lib
drwxr-xr-x. 2 livy games  4096 Nov 22 22:36 patterns
-rw-r--r--. 1 root root  48506 Nov 23 18:06 pcap-data-20161123180607184+0000.pcap

[root@ip-10-0-0-53 0.3.0]# file pcap-data-20161123180607184+0000.pcap
pcap-data-20161123180607184+0000.pcap: tcpdump capture file (little-endian) - version 2.4 (Ethernet, capture length 65535)
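
The file can also be spot-checked from the command line before handing it off. For example, reading back the first few packets with tcpdump (a sketch using the file name produced above):

tcpdump -nn -r pcap-data-20161123180607184+0000.pcap | head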

Using the PCAP File

This file can be opened with any third-party tool that supports libpcap-compliant pcap files. For example, in Wireshark choose File > Open, select the file, and Wireshark will load and decode the capture.

Wireshark Screenshot
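
On a host without a GUI, the same file can be inspected with Wireshark's command-line companion, tshark, assuming it is installed:

tshark -r pcap-data-20161123180607184+0000.pcap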
