Data Engineering Capstone Project -- Bryan Bischof
Dec. 17, 2015
Given unstructured log data from Aspera's ASCP transfers, one needs to parse these logs and store them in a large key-value store (currently Redis). The current solution is a Python script that runs a series of regexes and is deployed on Spark to a Mesos cluster for analysis. However, this script is highly inefficient and isn't designed to interact with a lambda architecture. Specifically, it has two shortcomings: it doesn't connect to a permanent data store, and it doesn't accept incoming streams, supporting only batch upload and processing.
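As a rough illustration of the parse-and-store step described above, the following Scala sketch applies a list of named regexes to a single log line and collects the captured values into key-value pairs. The patterns, field names, and the `ascp:` key prefix are hypothetical placeholders, not the real ASCP log format or the original script's regexes, and an in-memory map stands in for Redis.

```scala
import scala.collection.mutable
import scala.util.matching.Regex

// Minimal sketch of regex-based log parsing in pure Scala.
// ASSUMPTION: the patterns and field names below are hypothetical placeholders,
// not the actual ASCP log format or the original Python script's regexes.
object LogParser {

  // Each rule pairs a field name with a regex containing exactly one capture group.
  val rules: Seq[(String, Regex)] = Seq(
    "session_id" -> """session_id=(\S+)""".r,
    "user"       -> """user=(\S+)""".r,
    "rate_mbps"  -> """rate=(\d+(?:\.\d+)?)Mbps""".r,
    "status"     -> """status=(\w+)""".r
  )

  // Apply every rule to the line and keep the first match of each,
  // yielding field -> value pairs; rules that do not match are dropped.
  def parseLine(line: String): Map[String, String] =
    rules.flatMap { case (field, re) =>
      re.findFirstMatchIn(line).map(m => field -> m.group(1))
    }.toMap

  // Stand-in for the key-value store (Redis in the current setup): an in-memory
  // map from a hypothetical "ascp:<session_id>" key to the parsed fields.
  // Lines without a session id are skipped.
  def store(line: String, kv: mutable.Map[String, Map[String, String]]): Unit = {
    val fields = parseLine(line)
    fields.get("session_id").foreach(id => kv(s"ascp:$id") = fields)
  }
}
```

Because `parseLine` handles one line at a time and has no side effects, the same function could in principle be applied to records from either a batch job or a stream.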
The goal of this project is to rewrite this script to do three things:
- a pure Scala implementation of these hundred-or-so regexes