This script reads ride data points from a CSV file and streams calculated fare estimates for each ride into another CSV file.
You need these tools installed on your system:
After installing these, install the dependencies with:
dep ensure
(I tried to keep this project's dependencies to a minimum, so installation is quite easy.)
You need to provide the input and output file paths, either as flags or as environment variables.
$ beat -input-file ./paths.csv -output-file ./fares.csv
To run the test suite (units and integration), use:
$ make
For coverage summary, use:
$ make cover
And to run end-to-end tests, use:
$ make e2e
The main idea behind this code is that it is a streaming pipeline. It reads the data from an input stream (only a CSV file reader exists right now, but it would be easy to add more input or output backends, such as Kafka or SQS), calculates the fares with concurrent goroutines (one per ride), and streams the fares to the output.
[ Input ] -> [ io/Reader ] -> [ Runner ]
                                  x---- [ Job ] ----\
                                  x---- [ Job ] ------> [ Writer ] ----> [ Output ]
                                  x---- [ Job ] ----/
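The fan-out/fan-in shape of the pipeline can be sketched roughly like this. The `Ride` and `Fare` types and the fare formula are placeholders for illustration, not the project's actual types or pricing logic:

```go
package main

import (
	"fmt"
	"sync"
)

// Ride and Fare are simplified stand-ins for the real types.
type Ride struct {
	ID       int
	Distance float64 // km
}

type Fare struct {
	RideID int
	Amount float64
}

// run fans rides out to one goroutine per ride (a "Job") and fans the
// resulting fares back into a single channel for the writer stage.
func run(rides <-chan Ride) <-chan Fare {
	fares := make(chan Fare)
	var wg sync.WaitGroup
	go func() {
		for ride := range rides {
			wg.Add(1)
			go func(r Ride) {
				defer wg.Done()
				// Placeholder flat rate; the real calculation differs.
				fares <- Fare{RideID: r.ID, Amount: 1.30 * r.Distance}
			}(ride)
		}
		wg.Wait()
		close(fares) // no more jobs: signal the writer to stop
	}()
	return fares
}

func main() {
	rides := make(chan Ride)
	go func() {
		for i := 1; i <= 3; i++ {
			rides <- Ride{ID: i, Distance: float64(i)}
		}
		close(rides)
	}()
	for fare := range run(rides) {
		fmt.Printf("ride %d: %.2f\n", fare.RideID, fare.Amount)
	}
}
```

Closing the output channel only after `wg.Wait()` is what lets the writer use a plain `range` loop to drain all fares.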
I used channels extensively to handle the concurrency. The results are very good memory-wise, but the pipeline might perform better with some kind of buffering mechanism that buffers the input before processing. That said, since the fare calculation is quite fast, buffering may not change the numbers at all. In my tests I could process 1.2GB of data in about 50 seconds, which is quite reasonable.