The function takes an input file path as the variable file_location.
Preferably, this single function should be divided into three submodules:
- Reading the file and extracting it line by line, e.g. read_file.
- A function that takes one line as input, cleans it, and updates the output dictionary (the counting step can be split into another separate function for a real-time counter).
- An output function that displays the result either on stdout or in a specified file, with the desired formatting.
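A minimal Python sketch of the three submodules follows. It assumes Python (suggested by the mention of gevent and asyncio) and a word rule inferred from the sample cases: the line is lower-cased and a word is a run of letters with optional internal hyphens, so digits and stand-alone punctuation are dropped. Only read_file and file_location come from the description; process_line, write_output and count_words are hypothetical names.

```python
import re
import sys
from collections import Counter
from typing import Iterator, Optional, TextIO

# Assumption: a word is a run of letters, optionally joined by internal
# hyphens (e.g. "pink-floyd"); digits and stand-alone punctuation are dropped.
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def read_file(file_location: str) -> Iterator[str]:
    """Yield the file one line at a time instead of loading it whole."""
    with open(file_location, encoding="utf-8") as handle:
        for line in handle:
            yield line


def process_line(line: str, counts: Counter) -> Counter:
    """Clean one line (lower-case, tokenize) and add its words to counts."""
    counts.update(WORD_RE.findall(line.lower()))
    return counts


def write_output(counts: Counter, out: Optional[TextIO] = None) -> None:
    """Write "word (n)" pairs to stdout, or to an already-open file."""
    target = out if out is not None else sys.stdout
    target.write(" ".join(f"{word} ({n})" for word, n in counts.items()) + "\n")


def count_words(file_location: str, out: Optional[TextIO] = None) -> Counter:
    """Hypothetical driver wiring the three submodules together."""
    counts: Counter = Counter()
    for line in read_file(file_location):
        process_line(line, counts)
    write_output(counts, out)
    return counts
```

For example, count_words("input.txt") prints to stdout, while passing an open handle such as open("result.txt", "w") writes to that file instead (both file names are placeholders). Words are reported in order of first appearance because Counter preserves insertion order.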
Total time taken: 17 minutes to write the code (14 min development, 3 min testing) and 10 minutes to write the documentation.
The approach above has several advantages, one of them being that the code can be made concurrent. Because one function processes a single line at a time, it can be parallelized, or run concurrently using gevent or asyncio. The line-processing function then becomes the mapper, and instead of updating the output dictionary one line at a time, we can simply add a reduce function that merges the per-line results.
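The map/reduce idea can be sketched with the standard library. This sketch swaps in concurrent.futures.ThreadPoolExecutor in place of gevent or asyncio, reuses the same assumed word rule, and uses hypothetical names (map_line, reduce_counts, count_concurrently): each line is mapped to its own Counter, and the partial Counters are merged with functools.reduce.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Iterable

# Same assumed word rule as the sequential version: letters with optional
# internal hyphens, compared case-insensitively.
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def map_line(line: str) -> Counter:
    """Mapper: turn one line into its own, independent word counts."""
    return Counter(WORD_RE.findall(line.lower()))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reducer: merge two partial results (Counter addition sums per word)."""
    return left + right


def count_concurrently(lines: Iterable[str], workers: int = 4) -> Counter:
    # Threads keep the sketch simple. On CPython they do not speed up
    # CPU-bound cleaning because of the GIL; ProcessPoolExecutor offers the
    # same map() interface for that case.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(map_line, lines)  # results come back in input order
        return reduce(reduce_counts, partials, Counter())
```

Since every mapper returns an independent Counter, no shared dictionary is mutated across workers, so no locking is needed. Counter addition is also associative, so partial results could be merged in any grouping, for example per chunk of lines before a final merge.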
The unit tests use a testing file. Sample input lines are listed below, each followed by its expected output:
- a sample-line working - not so fine sample-line => a (1) sample-line (2) working (1) not (1) so (1) fine (1)
- (empty line) => (no output)
- , * , . Are all punctuations[marks] and are all not valid => are (2) all (2) punctuations (1) marks (1) and (1) not (1) valid (1)
- another brick in 1 wall @ pink-floyd;wall => another (1) brick (1) in (1) wall (2) pink-floyd (1)
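The sample cases can be expressed as a unittest sketch. Here process_line and format_counts are hypothetical stand-ins using the same assumed word rule, and the empty line is assumed to produce no output. Because the counter aggregates repeated words, the last case expects wall (2).

```python
import re
import unittest
from collections import Counter

# Assumed cleaning rule, inferred from the sample cases: lower-case the line
# and keep runs of letters with optional internal hyphens.
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def process_line(line: str) -> Counter:
    """Hypothetical stand-in for the line-processing function under test."""
    return Counter(WORD_RE.findall(line.lower()))


def format_counts(counts: Counter) -> str:
    """Render counts as "word (n)" pairs in order of first appearance."""
    return " ".join(f"{word} ({n})" for word, n in counts.items())


class SampleLineTests(unittest.TestCase):
    CASES = [
        ("a sample-line working - not so fine sample-line",
         "a (1) sample-line (2) working (1) not (1) so (1) fine (1)"),
        ("", ""),
        (", * , . Are all punctuations[marks] and are all not valid",
         "are (2) all (2) punctuations (1) marks (1) and (1) not (1) valid (1)"),
        ("another brick in 1 wall @ pink-floyd;wall",
         "another (1) brick (1) in (1) wall (2) pink-floyd (1)"),
    ]

    def test_sample_lines(self) -> None:
        # subTest reports each failing sample line separately.
        for line, expected in self.CASES:
            with self.subTest(line=line):
                self.assertEqual(format_counts(process_line(line)), expected)
```

Saved as, e.g., test_word_count.py (a hypothetical file name), it runs with python -m unittest test_word_count.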