Faker - pip install Faker
Python
.
├── analyse_data.py
├── data
│ ├── airing.json
│ ├── channel.json
│ ├── program.json
│ ├── viewer.json
│ └── viewership.json
├── gen_data.py
├── helper
│ ├── Constant.py
│ ├── __init__.py
│ ├── IO.py
│ ├── Model.py
├── __init__.py
- python gen_data.py generates the test database in the data/ directory.
- python analyse_data.py runs analysis on the generated database.
- The helper module contains IO.py and Model.py, which provide disk read/write operations for JSON files and the blueprint for the database schema, respectively.
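
As a rough illustration of the kind of JSON read/write helpers a module like IO.py might provide, consider the sketch below. The function names read_json and write_json are hypothetical and are not taken from the repository.

```python
import json
from pathlib import Path


def write_json(path, records):
    """Write records (e.g. a list of dicts) to path as indented JSON."""
    path = Path(path)
    # Create the parent directory (e.g. data/) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)


def read_json(path):
    """Load and return the JSON content stored at path."""
    with Path(path).open("r", encoding="utf-8") as fh:
        return json.load(fh)
```

With helpers of this shape, gen_data.py would only need to build lists of dictionaries and hand them to write_json, one call per table.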
- To generate distinct test viewer names, Faker is used. A unique vwr_id is generated for each viewer using the uuid module in Python.
- For each channel, an Airing is generated by a random selection of the programs. This is carried out for a month. Hence, a random combination of (channel, program) is created for a given time interval.
- To generate the test JSON files (database), a Model class (helper/Model.py) is created for each table.
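
The airing generation described above could look roughly like the following sketch. It is not the code in gen_data.py: the function name generate_airings, the fields air_id, air_start and duration, and the fixed one-hour slot length are all assumptions made for illustration. Only chn_id and pgm_id are names used elsewhere in this README.

```python
import random
import uuid
from datetime import datetime, timedelta


def generate_airings(channel_ids, program_ids, start, days=30, slot_hours=1, seed=None):
    """Return one airing record per channel per time slot.

    Each slot on each channel is filled with a randomly chosen program.
    Field names other than chn_id and pgm_id are assumptions of this sketch.
    """
    rng = random.Random(seed)
    slot = timedelta(hours=slot_hours)
    end = start + timedelta(days=days)
    airings = []
    for chn_id in channel_ids:
        slot_start = start
        while slot_start < end:
            airings.append({
                "air_id": str(uuid.uuid4()),         # unique id, same idea as vwr_id
                "chn_id": chn_id,
                "pgm_id": rng.choice(program_ids),   # random program for this slot
                "air_start": slot_start.isoformat(),
                "duration": int(slot.total_seconds() // 60),  # minutes
            })
            slot_start += slot
    return airings


# Two channels, three programs, one month of hourly slots.
airings = generate_airings(["c1", "c2"], ["p1", "p2", "p3"],
                           datetime(2024, 1, 1), days=30, seed=0)
```

Passing a seed keeps the program selection reproducible between runs; the uuid-based ids still differ on every run.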
Most of the work is simple dictionary processing. Some key points are:
- The vship_data (Viewership table) is sorted with view_dt (view date) as the key. Next, the channels are filtered from the sorted data using the given time interval (binary search paradigm).
- _get_vship_with_duration: The filtered channels are used to search for the corresponding program details from air_data (Airing table). We store this information in a map chn_prog_map (channel_id -> (program_id, duration)). This also returns the viewer count for each channel as obtained from the viewership table.
- We've also created maps of chn_id -> chn_name and pgm_id -> (pgm_genres, pgm_title) for efficient access.
- For handling prime time, we implement the search on the viewership table (first point) on time data with components smaller than a day (i.e. hours, minutes, etc.).
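
The interval filter, the per-channel viewer count and the prime-time search from the points above can be sketched with the standard bisect module. This is only an illustration under stated assumptions: the helper names filter_by_interval, count_viewers_per_channel and filter_by_time_of_day are hypothetical, view_dt is assumed to be an ISO-8601 string, and each viewership row is assumed to carry vwr_id and chn_id.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime, time


def _view_dt(row):
    """Parse the (assumed ISO-8601) view_dt string of a viewership row."""
    return datetime.fromisoformat(row["view_dt"])


def filter_by_interval(vship_data, start, end):
    """Return rows with start <= view_dt <= end, ordered by view_dt."""
    rows = sorted(vship_data, key=_view_dt)
    keys = [_view_dt(r) for r in rows]
    # Two binary searches locate the slice boundaries in the sorted keys.
    return rows[bisect_left(keys, start):bisect_right(keys, end)]


def count_viewers_per_channel(rows):
    """Map chn_id -> number of distinct viewers found in rows."""
    viewers = defaultdict(set)
    for r in rows:
        viewers[r["chn_id"]].add(r["vwr_id"])
    return {chn_id: len(ids) for chn_id, ids in viewers.items()}


def filter_by_time_of_day(vship_data, start_time, end_time):
    """Same search, keyed on the time of day only (ignores the date)."""
    rows = sorted(vship_data, key=lambda r: _view_dt(r).time())
    keys = [_view_dt(r).time() for r in rows]
    return rows[bisect_left(keys, start_time):bisect_right(keys, end_time)]


vship_data = [
    {"vwr_id": "v1", "chn_id": "c1", "view_dt": "2024-01-01T20:30:00"},
    {"vwr_id": "v2", "chn_id": "c1", "view_dt": "2024-01-02T09:00:00"},
    {"vwr_id": "v1", "chn_id": "c2", "view_dt": "2024-01-03T21:15:00"},
    {"vwr_id": "v3", "chn_id": "c2", "view_dt": "2024-01-05T19:45:00"},
]

in_range = filter_by_interval(vship_data, datetime(2024, 1, 1), datetime(2024, 1, 3, 23, 59))
viewer_counts = count_viewers_per_channel(in_range)
prime_time = filter_by_time_of_day(vship_data, time(19, 0), time(22, 0))
```

Keying the prime-time search on the time component alone means a window such as 19:00 to 22:00 matches views from every date in the table, which is the behaviour the last point describes.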
- Analyse the viewership pattern of each viewer: find the genre a viewer watches most during a given time of day. This analysis can give some insight into the viewer's mood over the course of a day, and knowing a viewer's mood at a given time is valuable information.
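
One possible way to compute this per-viewer insight is to bucket each view by part of day and count genres with collections.Counter. Everything in this sketch is an assumption made for illustration: the bucket boundaries, the helper names part_of_day and favourite_genre_by_time, and the row shape (each view already joined with its program's pgm_genres as a list).

```python
from collections import Counter, defaultdict
from datetime import datetime


def part_of_day(hour):
    """Bucket an hour (0-23) into a coarse part of day (boundaries are arbitrary)."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


def favourite_genre_by_time(views):
    """Map vwr_id -> {part of day: most watched genre}."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for v in views:
        bucket = part_of_day(datetime.fromisoformat(v["view_dt"]).hour)
        # A program may have several genres; each one counts once per view.
        counts[v["vwr_id"]][bucket].update(v["pgm_genres"])
    return {
        vwr_id: {bucket: c.most_common(1)[0][0] for bucket, c in buckets.items()}
        for vwr_id, buckets in counts.items()
    }


views = [
    {"vwr_id": "v1", "view_dt": "2024-01-01T08:00:00", "pgm_genres": ["news"]},
    {"vwr_id": "v1", "view_dt": "2024-01-02T09:30:00", "pgm_genres": ["news", "politics"]},
    {"vwr_id": "v1", "view_dt": "2024-01-02T20:00:00", "pgm_genres": ["comedy"]},
]
favourites = favourite_genre_by_time(views)
```

Counting views is the simplest weighting; weighting each view by its duration would be a natural refinement if that data is available.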