Implementation details

Dependencies

Faker - pip install Faker

Implementation Details

Programming language used:

Python

Files:
.
├── analyse_data.py
├── data
│   ├── airing.json
│   ├── channel.json
│   ├── program.json
│   ├── viewer.json
│   └── viewership.json
├── gen_data.py
├── helper
│   ├── Constant.py
│   ├── __init__.py
│   ├── IO.py
│   └── Model.py
└── __init__.py

  • python gen_data.py generates the test database in the data/ directory.
  • python analyse_data.py runs the analysis on the generated database.
  • The helper module contains IO.py and Model.py, which provide the disk read/write operations for the JSON files and the blueprint for the database schema, respectively (a minimal sketch of the IO helpers follows this list).
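As a rough illustration, the JSON read/write helpers in IO.py could look something like the following; the function names read_json and write_json are assumptions for this sketch, not the actual API of the gist:

```python
import json


def read_json(path):
    """Load a JSON file from disk and return the parsed object."""
    with open(path) as f:
        return json.load(f)


def write_json(path, data):
    """Serialize `data` as JSON and write it to `path`."""
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
```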
Logic:
Part 1: Generate test database.
  • Faker is used to generate distinct test viewer names. A unique vwr_id is generated with Python's uuid module (see the sketch after this list).

  • For each channel, an Airing is generated by randomly selecting programs. This is carried out over a month, so a random (channel, program) combination is created for each time interval.

  • To generate the test JSON files (the database), a Model class (helper/Model.py) is created for each table.
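A minimal sketch of the generation step described above; the field names (vwr_id, vwr_name, chn_id, pgm_id, air_dt) and helper names are illustrative assumptions, not the actual schema used in gen_data.py:

```python
import random
import uuid

from faker import Faker

fake = Faker()


def gen_viewers(n):
    """Generate `n` test viewers with unique ids and fake names."""
    return [{"vwr_id": str(uuid.uuid4()), "vwr_name": fake.name()} for _ in range(n)]


def gen_airings(channels, programs, slots):
    """For each channel, pick a random program for every time slot in `slots`."""
    return [
        {"chn_id": chn["chn_id"],
         "pgm_id": random.choice(programs)["pgm_id"],
         "air_dt": slot}
        for chn in channels
        for slot in slots
    ]
```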

Part 2: Process the test database.

Most of this part is simple dictionary processing. Some key points are:

  • The vship_data (Viewership table) is sorted by view_dt (view date). Next, the channels are filtered from the sorted data for the given time interval using binary search (a sketch follows this list).

  • _get_vship_with_duration: the filtered channels are used to look up the corresponding program details in air_data (Airing table). This information is stored in a map chn_prog_map (channel_id -> (program_id, duration)). The function also returns the viewer count for each channel, as obtained from the viewership table.

  • We also build maps chn_id -> chn_name and pgm_id -> (pgm_genres, pgm_title) for efficient access.

  • To handle prime time, the same search over the viewership table (first point) is applied to time components smaller than a day (i.e. hours, minutes, etc.).
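A minimal sketch of the interval filtering and chn_prog_map construction described above, assuming record fields view_dt, chn_id, pgm_id, and duration (the field and helper names are illustrative). The prime-time case applies the same filtering with interval boundaries expressed at hour granularity:

```python
from bisect import bisect_left, bisect_right


def filter_by_interval(vship_data, start, end):
    """Return viewership records with start <= view_dt <= end.

    `vship_data` must already be sorted by view_dt, so the interval
    boundaries can be located with binary search.
    """
    keys = [rec["view_dt"] for rec in vship_data]
    return vship_data[bisect_left(keys, start):bisect_right(keys, end)]


def build_chn_prog_map(filtered, air_data):
    """Map chn_id -> (pgm_id, duration) and count viewers per channel."""
    airing_by_chn = {a["chn_id"]: (a["pgm_id"], a["duration"]) for a in air_data}
    chn_prog_map, viewer_count = {}, {}
    for rec in filtered:
        chn = rec["chn_id"]
        if chn in airing_by_chn:
            chn_prog_map[chn] = airing_by_chn[chn]
            viewer_count[chn] = viewer_count.get(chn, 0) + 1
    return chn_prog_map, viewer_count
```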

New insights (proposal)
  1. Analyse the viewership pattern for each viewer, i.e. find the most popular genre a viewer is interested in during a given time of day. This analysis can offer insight into the viewer's mood over the course of a day, and knowing a viewer's mood at a given time is crucial information.
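A minimal sketch of how such an analysis could work, assuming the viewership records have already been joined to program genres (all names below are illustrative):

```python
from collections import Counter, defaultdict


def favourite_genre_by_hour(joined_records):
    """joined_records: iterable of (vwr_id, hour, genre) tuples.

    Returns {(vwr_id, hour): genre}, i.e. the most-watched genre for each
    viewer at each hour of the day.
    """
    counts = defaultdict(Counter)
    for vwr_id, hour, genre in joined_records:
        counts[(vwr_id, hour)][genre] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}
```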