Please submit your answer in the form of a private GitHub gist.
We expect this task to take no more than 3 hours.
Please create a process that can download a sample data set from the following location and append it to a database table.
https://data.cms.gov/provider-data/archived-data/doctors-clinicians
Criteria:
- Your process should read the file name from an environment variable named `DATASET`, then download, extract, and import that file. For example, `DATASET=doctors_and_clinicians_02_2022.zip` will import only the `doctors_and_clinicians_02_2022.zip` file.
- Your process should accept a list of columns to import, as `IMPORT_FIELDS`, and import only those fields from the file.
- The import destination can be any DB of your choice. For example, you can start SQLite, PostgreSQL, or MySQL in a Docker container and import the data there.
- You can pick your language of choice.
- You do not need to write the whole process in one language. You can divide the tasks between shell scripts and programs of your choice if needed.
- You need to dockerize your solution.
- The end result must be executable like this:

      $ export DATASET=doctors_and_clinicians_02_2022.zip
      $ export IMPORT_FIELDS="NPI,Ind_PAC_ID,Ind_enrl_ID"
      $ etl.sh
      starting to download...
      processing...
      importing...
      done. 2000 records imported.
      $
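
To make the criteria above concrete, here is a minimal sketch of the extract and import steps in Python with SQLite. It is one possible shape, not a reference solution. It assumes the archive named by `DATASET` has already been downloaded to a local path (the download step is left out because the exact file URL is not given here), that the archive contains a single CSV with a header row, and that the helper names (`parse_fields`, `import_zip`, `run_from_env`), the table name `clinicians`, and the database file `etl.db` are all hypothetical.

```python
import csv
import io
import os
import sqlite3
import zipfile


def parse_fields(raw):
    """Split a comma-separated IMPORT_FIELDS value into clean column names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def import_zip(zip_path, fields, conn, table="clinicians"):
    """Append the selected columns of the first CSV in zip_path to a table.

    The table name is a hypothetical choice; columns are created untyped.
    Returns the number of rows inserted.
    """
    # Quote identifiers so column names from the file cannot break the SQL.
    columns = ", ".join('"{}"'.format(f.replace('"', '""')) for f in fields)
    placeholders = ", ".join("?" for _ in fields)
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, columns))
    insert_sql = 'INSERT INTO "{}" ({}) VALUES ({})'.format(
        table, columns, placeholders
    )

    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive holds one CSV; take the first one found.
        csv_name = next(
            n for n in archive.namelist() if n.lower().endswith(".csv")
        )
        with io.TextIOWrapper(
            archive.open(csv_name), encoding="utf-8-sig", newline=""
        ) as text:
            reader = csv.DictReader(text)
            missing = [f for f in fields if f not in (reader.fieldnames or [])]
            if missing:
                raise ValueError("columns not found in file: {}".format(missing))
            # Stream rows so the whole file never has to fit in memory.
            rows = ([row[f] for f in fields] for row in reader)
            cursor = conn.executemany(insert_sql, rows)
    conn.commit()
    return cursor.rowcount


def run_from_env(environ, db_path="etl.db"):
    """Read DATASET and IMPORT_FIELDS from a mapping such as os.environ."""
    dataset = environ["DATASET"]  # here: local path to the downloaded zip
    fields = parse_fields(environ["IMPORT_FIELDS"])
    conn = sqlite3.connect(db_path)
    try:
        return import_zip(dataset, fields, conn)
    finally:
        conn.close()
```

An `etl.sh` wrapper could call `run_from_env(os.environ)` after its own download step and print the returned count in the final "records imported" message. Because the table is created only if it does not exist and rows are always inserted, running the import twice appends the data twice, which matches the "append" wording of the task.
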
What we are looking for:
- You can design and implement an ETL system.
- You are familiar with Docker.
- You are familiar with general concepts of databases, and you can interact with them.