This is Conta Stone's data challenge for intern applicants. The objective is to extract and analyze data from a database.
The solution can be developed using Python, SQL scripts, a BI tool or a combination of those.
It must be hosted in a public code repository such as GitHub or GitLab, or sent as a compressed .zip
folder including all the files necessary to replicate your environment and run your code.
The database is available here and contains credit card transactional data in 4 tables:
- customers
- cards
- transactions
- frauds
- Extract and analyze the data in order to answer the following questions. Provide a description and/or comments for each solution.
  - What is the average age of the customers in the database?
  - How is the card_family ranked based on the credit_limit given to each card?
  - For the transactions flagged as fraud, what are the ids of the transactions with the highest value?
- Analysis:
  - Analyze whether the fraudulent transactions are associated with other features in the dataset. Explain your results.
It is not mandatory to answer all questions.
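
As a rough illustration of how the questions above could be approached, the sketch below runs SQL from Python against a tiny in-memory SQLite stand-in. Only the four table names and the columns age, card_family, credit_limit and id come from this brief. Every other column name (card_number, customer_id, value, transaction_id), the choice of SQLite, and all rows are assumptions made for the example; the real database may be structured differently. Ranking card families by average credit limit is also just one possible reading of the second question.

```python
import sqlite3

# Toy in-memory stand-in for the challenge database. Only the table names and
# the columns age, card_family, credit_limit and id come from the brief; every
# other column name and all rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id TEXT PRIMARY KEY, age INTEGER);
CREATE TABLE cards (card_number TEXT PRIMARY KEY, card_family TEXT,
                    credit_limit INTEGER, customer_id TEXT);
CREATE TABLE transactions (id TEXT PRIMARY KEY, card_number TEXT, value REAL);
CREATE TABLE frauds (transaction_id TEXT PRIMARY KEY);

INSERT INTO customers VALUES ('c1', 30), ('c2', 40), ('c3', 50);
INSERT INTO cards VALUES ('n1', 'Gold', 5000, 'c1'),
                         ('n2', 'Gold', 7000, 'c2'),
                         ('n3', 'Premium', 20000, 'c3');
INSERT INTO transactions VALUES ('t1', 'n1', 100.0), ('t2', 'n2', 900.0),
                                ('t3', 'n3', 900.0), ('t4', 'n3', 50.0);
INSERT INTO frauds VALUES ('t2'), ('t3'), ('t4');
""")

# Question 1: average customer age.
avg_age = conn.execute("SELECT AVG(age) FROM customers").fetchone()[0]

# Question 2: card families ordered by mean credit limit, highest first.
family_rank = conn.execute("""
    SELECT card_family, AVG(credit_limit) AS avg_limit
    FROM cards
    GROUP BY card_family
    ORDER BY avg_limit DESC
""").fetchall()

# Question 3: ids of the fraudulent transactions whose value equals the
# maximum value among fraudulent transactions (ties are all returned).
top_fraud_ids = [row[0] for row in conn.execute("""
    SELECT t.id
    FROM transactions t
    JOIN frauds f ON f.transaction_id = t.id
    WHERE t.value = (SELECT MAX(t2.value)
                     FROM transactions t2
                     JOIN frauds f2 ON f2.transaction_id = t2.id)
    ORDER BY t.id
""")]

# Analysis starting point: share of transactions flagged as fraud per card
# family. The LEFT JOIN keeps non-fraud transactions, which count as 0.
fraud_rate = conn.execute("""
    SELECT c.card_family, AVG(f.transaction_id IS NOT NULL) AS rate
    FROM transactions t
    JOIN cards c ON c.card_number = t.card_number
    LEFT JOIN frauds f ON f.transaction_id = t.id
    GROUP BY c.card_family
    ORDER BY c.card_family
""").fetchall()

print(avg_age)
print(family_rank)
print(top_fraud_ids)
print(fraud_rate)
```

To use something like this on the real data, the connection and the assumed column names would need to be replaced with those of the actual database. A per-group fraud rate only hints at an association; a fuller analysis would also compare against other features and account for group sizes.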