@gregakespret
Last active September 26, 2018 05:51
Celtra Data Engineering Challenge

Celtra Data Engineer Challenge

First of all, thank you for taking the time to do this challenge. There are many possible ways to solve this task. Your solution will help us gain insight into how you think, how much you care about the technical aspects of software development and deployment, what architectural decisions you make, what standards of quality you adhere to, and which tools and technologies you like to use and how you use them. Hopefully, we will learn something from you as well :)

Description

You are creating a low-latency reporting service that lets you generate ad hoc reports. The primary use case for this service is a user-facing dashboard.

  • The service should support drill-downs and roll-ups
    • on dimensions: date, campaignId, campaignName, adId, adName and
    • on metrics: impressions, clicks, interactions, swipes, pinches, touches, uniqueUsers

For a query of "How many impressions were trafficked each day for each campaign?", an example response is (this is just an example, no need to use CSV):

date,campaignId,impressions
2017-01-01,camp0000,10000
2017-01-01,camp0001,9000
2017-01-01,camp0002,11000
2017-01-02,camp0000,50000
2017-01-02,camp0001,50000
2017-01-02,camp0002,60000
...
  • It should support/enforce two primary use cases:
    • specific campaign / all time.
    • all campaigns / last week.
  • The service should be able to scale horizontally as the number of requests increases.
  • You can assume new data arrives once per hour.
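As an illustration of the roll-up requirement, here is a minimal Python sketch that groups rows by the requested dimensions and sums the requested metrics. The row layout (`ROWS`), the `roll_up` function, and the sample numbers are assumptions made for this example, not part of the challenge. `uniqueUsers` is deliberately excluded, because distinct counts cannot be summed across groups.

```python
from collections import defaultdict

# Hypothetical pre-aggregated rows, one per (date, campaignId, adId).
ROWS = [
    {"date": "2017-01-01", "campaignId": "camp0000", "adId": "ad0", "impressions": 6000, "clicks": 12},
    {"date": "2017-01-01", "campaignId": "camp0000", "adId": "ad1", "impressions": 4000, "clicks": 8},
    {"date": "2017-01-01", "campaignId": "camp0001", "adId": "ad2", "impressions": 9000, "clicks": 5},
    {"date": "2017-01-02", "campaignId": "camp0000", "adId": "ad0", "impressions": 50000, "clicks": 90},
]

# Metrics that can be rolled up by plain summation.
ADDITIVE_METRICS = {"impressions", "clicks", "interactions", "swipes", "pinches", "touches"}


def roll_up(rows, dimensions, metrics):
    """Group rows by `dimensions` and sum each of `metrics` per group."""
    unsupported = set(metrics) - ADDITIVE_METRICS
    if unsupported:
        raise ValueError(f"not additive, cannot be summed: {sorted(unsupported)}")
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        for metric in metrics:
            totals[key][metric] += row[metric]
    # Emit one output row per group, ordered by the dimension values.
    return [dict(zip(dimensions, key), **values) for key, values in sorted(totals.items())]


daily_by_campaign = roll_up(ROWS, ["date", "campaignId"], ["impressions"])
```

Dropping `campaignId` from the dimension list rolls the same data up to daily totals, while adding `adId` drills down to individual ads.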

Data model

Each campaign has multiple ads. Those ads are shown to end-users through impressions. A user might interact with the ad (swipe, pinch, touch) and a user might click on an ad (but the majority of users will not). Relationships and maximum global cardinalities:

~1000 campaigns
campaign <-1..1e3-> ad <-1..1e7-> user <-1..10-> impression <-0..1-> swipe
                                                            <-0..1-> pinch
                                                            <-0..1-> touch
                                                            <-0..1-> click

Note that users are not disjoint between campaigns (the same user might see ads in different campaigns).
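The overlap of users across campaigns matters for the `uniqueUsers` metric. The following Python sketch, using made-up impression events and a hypothetical `unique_users` helper, shows that per-campaign unique counts do not add up to the overall count.

```python
# Hypothetical impression events: (campaignId, adId, userId).
IMPRESSIONS = [
    ("camp0000", "ad0", "u1"),
    ("camp0000", "ad0", "u2"),
    ("camp0000", "ad1", "u1"),
    ("camp0001", "ad2", "u1"),
    ("camp0001", "ad2", "u3"),
]


def unique_users(events, campaign_ids=None):
    """Count distinct users, optionally restricted to a set of campaigns."""
    return len({
        user
        for campaign, _ad, user in events
        if campaign_ids is None or campaign in campaign_ids
    })


per_campaign = {c: unique_users(IMPRESSIONS, {c}) for c in ("camp0000", "camp0001")}
overall = unique_users(IMPRESSIONS)

# User u1 appears in both campaigns, so the per-campaign counts overcount.
assert sum(per_campaign.values()) > overall
```

Exact sets like these may become expensive at the cardinalities listed above. Approximate distinct-count structures such as HyperLogLog are one possible alternative, at the cost of a small error.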

Objective

You should:

  • design and implement a low-latency, scalable architecture for the solution
  • design and implement the query REST API
  • develop a comprehensive test suite
  • design and implement configuration management of:
    • source code, test data, database scripts, build and deploy scripts
    • third-party dependencies, libraries, and components
  • design a data model / schema
  • pick a suitable storage layout
  • prepare a working implementation with a nice README.md that has instructions on how to configure, build/test and run it (bonus points for self-sufficient container)

The choice of programming language is entirely up to you. You are free to use any open-source software, and your solution can also use any cloud vendor (you can check AWS Free Tier or GCP Free Tier).

Note: The REST API should be just an interface to your service, so there is no need to pay too much attention to it. We are more interested in the architecture of your service: Is it fast enough for a user-facing dashboard? Can it scale as the number of requests increases? How did you design the data layout? Etc.

Submission

We expect you to start by providing a simple demo of your solution. You are free to choose how you want to present it: a link to a public domain, a self-sufficient container that we can run on our machines, or something else. In any case, we expect clear instructions on how to run and use the service, and we should be able to start using it quickly and in just a couple of steps.

Within the demo, we expect to be able to:

  • Make an HTTP request to your service to get the number of impressions, interactions, and swipes for each ad in a specific campaign (we expect you to have some fake data)
  • Make another HTTP request to get the number of uniqueUsers and impressions for each ad in the last week
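For illustration only, the two demo requests could be expressed as query URLs like the ones built below. The base URL, the `/reports` path, and the parameter names (`dimensions`, `metrics`, `campaignId`, `from`, `to`) are all assumptions made for this sketch, since the challenge leaves the API design to you. "Last week" is interpreted here as the seven days ending on a given date, which is also an assumption.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/reports"  # hypothetical endpoint


def report_url(dimensions, metrics, **filters):
    """Build a query URL for the hypothetical reporting endpoint."""
    params = {
        "dimensions": ",".join(dimensions),
        "metrics": ",".join(metrics),
        **filters,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def last_week(today):
    """Return (from, to) ISO dates covering the seven days ending on `today`."""
    return (today - timedelta(days=6)).isoformat(), today.isoformat()


# Request 1: impressions, interactions, and swipes per ad in one campaign.
campaign_url = report_url(
    ["adId"], ["impressions", "interactions", "swipes"], campaignId="camp0000"
)

# Request 2: uniqueUsers and impressions per ad over the last week.
start, end = last_week(date(2017, 1, 8))
weekly_url = report_url(
    ["adId"], ["uniqueUsers", "impressions"], **{"from": start, "to": end}
)
```

Either URL could then be fetched with any HTTP client, for example `curl` or Python's `urllib.request.urlopen`, once a service is listening at that address.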

Afterwards, we will dive into your solution's implementation and architecture in more detail.

If something is not clear or if you have any questions, please feel free to contact me via grega at celtra.com.
