@anzebrvar
Last active March 20, 2020 06:54
Celtra Analytics Challenge

Celtra Analytics Engineer Challenge

First of all, thank you for taking the time to do this challenge. There are many possible ways to solve this task. Your solution will help us gain insight into how you think, how much you care about the technical aspects of software development and deployment, what architectural decisions you make, what standards of quality you adhere to, and which tools and technologies you like to use and how you use them. Hopefully, we will be able to learn something from you as well :)

Description

You are creating a low-latency reporting service that generates ad hoc reports. The primary use case for this service is a user-facing dashboard.

  • The service should support drill-downs and roll-ups
    • on dimensions: date, campaignId, campaignName, adId, adName and
    • on metrics: impressions, clicks, interactions, swipes, pinches, touches, uniqueUsers

For a query of "How many impressions were trafficked each day for each campaign?", an example response is (this is just an example, no need to use CSV):

date,campaignId,impressions
2017-01-01,camp0000,10000
2017-01-01,camp0001,9000
2017-01-01,camp0002,11000
2017-01-02,camp0000,50000
2017-01-02,camp0001,50000
2017-01-02,camp0002,60000
...
  • It should support/enforce two primary use cases:
    • specific campaign / all time.
    • all campaigns / last week.
  • Additionally, it should support selecting any combination of the metrics and dimensions mentioned above.
  • The service should be able to scale horizontally as the number of requests increases.
  • You can assume new data arrives once per hour.
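As a rough illustration of what a roll-up over these dimensions could look like, here is a minimal in-memory sketch in Python. It is not a proposed architecture: the function name `roll_up`, the row layout, and the sample rows are assumptions made for this example, and `uniqueUsers` is left out on purpose because distinct counts cannot simply be summed.

```python
from collections import defaultdict

DIMENSIONS = {"date", "campaignId", "campaignName", "adId", "adName"}
# uniqueUsers is deliberately excluded: distinct counts are not additive.
ADDITIVE_METRICS = {"impressions", "clicks", "interactions",
                    "swipes", "pinches", "touches"}


def roll_up(rows, dimensions, metrics):
    """Group rows by the requested dimensions and sum the requested metrics."""
    unknown = (set(dimensions) - DIMENSIONS) | (set(metrics) - ADDITIVE_METRICS)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        for m in metrics:
            totals[key][m] += row[m]
    # One output record per distinct dimension combination, in key order.
    return [dict(zip(dimensions, key), **values)
            for key, values in sorted(totals.items())]


# Hypothetical rows stored at ad-level grain.
rows = [
    {"date": "2017-01-01", "campaignId": "camp0000", "adId": "ad0", "impressions": 6000},
    {"date": "2017-01-01", "campaignId": "camp0000", "adId": "ad1", "impressions": 4000},
    {"date": "2017-01-01", "campaignId": "camp0001", "adId": "ad2", "impressions": 9000},
]

daily = roll_up(rows, ["date", "campaignId"], ["impressions"])
total = roll_up(rows, [], ["impressions"])
```

Dropping a dimension from the list rolls the result up one level; adding one drills down. A real service would push this grouping into the storage layer rather than scanning rows in application code.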

Data model

Each campaign has multiple ads. Those ads are shown to end-users through impressions. A user might interact with the ad (swipe, pinch, touch) and a user might click on an ad (but the majority of users will not). Relationships and maximum global cardinalities:

~1000 campaigns
campaign <-1..1e3-> ad <-1..1e7-> user <-1..10-> impression <-0..1-> swipe
                                                            <-0..1-> pinch
                                                            <-0..1-> touch
                                                            <-0..1-> click

Note that users are not disjoint between campaigns (the same user might see ads in different campaigns).
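One consequence of overlapping users is that uniqueUsers cannot be rolled up by adding per-campaign counts. The small Python sketch below shows the effect on made-up data; the event layout and the helper `unique_users` are assumptions for illustration only.

```python
# Hypothetical impression log: (campaignId, userId) pairs.
impressions = [
    ("camp0000", "u1"), ("camp0000", "u2"), ("camp0000", "u1"),
    ("camp0001", "u2"), ("camp0001", "u3"),
]


def unique_users(events, campaign=None):
    """Count distinct users, optionally restricted to one campaign."""
    return len({user for camp, user in events
                if campaign is None or camp == campaign})


per_campaign = {c: unique_users(impressions, c)
                for c in ("camp0000", "camp0001")}
overall = unique_users(impressions)

# User u2 appears in both campaigns, so the naive sum overcounts.
naive_sum = sum(per_campaign.values())
print(per_campaign, overall, naive_sum)
```

Here each campaign has 2 unique users, but the overall count is 3, not 4. Any pre-aggregation scheme therefore needs to keep something mergeable per group (exact user sets, or an approximate distinct-count sketch) rather than a plain number.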

Objective

You should:

  • design and implement a low-latency, scalable architecture for the solution
  • design and implement the query REST API
  • develop a comprehensive test suite
  • design and implement configuration management of:
    • source code, test data, database scripts, build and deploy scripts
    • third-party dependencies, libraries, and components
  • design a data model / schema
  • pick a suitable storage layout
  • prepare a working implementation with a nice README.md that has instructions on how to configure, build/test and run it (bonus points for a self-sufficient container)

It is totally up to you which programming language you choose. You are free to use any open-source software if you wish, and your solution can also connect to any cloud vendor (you can check AWS Free Tier or GCP Free Tier).

Note: The REST API should be just an interface to your service; there is no need to pay too much attention to it. We are more interested in the architecture of your service: is it fast enough for a user-facing dashboard? Can it be scaled as the number of requests increases? How did you design the data layout? Etc.

Submission

We expect you to start by providing a simple demo of your solution. You are free to choose how you want to present it. Whether it is a link to a public domain, a self-sufficient container that we can run on our machines, or something else is up to you. In any case, we expect clear instructions on how to run and use the service, and we should be able to start using it in a short time and in just a couple of steps.

Within the demo, we expect to be able to:

  • Make an HTTP request to your service to get the number of impressions, interactions, and swipes for each ad in a specific campaign (we expect you to have some fake data)
  • Make another HTTP request to get the number of uniqueUsers and impressions for each ad in the last week
  • Make another HTTP request to get any combination of metrics and dimensions, for example (but not limited to!):
    • impressions, pinches, campaignName, date
    • impressions
    • impressions, campaignName, date, adName
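To make the demo requests concrete, here is one possible way the query URLs could be shaped, sketched in Python. Everything about the interface is an assumption: the challenge does not prescribe an endpoint, so `BASE_URL`, the `/report` path, and the `metrics`, `dimensions`, `campaignId` and `period` parameter names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; the challenge leaves the API design to the candidate.
BASE_URL = "http://localhost:8080/report"


def report_url(metrics, dimensions=(), **filters):
    """Build a report query URL from metric names, dimension names and filters."""
    params = {"metrics": ",".join(metrics)}
    if dimensions:
        params["dimensions"] = ",".join(dimensions)
    params.update(filters)  # e.g. campaignId="camp0000" (assumed filter name)
    return f"{BASE_URL}?{urlencode(params)}"


# Impressions, interactions and swipes for each ad in a specific campaign.
per_ad = report_url(["impressions", "interactions", "swipes"],
                    dimensions=["adId"], campaignId="camp0000")

# uniqueUsers and impressions for each ad in the last week.
last_week = report_url(["uniqueUsers", "impressions"],
                       dimensions=["adId"], period="last_week")

# A single metric with no dimensions: one grand total.
total = report_url(["impressions"])

# Decode the query string the way a server would.
decoded = parse_qs(urlsplit(per_ad).query)
```

Passing metrics and dimensions as comma-separated lists keeps "any combination" expressible in a single GET request, which also makes responses easy to cache between the hourly data loads.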

Afterwards, we will dive into your solution's implementation and architecture in more detail.

If something is not clear or if you have any questions, please feel free to contact me via anze.brvar at celtra.com.
