Skip to content

Instantly share code, notes, and snippets.

@calumhalcrow
Last active November 26, 2018 05:30
Show Gist options
  • Save calumhalcrow/5963bcfee5e87c3a495c109779edfe2f to your computer and use it in GitHub Desktop.
Save calumhalcrow/5963bcfee5e87c3a495c109779edfe2f to your computer and use it in GitHub Desktop.
Rapid River Data Engineer skills test

PLEASE NOTE: We'd prefer you send your answers to workwithus@rrsoft.co rather than post them publicly in another Gist, or in comments on this Gist ;-)

1. Voting

Each voter can vote in zero or more referenda. Each referendum has one or more questions, and each question is a yes/no vote. Write the simplest normalized schema to describe this in generic SQL statements or as an entity-relationship diagram. Point out where the indexes would be if you want to quickly know the results of a given referendum question, but you never expect to query a single voter's voting record.

2. Cursors

Explain the difference between explicit and implicit cursors in PL/SQL. Give an example in which an explicit cursor can be used to solve a problem that cannot be solved (or cannot be solved easily) using implicit cursors.

3. Unix tools

In one Unix command, find all of the files in /usr/local whose contents contain the word "aardvark" (case-insensitive), and list them all in order from most-recently created to least-recently created.

4. Remote Development Environments

A client has asked you to solve some problems on their large (1 TB) database located in San Francisco, CA but you live in Indonesia. Your Internet bandwidth is about 1 Mbps, and it is a bit unreliable.

Discuss the pros and cons of interacting with the client's database using:

  • An SSH session
  • A Remote Desktop/Citrix session
  • A local copy of the database
  • A slimmed-down copy of the database (which does not currently exist)

Also discuss tools that you would use to mitigate any shortcomings of your setup.

How does the answer change if your bandwidth is 100 Kbps? 100 MBps?

5. Lies, damn lies, and git

You're working on a cool branch of the foobar project, and your branch contains two commits A and B. The git lineage is:

X -- Y -- Z <-- master
           \
            A -- B <-- your-cool-branch

You want to contribute the code back to the master foobar repo, but you realize there is a really dumb typo in one of your source code comments in commit A. You'd still like to submit the pull request as two commits A' and B', where A' is the fixed version of A, and B' is the same exact diff as B. How do you rewrite git history to make this happen? Is B' the same hash as B? Why or why not?

6. Scala: A Simple Calculator

In Scala, write a method called calculator that accepts three string parameters:

def calculator(operand1: String, operator: String, operand2: String): Unit

and demonstrate that it behaves as follows

  • Converts the operands to Ints;
  • Performs the desired mathematical operator (+, -, *, or /) on the two operands
  • Prints the result, or a generic error message

7. Query Optimization

Consider the following SQL query:

Select a. * , b. * , c. * , d. *
    from table_a a left join (select *, case when column_1b_sub > 0 then 1 else 0 end as column_1b_sub_value from table_b) b on a.column_1a = b.column_1b and a.column_2a = b.column_2b
    left join (select * from table_d) d    on a.column_1a = b.column_1b and a.column_2a = b.column_2b

Identify any syntactic problems with this query. With those problems fixed, how could you modify this query to make it more efficient?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment