jballanc/interview.md

## interview.md

      
    Raw
  

              interview.md
            
          
    Personally, when conducting technical interviews, I try to keep the interviewee right at the limits of their knowledge and ability. It's sort of like standing blindfolded in a room and walking in different directions until you hit a wall to get a sense of the size of the room (don't try this at home). So this often involves starting overly complex, then walking back until you find the interviewee's comfort zone. Let them demonstrate their knowledge for a while (and hopefully gain back some confidence), then push them out to the complexity zone again until they stop answering/start looking uncomfortable again.
To give a concrete example of what that would look like with this question (let's call "A" the interviewer and "B" the interviewee):
A: Thanks for coming in today! We here at CompuMegaCorp are glad you decided to interview with us, and personally, I hope you're enjoying the day so far...
B: Uh...yeah (shifts nervously in seat)
A: Great! So, for the projects you'd be working on here, we make heavy use of databases...of all types...and to be really successful you'll have to get comfortable working with them. Have you worked with databases in the past?
B: Oh, sure...first time was when I wrote this PHP site in high school and had to set up MySQL first. I used a tool called something like...MySQLPHPAdminsomething?
A: phpMyAdmin?
B: Yeah, that was it!
A: Cool. So, tell me, have you ever created an index in a database?
B: Oh yeah! I remember another project I worked on at my last gig. We were building a list of businesses and I needed to pull them up based on the Owner's name. It was working fine, until the data team went and dumped a list of half a million new businesses in our DB. Suddenly my code started crawling...until my manager told me to add an index. Then, BAM! Problem solved.
A: Heh, yeah. Indexes are good at that. So, can you tell me how a database index works?
B: You mean like how to create one?
A: Well, that will depend on the specific database, right?
B: I...guess?
A: I mean that each database has a slightly different way to create indexes, and NoSQL databases will be different than SQL databases.
B: Oh yeah, I get what you mean.
A: I was more asking about how it is that the index solved your problem?
B: You mean, how did it suddenly make my query faster?
A: Precisely!
B: Hmm...umm...I (shifts nervously again)... I guess I'm not sure?
A: That's fine. We can figure it out together.
B: Eh?
A: Well, you mentioned that you were querying by the business owner's name, and that was slow, but then you added an index and it was fast. If we break that down, I think we can figure out how a database index works.
B: Uh...ok
A: So, why do you think your query was slow?
B: Well...hmm...I mean we added a bunch of new businesses.
A: Ok. What order were they added in?
B: Uh, well, whatever order we got them in.
A: Right. So the query was slow because...
B: Oh! Sure. Because the database has to look at every business to find the ones with a certain owner.
A: Right, and when you insert the records as they come in, their order is...?
B: Random?
A: I think you've got it now. So going from slow to fast?
B: Yeah. Ok, so when we insert the records, they're randomized with respect to owner's name. Searching a random list is slow, but searching a sorted list is fast. So adding an index creates a list of the records sorted by what we want to query. In my case, sorted by the owner's name.
A: Exactly! So tell me, how would you implement that?
B: Heh... (feeling a bit more confident now)... Well, I guess I'd take the whole list of businesses, look at the owner name for each, along with it's primary key ID. Then I'd sort the whole collection using a standard sorting method like quicksort, and write out a 2D array, with the owner names in sorted order in the first dimension, and the primary key ID for each in the second. That way, when I needed to query by owner name I could do a simple binary search, then look at the second dimension for the matching position in the array to get the primary key ID, and viola!
A: Ok, so now you have your database, with all its random records, and your index, which lists the records in an order that makes querying fast. Let's say we need to add a new business.
B: Ok.
A: What happens to the index?
B: Hmm...well, it would have to stay sorted.
A: Right. But with your array?
B: Oh... (suddenly realizing the magnitude of the mistake, head droops) uh.
A: So, maybe not an array?
B: Uh...probably not?
A: Why?
B: Well, inserting into the middle of an array is going to mean potentially moving a lot of data.
A: Right. So what's a better data structure?
B: Hmm...I could insert into a linked list by just moving a pointer.
A: True. What's wrong with a linked list?
B: Um...not...sure?
A: Think back. Why were we creating this list?
B: To make the query fast.
A: And the query was fast because?
B: A sorted list can be searched quickly with a binary search.
A: But a binary search requires what property of a data type?
B: Umm...hmmm... (really thinking now) OH! Of course! You need to be able to jump to the item that's half-way between two others. Which you can't do with a linked list because there's no way to know the distance between two items in the list without walking all the intervening nodes.
A: Right! So what data type could we use so that we can do fast binary searches, but also add new items without having to move all the data?
B: AH HAH! A Tree! (smiling again)
A: Yup! Nicely done. And that's how database indexes work. So tell me, can you implement a B*-tree in C using only XOR?
B: Huh?
A: Heh, nevermind...it seems we're out of time. It's been great talking with you!
B: Likewise!