Type: Phone
He just jumped straight into code after talking a little about the extension I made for Twitch.
Given a JSON object that has start and stop offsets and a severity, return a list of words that contain at least one character from any of the offsets whose severity is above 5.
# --------
# JSON data:
# --------
data = {
"string" : "My name is Aaron and your name is Jehad",
"matches" : [
{
"start_pos" : 0,
"end_pos" : 2,
"severity" : 3
},
{
"start_pos" : 8,
"end_pos" : 10,
"severity" : 4
},
{
"start_pos" : 11,
"end_pos" : 15,
"severity" : 100
},
{
"start_pos" : 33,
"end_pos" : 37,
"severity" : 100
}
]
}
# -------
# should result in
#
#
# {
#   "My"   : true,
#   "name" : true,
#   "is"   : true,
#   "Aaron": false,
#   "and"  : true,
#   "your" : true,
#   "name" : true,
#   "is"   : true,
#   "Jehad": false
# }
Brute force solution approach worked out in brute-force.jpg, and then the optimized approach in optimized.jpg.
The brute force solution is O(nm), where n is the number of characters and m is the number of severity entries in data.matches:
- Go through the sentence and make an array of { word: string, startIndex: number, endIndex: number }. This is O(n), where n is the number of characters in data.string.
- Then iterate through each of the words, and for each word iterate through all of data.matches, checking whether word.startIndex and word.endIndex overlap with data.matches[i].start_pos and data.matches[i].end_pos; if they do, push the word to the answer list. This step is O(words * data.matches.length). You might think the number of words is some independent k, but it is bounded by the number of characters in the sentence minus the word boundaries (space characters). In the worst case (every character its own word, which is not actually possible) that bound is n, so we call it O(nm).
This approach always loops through every (word, match) pair, so its best case equals its worst case.
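The brute-force steps above can be sketched as follows. This is a minimal Python sketch under my reading of the problem (true = word is clean, false = word overlaps a match with severity above 5); the function names and the `threshold` parameter are my own, not from the interview.

```python
def tokenize(sentence):
    # O(n): build [{word, start, end}] with inclusive character offsets.
    words = []
    start = 0
    for i, ch in enumerate(sentence + " "):  # sentinel space flushes the last word
        if ch == " ":
            if i > start:
                words.append({"word": sentence[start:i], "start": start, "end": i - 1})
            start = i + 1
    return words

def flag_words(data, threshold=5):
    # O(words * matches): check every word against every severe match.
    result = {}
    for w in tokenize(data["string"]):
        allowed = True
        for m in data["matches"]:
            overlaps = not (w["end"] < m["start_pos"] or w["start"] > m["end_pos"])
            if m["severity"] > threshold and overlaps:
                allowed = False
                break
        result[w["word"]] = allowed
    return result

data = {
    "string": "My name is Aaron and your name is Jehad",
    "matches": [
        {"start_pos": 0, "end_pos": 2, "severity": 3},
        {"start_pos": 8, "end_pos": 10, "severity": 4},
        {"start_pos": 11, "end_pos": 15, "severity": 100},
        {"start_pos": 33, "end_pos": 37, "severity": 100},
    ],
}
print(flag_words(data))
# → {'My': True, 'name': True, 'is': True, 'Aaron': False, 'and': True, 'your': True, 'Jehad': False}
```

Note that a Python dict collapses the duplicate "name" and "is" entries from the expected output; a list of pairs would preserve them.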
The optimized solution also comes out to O(nm), where n is the number of characters and m is data.matches.length. However, its average case is better: brute force guarantees O(nm) work every time, while the optimized version usually does less.
What we do is make a map where the key is the index of a letter in the string and the value is the word containing it. So for the sentence "My name" we get a map of:
{
0: 'My',
1: 'My',
3: 'name',
4: 'name',
5: 'name',
6: 'name'
}
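The map construction above can be sketched like this; `build_word_by_index` is a name I'm using for illustration, and space indices are simply left out of the map.

```python
def build_word_by_index(s):
    # O(n): map each character index to the word that contains it.
    word_by_index = {}
    start = 0
    for i, ch in enumerate(s + " "):  # sentinel space flushes the last word
        if ch == " ":
            for j in range(start, i):
                word_by_index[j] = s[start:i]
            start = i + 1
    return word_by_index

print(build_word_by_index("My name"))
# → {0: 'My', 1: 'My', 3: 'name', 4: 'name', 5: 'name', 6: 'name'}
```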
Then we make our ans be a set, so we cannot push duplicates. We iterate through data.matches, and for each match we iterate j from data.matches[i].start_pos to data.matches[i].end_pos, pushing wordByIndex[j] into our ans set (for the matches whose severity is above 5). So we are actually doing O(n) to build the wordByIndex map, then O(mk), where k is the width of each match's offset range; in the worst case one range spans all the characters, so k is n and we say O(nm). But it is not guaranteed that we touch every word (and remember we said the number of words equals the number of characters in the worst case), so the average case is better.