Type: Phone
He just jumped straight into code after talking a little about the extension I made for Twitch.
Given a JSON object that has start and stop offsets and a severity, return a list of words that contain at least one character from any of the offsets whose severity is above 5.
# --------
# JSON data:
# --------
data = {
"string" : "My name is Aaron and your name is Jehad",
"matches" : [
{
"start_pos" : 0,
"end_pos" : 2,
"severity" : 3
},
{
"start_pos" : 8,
"end_pos" : 10,
"severity" : 4
},
{
"start_pos" : 11,
"end_pos" : 15,
"severity" : 100
},
{
"start_pos" : 33,
"end_pos" : 37,
"severity" : 100
}
]
}
# -------
# should result in
#
#
# {
#   "My"   : true,
#   "name" : true,
#   "is"   : true,
#   "Aaron": false,
#   "and"  : true,
#   "your" : true,
#   "name" : true,
#   "is"   : true,
#   "Jehad": false
# }
Brute force solution approach worked out in brute-force.jpg, and then the optimized approach in optimized.jpg.
The brute force solution is O(nm), where n is the number of characters and m is the number of severity entries in data.matches:
- Go through the sentence and make an array of { word: string, startIndex: number, endIndex: number }. This is O(n), where n is the number of characters in data.string.
- Then iterate through each of the words, and for each word iterate through all of data.matches, checking whether word.startIndex and word.endIndex overlap with data.matches[i].start_pos and data.matches[i].end_pos; if they do, push the word to the answer list. This step is O(words * data.matches.length). You might think the number of words is some independent k, but it is bounded by the number of characters in the sentence minus the word boundaries (space characters). In the worst case (every character its own word, which is not actually possible) that bound is n, so we call it O(nm).
This approach always loops through every (word, match) pair, so its best case equals its worst case.
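The brute-force steps above can be sketched as follows. This is a minimal Python sketch under my reading of the problem (true = word is clean, false = word overlaps a match with severity above 5); the function names and the `threshold` parameter are my own, not from the interview.

```python
def tokenize(sentence):
    # O(n): build [{word, start, end}] with inclusive character offsets.
    words = []
    start = 0
    for i, ch in enumerate(sentence + " "):  # sentinel space flushes the last word
        if ch == " ":
            if i > start:
                words.append({"word": sentence[start:i], "start": start, "end": i - 1})
            start = i + 1
    return words

def flag_words(data, threshold=5):
    # O(words * matches): check every word against every severe match.
    result = {}
    for w in tokenize(data["string"]):
        allowed = True
        for m in data["matches"]:
            overlaps = not (w["end"] < m["start_pos"] or w["start"] > m["end_pos"])
            if m["severity"] > threshold and overlaps:
                allowed = False
                break
        result[w["word"]] = allowed
    return result

data = {
    "string": "My name is Aaron and your name is Jehad",
    "matches": [
        {"start_pos": 0, "end_pos": 2, "severity": 3},
        {"start_pos": 8, "end_pos": 10, "severity": 4},
        {"start_pos": 11, "end_pos": 15, "severity": 100},
        {"start_pos": 33, "end_pos": 37, "severity": 100},
    ],
}
print(flag_words(data))
# → {'My': True, 'name': True, 'is': True, 'Aaron': False, 'and': True, 'your': True, 'Jehad': False}
```

Note that a Python dict collapses the duplicate "name" and "is" entries from the expected output; a list of pairs would preserve them.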
The optimized solution also comes out to O(nm), where n is the number of characters and m is data.matches.length. However, its average case is better: brute force guarantees O(nm) work every time, while the optimized version usually does less.
What we do is make a map where the key is the index of a letter in the string and the value is the word containing it. So for the sentence "My name" we get a map of:
{
0: 'My',
1: 'My',
3: 'name',
4: 'name',
5: 'name',
6: 'name'
}
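The map construction above can be sketched like this; `build_word_by_index` is a name I'm using for illustration, and space indices are simply left out of the map.

```python
def build_word_by_index(s):
    # O(n): map each character index to the word that contains it.
    word_by_index = {}
    start = 0
    for i, ch in enumerate(s + " "):  # sentinel space flushes the last word
        if ch == " ":
            for j in range(start, i):
                word_by_index[j] = s[start:i]
            start = i + 1
    return word_by_index

print(build_word_by_index("My name"))
# → {0: 'My', 1: 'My', 3: 'name', 4: 'name', 5: 'name', 6: 'name'}
```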
Then we make our ans be a set, so we cannot push duplicates. We iterate through data.matches, and for each match we iterate j from data.matches[i].start_pos to data.matches[i].end_pos, pushing wordByIndex[j] into our ans set (for the matches whose severity is above 5). So we are actually doing O(n) to build the wordByIndex map, then O(mk), where k is the width of each match's offset range; in the worst case one range spans all the characters, so k is n and we say O(nm). But it is not guaranteed that we touch every word (and remember we said the number of words equals the number of characters in the worst case), so the average case is better.