Skip to content

Instantly share code, notes, and snippets.

@EC7495
Last active June 24, 2020 19:52
Show Gist options
  • Save EC7495/b24d92c8f94cd24b4887b16d2530c29a to your computer and use it in GitHub Desktop.
Save EC7495/b24d92c8f94cd24b4887b16d2530c29a to your computer and use it in GitHub Desktop.
String Search (needle in haystack)

class: center, middle

String Search

aka find the needle in the haystack. (Similar to JavaScript's indexOf but we are searching for an entire word as opposed to a single character)


Prompt

You are attempting to find the index of the first appearance of one string (the needle) inside of another (the haystack). You can think of the problem as looking for a particular substring inside another string and returning the starting index of that substring.


Examples

indexOf('or', 'hello world'); // should return 7 because the string "or" can be found in the string "hello world" starting at index 7
indexOf('oox', 'ooboxoooxo'); // should return 6
indexOf('howdy', 'hello world'); // should return -1
indexOF('neddle', 'neddle'); // should return 0 (the needle and haystack can potentially be the same string!)
indexOf('hello world', 'or'); // should return -1 (notice how the needle is longer than the haystack)

Common approaches

  1. built-in methods
  • Most students' first instincts will be to use built-in string methods like indexOf(), includes() or substring(). indexOf() is FORBIDDEN. Try to steer your interviewee away from the other methods like includes() and substring().

  • The reason is that many whiteboard interviews will be language-agnostic and focus on the underlying concepts. You will want to show that you understand how these methods work, not that you happened to read the right documentation the night before.

  • Also, you may actually be adding more time complexity to your solution. Look into how indexOf(), includes() and substring() work under the hood. Many built-in methods actually operate on O(n) time, or worse!

  1. split and loop
  • A lot of students will split the haystack into an array of characters and then loop through.

  • This approach would work. But imagine the space complexity of generating a new array and then holding it in memory for a very, very large haystack. You would be introducing another O(n) dimension in time and space, where n is the length of the haystack.

  • If they're in a groove, have them finish out this approach and pseudocode it. Then, ask them how they would do this without generating a second copy of the haystack. Meaning, how can they achieve the same thing in "one pass" (only looping once) and without using extra space?
  1. double for loop (this is the approach of the solution below)
  • Loop through the haystack and the needle at the same time.

  • For every character in the haystack, loop through the needle to check if that is the potential start of the needle

Solution

function indexOf(needle, haystack) {
  for (let hIdx = 0; hIdx <= haystack.length - needle.length; hIdx++) {
    for (let nIdx = 0; nIdx < needle.length; nIdx++) {
      if (haystack[hIdx + nIdx] !== needle[nIdx]) break;
      if (nIdx + 1 === needle.length) return hIdx;
    }
  }
  return -1;
}

Big O

// let n be the length of the needle
// let h be the length of the haystack

function indexOf(needle, haystack) {

    // looping through haystack takes O(h) time
    for (let hIdx = 0; hIdx <= haystack.length - needle.length; hIdx++) {
    // looping through needle takes O(n) time
        for (let nIdx = 0; nIdx < needle.length; nIdx++) {
            if (haystack[hIdx + nIdx] !== needle[nIdx]) break; // comparisons take O(1) time
            if (nIdx + 1 === needle.length) return hIdx; // O(1) time
        }
    }

    return -1;

// Time complexity: O(h * n)
// Space complexity: O(1)
}

Other solutions:

// let h be the length of the haystack
// let n be the length of the needle

const indexOf = (needle, haystack) => {

  let nLen = needle.length, sub

  for (let i = 0; i <= haystack.length - needle.length; i++) { // O(h)
    sub = haystack.slice(i, nLen + i) // O(n)
    if (sub === needle) return i // O(1)
  }

  return -1

// Time complexity: O(h * n)
// Space complexity: O(1)
}
// let h be the length of the haystack.
// let n be the length of the needle.

const indexOf = (needle, haystack) => {

  // if the needle is longer than the haystack,
  // then it's definitely not in there.
  if (needle.length > haystack.length) return -1

  // we want to match as many characters as the needle is long.
  let toMatch = needle.length
  // these are the pointers we will use to search for the needle.
  // start will be the starting index of the substring we are currently looking at in the haystack.
  // nIdx will be the index of the char we are currently looking at in the needle.
  // hIdx will be the index of the charr we are currently looking at in the haystack.
  let start = 0, nIdx = 0, hIdx = 0

  // loop until we have matched all characters
  //  or until we searched the whole haystack.
  while (toMatch && hIdx < haystack.length) { // O(h)
    // if the current char in the haystack
    // is the same as the current char in the needle,
    // keep looking at subsequent characters to find more matches,
    // and decrement the amount of characters we need to match
    if (haystack[hIdx] === needle[nIdx]) { // O(1)
      hIdx++
      nIdx++
      toMatch--
    } 
    
    // if the current char at the haystack is not
    // the same as the current char in the needle,
    // then we need to start over ...
    // this time, we will start looking from our previous starting index + 1.
    else { // O(1)
      hIdx = ++start
      nIdx = 0
      toMatch = needle.length
    }
  }

  // if we matched all characters,
  // then the current value of start will 
  // be the starting index of the needle
  // in the haystack.
  // else, we didn't match all characters in the needle and
  // therefore we did not find it.
  return !toMatch ? start : -1

  // Time complexity: O(h)
  // Space complexity: O(1)
}

Video Solution Matt Mintzer There are other algorithms, such as Boyer-Moore (well, modified slightly), that can perform at O(n+m) time—or even faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment