nmestrada/stringSearch.md

## stringSearch.md

      
    Raw
  

              stringSearch.md
            
          
    Substring Search

Prompt

Write a function, indexOf, which takes in two strings--the needle and the haystack--and returns the index of the first appearance of the needle in the haystack. If the needle does not appear in the haystack, return -1. Don't use indexOf(), includes() or substring(),
indexOf('or', 'hello world'); // should return 7
indexOf('hello world', 'or'); // should return -1
indexOf('howdy', 'hello world'); // should return -1
indexOf('oox', 'ooboxoooxo'); // should return 6


Why no built-ins?

It probably makes sense that we're asking you not to use indexOf(), but why do we ask that you not use other built-in methods like includes() or substring()?
First, many whiteboard interviews will be language-agnostic-- they focus on the underlying concepts. You will want to show that you understand how these methods work, not that you happened to read the right documentation the night before.
Second, in a whiteboard context, you're very often trying to demonstrate your understanding of time and space complexity and a capacity to write code that is minimally time- and space-complex. Relying on includes() and substring() might make your solution more terse, but do you know the time and space complexity of these built-in methods? It can sometimes be acceptable to use a built-in if you know their complexity, but if you don't, you'll be introducing an unknown into your code. (includes() and substring(), FYI, are usually O(n), but it depends on the Javascript engine.)
Limits of Method-Chaining

It's not uncommon when starting out with this problem to split the string into an array-- we all know and love our array methods. We might start with: haystack.split(), and then perhaps chain more methods on the end or loop through the resulting array. But this means we'll be using a bunch of extra memory (O(n)'s worth of memory, at least) and be chaining our solution behind a built-in with O(n) time complexity (split() has to go through the entire string!).
When dealing with string-based problems, we'll often want to avoid having to generate intermediate arrays, instead preferring to access characters in a string directly (say, with pointers advanced via a for loop). Chaining m methods which each have time complexity O(n) results in a time complexity of m * O(n). We generally drop the constants when considering worst-case time complexity-- but, still, less than ideal.
Nested For Loop Solution

Given these constraints, the simplest solution uses nested for loops. The outer loop goes through the haystack; the inner loops a pointer through the needle. The inner loop compares the needle to the haystack, character by character, starting from the outer loop's pointer, breaking as soon it encounters a character which doesn't match; if it gets to the end of the needle, it returns the outer loop's pointer:
const indexOf = (needle, haystack) => {
  for (let h = 0; h <= haystack.length - needle.length; h++) {
    for (let n = 0; n < needle.length; n++) {
      if (haystack[h + n] !== needle[n]) break;
      if (n + 1 === needle.length) return h;
    }
  }
  return -1;
}
Where n is the haystack size and m the needle size, this solution is O(n*m). Why?
const indexOf = (needle, haystack) => {

  // O(n * ...) where n is the number of letters in haystack
  for (let h = 0; h <= haystack.length - needle.length; h++) {
  
    // O(m * ...) where m is the number of letters in needle
    for (let n = 0; n < needle.length; n++) {
    
      // O(1) constant
      if (haystack[h + n] !== needle[n]) break;
      
      // O(1) constant
      if (n + 1 === needle.length) return h;
    }
  }
  
  // O(1) constant
  return -1;
}
So, O(n * (m * (1 + 1)))=> O(n * m)
Other solutions

There are some famous algorithms which beat O(n * m). It would be quite unusual to expect an interviewee to invent these algorithms at a whiteboard, but for some kinds of jobs there may be an expectation of familiarity with them. Teaching these is beyond the scope of our time, today, but here are some resources:
The Boyer-Moore algorithm
The KMP algorithm
The Z algorithm