Skip to content

Instantly share code, notes, and snippets.

@nikitaDanilenko
Last active April 30, 2020 09:50
Show Gist options
  • Save nikitaDanilenko/6caa5c8f38c9230c7ee6c145f2a5b7c9 to your computer and use it in GitHub Desktop.
Save nikitaDanilenko/6caa5c8f38c9230c7ee6c145f2a5b7c9 to your computer and use it in GitHub Desktop.
Traverse one, query the other

Occasionally, one may have an operation where one needs to combine two data structures, where the full traversal is linear in the number of elements in the structure, and a query operation is logarithmic (with an arbitrary base) in the number of elements in the structure. One use case for such a combination is the definition of an intersection of maps as in Scala, where there is currently no general merge operation for this case.

The question

If we traverse one structure while querying the other - is it cheaper (in terms of complexity) to traverse the larger one or the smaller one?

The answer

Traverse the smaller one while querying the larger one, as soon as the number of elements in the smaller one is larger than b.

For the explanation let us assume a logarithmic base of b > 0. It is likey that b = 2 holds, but it is not necessary. Consider the function f = t -> t / log(b, t). Using its first derivative gives us f' = t -> (log(b, t) - 1) / (log(b, t)^2), and thus it f'(t) > 0 for all t > b, hence the function is monotonically increasing in the interval [b, infinity[. In particular this means that for all x, y <- [b, infinity[ such that x <= y we have f(x) <= f(y) and thus x / log(b, x) <= y / log(b, y), which means that x * log(b, y) <= y * log(b, x). The latter inequality displays the asserted statement.

What about the complementary case?

If one of the structures has less elements than b, it is safe to assume that querying this structure takes a constant number of operations. Here, one needs to consider the precise costs and compare these costs in the case of the various different sizes. However, it is likely that the complementary case is a corner case, since b is probably less than 10. Hence, even the possibly larger cost of traversing the smaller structure while querying the larger one is good enough to avoid a more complex and less clear implementation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment