Define two sets of intervals to be equivalent relative to a set of numbers (the base) if both sets' intervals cover the exact same set of numbers in the base. For example, given the base [0, 1, 3, 4, 6], {[0, 1], [3, 6]} is equivalent to {[0, 3], [4, 6]} but not to {[1, 3], [4, 6]}, because the latter doesn't cover 0.
Define a set of intervals to be maximal relative to a base if there's no equivalent set with fewer intervals. Equivalently, a set of intervals is maximal iff, for every pair of consecutive intervals in the set, there is at least one number in the base that lies strictly between the upper bound of the lower interval and the lower bound of the higher one.
For example, given the base [0, 1, 3, 4, 6], the set of ranges {[0, 1], [4, 6]} is maximal because 3 is between 1 and 4, but {[0, 1], [3, 6]} is not because there's nothing in the base between 1 and 3.
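To make the definitions concrete, here is a brute-force sketch in Python. It treats all intervals as closed, and the names `covered` and `equivalent` are my own, not part of the problem statement:

```python
def covered(base, intervals):
    """The subset of the base covered by a set of closed intervals."""
    return {x for x in base if any(lo <= x <= hi for (lo, hi) in intervals)}

def equivalent(base, a, b):
    """Two interval sets are equivalent iff they cover the same base elements."""
    return covered(base, a) == covered(base, b)

base = [0, 1, 3, 4, 6]
print(equivalent(base, [(0, 1), (3, 6)], [(0, 3), (4, 6)]))  # True
print(equivalent(base, [(0, 1), (3, 6)], [(1, 3), (4, 6)]))  # False: misses 0
```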
You are given a (potentially very large) ordered list of floats—the base—and a list of intervals, each of which may be open or closed. Return a maximal set of intervals that's equivalent to the input list. These intervals should be ordered first by minimum, then by maximum.
List<Interval> maximize(List<float> base, List<Interval> intervals);
Note: In practice, the input of this algorithm comes from humans and the output is intended to be viewed by them. While not strictly necessary for correctness, this means that there's a lot of value in preserving the original bounds of the ranges where possible rather than expanding them eagerly.
For simplicity, we assume all intervals are open and that no interval properly contains any other. We can easily generalize to avoid these assumptions.
maximize(base, intervals):
  sort intervals by minimum
  result = []
  previous = first element of intervals
  for interval in intervals[1..-1]:
    mid = binary search base for the least upper bound of previous
    if mid is null:  # previous covers every remaining base element
      break
    if mid is below interval:
      append previous to result
      previous = interval
    else:
      previous = [previous.min, interval.max]
  append previous to result
  return result
This is O(|intervals|⋅log(|base|)) (technically it's O(|intervals|⋅log(|intervals|) + |intervals|⋅log(|base|)), but |intervals| << |base|).
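A runnable sketch of the above, as a guess at one concrete interpretation: intervals are (min, max) tuples treated as closed (which is what makes the worked examples come out as stated), the base is sorted, and no interval properly contains another. `bisect` supplies the binary search:

```python
import bisect

def maximize(base, intervals):
    """Merge intervals until no equivalent set has fewer of them.

    base: sorted list of floats; intervals: closed (min, max) tuples.
    """
    intervals = sorted(intervals)
    result = []
    previous = intervals[0]
    for interval in intervals[1:]:
        # First base element strictly greater than previous.max, if any.
        i = bisect.bisect_right(base, previous[1])
        if i == len(base):
            # previous covers every base element to its right; later
            # intervals can add nothing, so we're done.
            break
        mid = base[i]
        if mid < interval[0]:
            # mid separates the two intervals: both must stay.
            result.append(previous)
            previous = interval
        else:
            # mid would be covered by interval: merge, keeping the
            # original outer bounds rather than expanding eagerly.
            previous = (previous[0], interval[1])
    result.append(previous)
    return result
```

With the base [0, 1, 3, 4, 6] from the examples, `maximize(base, [(0, 1), (4, 6)])` keeps both intervals, while `maximize(base, [(0, 1), (3, 6)])` merges them into a single interval.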
The same base will be used with many different lists of intervals. See if you can make the algorithm more efficient by pre-processing the base somehow. Note that because the base may be very large, it would be great if maximize() weren't O(|base|).
class Maximizer {
Maximizer(List<float> base);
List<Interval> maximize(List<Interval> intervals);
}
We make the same simplifying assumptions as above.
class Maximizer:
  upperBounds = empty map

  maximize(intervals):
    sort intervals by minimum
    result = []
    previous = first element of intervals
    for interval in intervals[1..-1]:
      if upperBounds contains key previous.max:
        mid = upperBounds[previous.max]
      else:
        mid = binary search base for the least upper bound of previous
        upperBounds[previous.max] = mid
      if mid is null:  # previous has no upper bound in base
        break
      else if mid is below interval:
        append previous to result
        previous = interval
      else:
        previous = [previous.min, interval.max]
    append previous to result
    return result
This is O(|intervals|⋅log(|base|)) for all-new intervals, but if all the intervals are from outputs of maximize() (which they will often be in practice), it's only O(|intervals|⋅log(|intervals|)). In addition, the number of binary searches necessary scales with the number of new intervals only, meaning for example that adding a single new interval to a maximized set is O(|intervals|⋅log(|intervals|) + log(|base|)).
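The memoized version might look like this in Python, under the same closed-interval tuple representation as before (`_upper_bound` is my name for the cached search):

```python
import bisect

class Maximizer:
    def __init__(self, base):
        self.base = base        # sorted list of floats
        self.upper_bounds = {}  # interval max -> least base element above it

    def _upper_bound(self, x):
        # Memoized: first base element strictly greater than x, or None.
        if x not in self.upper_bounds:
            i = bisect.bisect_right(self.base, x)
            self.upper_bounds[x] = self.base[i] if i < len(self.base) else None
        return self.upper_bounds[x]

    def maximize(self, intervals):
        intervals = sorted(intervals)
        result = []
        previous = intervals[0]
        for interval in intervals[1:]:
            mid = self._upper_bound(previous[1])
            if mid is None:
                break  # previous has no upper bound in base
            if mid < interval[0]:
                result.append(previous)
                previous = interval
            else:
                previous = (previous[0], interval[1])
        result.append(previous)
        return result
```

A second call with intervals whose maxima were already seen performs no binary searches at all, which is where the O(|intervals|⋅log(|intervals|)) bound for re-maximized inputs comes from.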
We often end up merging the results of previous calls to maximize(). See if you can make the algorithm more efficient by avoiding re-checking intervals that are known to already be part of the same maximized set. We represent this by passing a List<List<Interval>>, where each sub-list is guaranteed to be maximized.
class Maximizer {
  Maximizer(List<float> values);
  List<Interval> maximize(List<List<Interval>> intervalLists);
}
I also want to perform intersection and difference operations on lists of intervals. I suspect that as long as the inputs to these operations are maximized, the outputs from naive algorithms will be as well. Either prove that this is true, or give a counterexample and provide intersect() and difference() algorithms that preserve maximization.
I'm not entirely sure whether I'm thinking straight at the end of the day, but do I have these assumptions correct?

- values comes pre-sorted.
- For any two consecutive intervals A and B in the output list, there is at least one integer i that lies "between" A and B which is in values but not in any of the intervals.

I'm not sure if that second assumption is justified, but it's at least plausible to me. I'll go with it for now. :)
My first thought is to start by ordering the intervals by their left endpoints. If values were a list of consecutive integers, we could easily tell whether we can merge two consecutive intervals, and do something like this. (Insert caveat about untested code here.)
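The code itself didn't survive here, but for consecutive integers the merge test is just endpoint adjacency. A guess at what it might have looked like, using the half-open [a, b) intervals the discussion below assumes (`merge_consecutive` is my name for it):

```python
def merge_consecutive(intervals):
    """Merge half-open [a, b) intervals over consecutive-integer values."""
    intervals = sorted(intervals)
    result = [intervals[0]]
    for a, b in intervals[1:]:
        last_a, last_b = result[-1]
        if a <= last_b:
            # No integer falls strictly between the intervals: merge.
            result[-1] = (last_a, max(last_b, b))
        else:
            result.append((a, b))
    return result
```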
Of course, the values don't have to be consecutive, so this code won't produce the optimal solution. For example, it fails to merge the intervals [1, 3) and [4, 7) even when values misses the number 3. But in this case, the intervals [1, 3) and [1, 4) are actually the same as far as we're concerned, as they both contain the same values. If we began with [1, 4) instead of [1, 3), we'd have gotten the right answer.

I think we might be able to use this idea to find an efficient solution in general. Given an interval of the form [a, b), let aa be the least element of values that is >= a, and bb the least element of values that is >= b. We might as well replace [a, b) with [aa, bb), since they contain the same values. Although I've written their definitions inefficiently, I'm pretty sure we can find the new endpoints aa and bb with binary search. So we can preprocess all the intervals into this form using only O(k log n) steps, where k is the number of intervals and n the number of integers.

Assuming I haven't made an "ass out of u and me", we can now apply the same algorithm I described above for consecutive-integer values. If A and B are consecutive intervals in the left-endpoint ordering and the right endpoint of A is strictly smaller than the left endpoint of B, then there is in fact an integer in values that is missed by A and B, showing that they can't be merged. The total runtime of this algorithm should be O(k (log k + log n)), a pretty efficient solution to at least Level 1.
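Putting the two steps together, a sketch of the whole approach (my own code, not tested beyond the examples; `bisect_left` finds the least element of values that is >= x, which gives the aa/bb normalization, and each interval is assumed to contain at least one value):

```python
import bisect

def normalize(values, interval):
    # Replace [a, b) with [aa, bb), where aa and bb are the least
    # elements of values >= a and >= b respectively. If nothing in
    # values is >= b, any bound past the last value works as bb.
    a, b = interval
    aa = values[bisect.bisect_left(values, a)]
    i = bisect.bisect_left(values, b)
    bb = values[i] if i < len(values) else values[-1] + 1
    return (aa, bb)

def maximize(values, intervals):
    """values: sorted; intervals: half-open [a, b) pairs, each nonempty."""
    normalized = sorted(normalize(values, iv) for iv in intervals)
    # The consecutive-integer merge pass now gives the optimal answer.
    result = [normalized[0]]
    for a, b in normalized[1:]:
        last_a, last_b = result[-1]
        if a <= last_b:
            result[-1] = (last_a, max(last_b, b))
        else:
            result.append((a, b))
    return result
```

On the earlier example, with values = [1, 2, 4, 5, 6] (so 3 is missing), [1, 3) normalizes to [1, 4) and then merges with [4, 7) as hoped.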