Define two sets of intervals to be equivalent relative to a set of numbers (the base) if both sets' intervals cover the exact same set of numbers in the base. For example, given the base [0, 1, 3, 4, 6], {[0, 1], [3, 6]} is equivalent to {[0, 3], [4, 6]} but not to {[1, 3], [4, 6]} because the latter doesn't cover 0.

Define a set of intervals to be maximal relative to a base if there's no equivalent set with fewer intervals. Equivalently, a set of intervals is maximal iff, for any two intervals in the set, there exists at least one number in the base between the upper bound of one interval and the lower bound of the other.

For example, given the base [0, 1, 3, 4, 6], the set of ranges {[0, 1], [4, 6]} is maximal because 3 is between 1 and 4, but {[0, 1], [3, 6]} is not because there's nothing in the base between 1 and 3.

Level 1

You are given a (potentially very large) ordered list of floats—the base—and a list of intervals, each of which may be open or closed. Return a maximal set of intervals that's equivalent to the input list. These intervals should be ordered first by minimum, then by maximum.

List<Interval> maximize(List<float> base, List<Interval> intervals);

Note: In practice, the input of this algorithm comes from humans and the output is intended to be viewed by them. While not strictly necessary for correctness, this means that there's a lot of value in preserving the original bounds of the ranges where possible rather than expanding them eagerly.

Best solution so far

For simplicity, we assume all intervals are open and that no interval properly contains any other. We can easily generalize this avoid these assumptions.

maximize(base, intervals):
  sort intervals

  result = []
  previous = first element of intervals
  for interval in intervals[1..-1]:
    mid = binary search base for the least upper bound of previous
    if mid is null or below interval:
      append previous to result
      previous = interval
    else:
      previous = [previous.min, interval.max]

  append previous to result
  return result

This is O(|intervals|⋅log(|base|)) (technically it's O(|intervals|⋅log(|intervals|) + |intervals|⋅log(|base|)) but |intervals| << |base|).

Level 2

The same base will be used with many different lists of intervals. See if you can make the algorithm more efficient by pre-processing the base somehow. Note that because the base may be very large, it would be great if maximize() weren't O(|base|).

class Maximizer {
  Maximizer(List<float> base);

  List<Interval> maximize(List<Interval> intervals);
}

Best solution so far

We make the same simplifying assumptions as above.

class Maximizer:
  upperBounds = empty map

  maximize(base, intervals):
    sort intervals

    result = []
    previous = first element of intervals
    for interval in intervals[1..-1]:
      if upperBounds contains key interval.max:
        mid = upperBounds[previous.max]
      else:
        mid = binary search base for the least upper bound of previous
        upperBounds[previous.max] = mid

      if mid is null: # previous has no upper bound in base
        break loop
      else if mid is null or below interval:
        append previous to result
        previous = interval
      else:
        previous = [previous.min, interval.max]

    append previous to result
    return result

This is O(|intervals|⋅log(|base|)) for all-new intervals, but if all the intervals are from outputs of maximize() (which they will often be in practice), it's only O(|intervals|⋅log(|intervals|)). In addition, the number of binary searches necessary scales with the number of new intervals only, meaning for example adding a single new interval to a maximized set is O(|intervals|⋅log(|intervals|) + log(|base|)).

Level 3

We often end up merging the results of previous calls to maximize(). See if you can make the algorithm more efficient by avoiding re-checking intervals that are known to already be part of the same maximized set. We represent this by passing a List<List<Interval>>, where each sub-list is guaranteed to be maximized.

class Maximizer {
  Merger(List<float> values);

  List<Interval> maximize(List<List<Interval>> intervalLists);
}

Bonus

I also want to perform intersection and difference operations on lists of intervals. I suspect that as long as the inputs to these operations are maximized, the outputs from naive algorithms will be as well. Either prove that this is true, or give a counterexample and provide union() and difference() algorithms that preserve maximization.