On the Vexing Difficulty of Evaluating IN Predicates

Authors:
Altan Birler, Thomas Neumann
Abstract

In SQL, IN predicates — and their syntactic cousin, quantified comparison predicates — are widely used and are part of the standard since its inception. Perhaps somewhat surprisingly, no efficient evaluation strategy is known. At first glance, these constructs behave like a semi join, but that is only true as long as no NULL values are involved. In the general case, the SQL standard mandates a behavior that is very difficult to implement efficiently. To the best of our knowledge, all existing systems either exhibit quadratic runtime in some cases or produce wrong results. This is rather troubling for such a common operation. In this paper, we study this problem and make both practical and theoretical contributions: On the theoretical side, we show that the problem is inherently hard, and that there is no hope of finding an algorithm with subquadratic runtime in general, unless SAT can be solved more efficiently. On the practical side we show that while the problem is hard in general, common cases can be solved in linear time, and that even the bad cases can usually be solved more efficiently. Compared to existing solutions, this leads to nearly arbitrary gains in runtime.