@inproceedings{sitchinava-spaa24,
  author = "Nodari Sitchinava and Rolf Svenning",
  title  = "The all nearest smaller values problem revisited in practice, parallel and external memory",
  booktitle = "Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures",
  series = "SPAA '24",
  pages = "259--268",
  year = 2024,
  month = 6,
  doi = "10.1145/3626183.3659979",
  software = {https://github.com/algoparc/ANSV},
  abstract = "We present a thorough investigation of the \emph{All Nearest Smaller Values (ANSV)} problem from a practical perspective.  The ANSV problem is defined as follows: given an array $A$ consisting of $n$ values, for each entry $A_i$ compute the largest index $l < i$ and the smallest index $r > i$ such that $A_i > A_l$ and $A_i > A_r$, i.e., the indices of the nearest smaller values to the left and to the right of $A_i$.  The ANSV problem was solved by Berkman, Schieber, and Vishkin [J. Algorithms, 1993] in the PRAM model. Their solution in the CREW PRAM model, which we will refer to as the \emph{BSV} algorithm, achieves optimal $\mathcal{O}\!\left(n\right)$ work and $\mathcal{O}\!\left(\log n\right)$ span.  Until now, the BSV algorithm has been perceived as too complicated for practical use, and we are not aware of any publicly available implementations.  Instead, the best existing practical solution to the ANSV problem is the implementation by Shun and Zhao presented at DCC'13. They implemented a simpler $\mathcal{O}\!\left(n\log n\right)$-work algorithm with an additional heuristic first proposed by Blelloch and Shun at ALENEX'11. We refer to this implementation as the \emph{BSZ} algorithm.  In this paper, we implement the original BSV algorithm and demonstrate its practical efficiency. Despite its perceived complexity, our results show that its performance is comparable to the BSZ algorithm.  We also present the first theoretical analysis of the heuristic implemented in the BSZ algorithm and show that it provides a tunable trade-off between optimal work and optimal span. In particular, we show that it achieves $\mathcal{O}\!\left(n\left(1 + \frac{\log{n}}{k}\right)\right)$ work and $\mathcal{O}\!\left(k(1+\log{\frac{n}{k}})\right)$ span, for any integer parameter $1 \le k \le n$. Thus, for $k = \Theta\!\left(\log n\right)$, the BSZ algorithm can be made to be work-optimal, albeit at the expense of increased span compared to BSV.  Our discussion includes a detailed examination of different input types, particularly highlighting that for random inputs, the low expected distance between values and their nearest smaller values renders simple algorithms efficient.  Finally, we analyze the input/output (I/O) complexities of the BSV algorithm."
}