@inproceedings{sitchinava-ipdps11, author = "Deepak Ajwani and Nodari Sitchinava and Norbert Zeh", title = "{I/O}-optimal distribution sweeping on private-cache chip multiprocessors", booktitle = "Proceedings of the 26th IEEE International Parallel \& Distributed Processing Symposium", series = "IPDPS '11", pages = "1114-1123", year = 2011, month = 5, doi = "10.1109/IPDPS.2011.106", abstract = "The parallel external memory (PEM) model has been used as a basis for the design and analysis of a wide range of algorithms for private-cache multi-core architectures. As a tool for developing geometric algorithms in this model, a parallel version of the I/O-efficient distribution sweeping framework was introduced recently, and a number of algorithms for problems on axis-aligned objects were obtained using this framework. The obtained algorithms were efficient but not optimal. In this paper, we improve the framework to obtain algorithms with the optimal I/O complexity of $O(\mathrm{sort}_P(N) + K/PB)$ for a number of problems on axis aligned objects; $P$ denotes the number of cores/processors, $B$ denotes the number of elements that fit in a cache line, $N$ and $K$ denote the sizes of the input and output, respectively, and $\mathrm{sort}_P(N)$ denotes the I/O complexity of sorting $N$ items using $P$ processors in the PEM model. To obtain the above improvement, we present a new one-dimensional batched range counting algorithm on a sorted list of ranges and points that achieves an I/O complexity of $O((N + K)/PB)$, where $K$ is the sum of the counts of all the ranges. The key to achieving efficient load balancing among the processors in this algorithm is a new method to count the output without enumerating it, which might be of independent interest." }