@inproceedings{DBLP:conf/vldb/AlsabtiRS97, author = {Khaled Alsabti and Sanjay Ranka and Vineet Singh}, editor = {Matthias Jarke and Michael J. Carey and Klaus R. Dittrich and Frederick H. Lochovsky and Pericles Loucopoulos and Manfred A. Jeusfeld}, title = {A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data}, booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece}, publisher = {Morgan Kaufmann}, year = {1997}, isbn = {1-55860-470-7}, pages = {346-355}, ee = {db/conf/vldb/AlsabtiRS97.html}, crossref = {DBLP:conf/vldb/97}, bibsource = {DBLP, http://dblp.uni-trier.de} }

The phi-quantile of an ordered sequence of data values is the element with rank phi*n, where n is the total number of values. Accurate estimates of quantiles are required for the solution of many practical problems. In this paper, we present a new algorithm for estimating the quantile values for disk-resident data. Our algorithm has the following characteristics: (1) It requires only one pass over the data; (2) It is deterministic; (3) It produces good lower and upper bounds of the true values of the quantiles; (4) It requires no a priori knowledge of the distribution of the data set; (5) It has a scalable parallel formulation; (6) Extra time and memory for computing additional quantiles (beyond the first one) are constant per quantile.

We present experimental results on the IBM SP-2. The experimental results show that the algorithm is indeed robust and does not depend on the distribution of the data sets.

*Copyright © 1997 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.*

- Download PDF file (www.vldb.org, Darmstadt, Germany)
- Download PDF file (www.acm.org, New York, USA)

- Windows: Click the letter of your CD drive

A B C**D E**F G H I J K L M N O P Q R S T U V W X Y Z - Mac: Click here
- UNIX/LINUX: mount the CD and click on the path of your
*mount point*:

/Anthology/vldb8997 or /cdrom

- Windows: Click the letter of your CD drive

A B C**D E**F G H I J K L M N O P Q R S T U V W X Y Z - Mac: Click here
- UNIX/LINUX: mount the DVD and click on the path of your
*mount point*:

/Anthology/aDVD1 or /dvd

Contents

- [AIS93]
- Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
- [ALSS95]
- Rakesh Agrawal, King-Ip Lin, Harpreet S. Sawhney, Kyuseok Shim: Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases. VLDB 1995: 490-501
- [ARS97]
- ...
- [AS95]
- Rakesh Agrawal, Arun N. Swami: A One-Pass Space-Efficient Algorithm for Finding Quantiles. COMAD 1995: 0-
- [AS96]
- Ramakrishnan Srikant, Rakesh Agrawal: Mining Quantitative Association Rules in Large Relational Tables. SIGMOD Conference 1996: 1-12
- [B+72]
- Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, Robert Endre Tarjan: Time Bounds for Selection. J. Comput. Syst. Sci. 7(4): 448-461(1973)
- [Bat68]
- Kenneth E. Batcher: Sorting Networks and Their Applications. AFIPS Spring Joint Computing Conference 1968: 307-314
- [Coc77]
- William G. Cochran:
Sampling Techniques, 3rd Edition.
John Wiley 1977, ISBN 0-471-16240-X

- [D+91]
- David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting. PDIS 1991: 280-291
- [FR75]
- Robert W. Floyd, Ronald L. Rivest: Expected Time Bounds for Selection. Commun. ACM 18(3): 165-172(1975)
- [GS90]
- ...
- [JC85]
- Raj Jain, Imrich Chlamtac: The P² Algorithm for Dynamic Calculation of Quantiiles and Histograms Without Storing Observations. Commun. ACM 28(10): 1076-1085(1985)
- [KGGK94]
- Vipin Kumar, Ananth Grama, Anshul Gupta, George Karypis:
Introduction to Parallel Computing.
Benjamin/Cummings 1994, ISBN 0-8053-3170-0

- [Koo80]
- Robert Kooi:
The Optimization of Queries in Relational Databases.
Ph.D. thesis, Case Western Reserve University 1980

- [LLS+93]
- Xiaobo Li, Paul Lu, Jonathan Schaeffer, John Shillington, Pok Sze Wong, Hanmao Shi: On the Versatility of Parallel Sorting by Regular Sampling. Parallel Computing 19(10): 1079-1103(1993)
- [MD88]
- M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36
- [MP80]
- J. Ian Munro, Mike Paterson: Selection and Sorting with Limited Storage. Theor. Comput. Sci. 12: 315-323(1980)
- [PIHS96]
- Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305
- [Ps84]
- Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276
- [SD77]
- ...
- [Zip49]
- George Kingsley Zipf:
Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology.
Addison-Wesley 1949