ACM SIGMOD Anthology VLDB dblp.uni-trier.de

A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data.

Khaled Alsabti, Sanjay Ranka, Vineet Singh: A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data. VLDB 1997: 346-355
@inproceedings{DBLP:conf/vldb/AlsabtiRS97,
  author    = {Khaled Alsabti and
               Sanjay Ranka and
               Vineet Singh},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {A One-Pass Algorithm for Accurately Estimating Quantiles for
               Disk-Resident Data},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {346-355},
  ee        = {db/conf/vldb/AlsabtiRS97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

The phi-quantile of an ordered sequence of data values is the element with rank phi*n, where n is the total number of values. Accurate estimates of quantiles are required for the solution of many practical problems. In this paper, we present a new algorithm for estimating the quantile values for disk-resident data. Our algorithm has the following characteristics: (1) It requires only one pass over the data; (2) It is deterministic; (3) It produces good lower and upper bounds of the true values of the quantiles; (4) It requires no a priori knowledge of the distribution of the data set; (5) It has a scalable parallel formulation; (6) Extra time and memory for computing additional quantiles (beyond the first one) are constant per quantile.

We present experimental results on the IBM SP-2. The experimental results show that the algorithm is indeed robust and does not depend on the distribution of the data sets.

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Electronic Edition

From CS Dept., University Trier (Germany)

URL

Additional information is available from http://www.cise.ufl.edu/~ranka

References

[AIS93]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[ALSS95]
Rakesh Agrawal, King-Ip Lin, Harpreet S. Sawhney, Kyuseok Shim: Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases. VLDB 1995: 490-501 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[ARS97]
...
[AS95]
Rakesh Agrawal, Arun N. Swami: A One-Pass Space-Efficient Algorithm for Finding Quantiles. COMAD 1995: 0- CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[AS96]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Quantitative Association Rules in Large Relational Tables. SIGMOD Conference 1996: 1-12 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[B+72]
Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, Robert Endre Tarjan: Time Bounds for Selection. J. Comput. Syst. Sci. 7(4): 448-461(1973) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Bat68]
Kenneth E. Batcher: Sorting Networks and Their Applications. AFIPS Spring Joint Computing Conference 1968: 307-314 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Coc77]
William G. Cochran: Sampling Techniques, 3rd Edition. John Wiley 1977, ISBN 0-471-16240-X
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[D+91]
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting. PDIS 1991: 280-291 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[FR75]
Robert W. Floyd, Ronald L. Rivest: Expected Time Bounds for Selection. Commun. ACM 18(3): 165-172(1975) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[GS90]
...
[JC85]
Raj Jain, Imrich Chlamtac: The P² Algorithm for Dynamic Calculation of Quantiiles and Histograms Without Storing Observations. Commun. ACM 28(10): 1076-1085(1985) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[KGGK94]
Vipin Kumar, Ananth Grama, Anshul Gupta, George Karypis: Introduction to Parallel Computing. Benjamin/Cummings 1994, ISBN 0-8053-3170-0
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Koo80]
Robert Kooi: The Optimization of Queries in Relational Databases. Ph.D. thesis, Case Western Reserve University 1980
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[LLS+93]
Xiaobo Li, Paul Lu, Jonathan Schaeffer, John Shillington, Pok Sze Wong, Hanmao Shi: On the Versatility of Parallel Sorting by Regular Sampling. Parallel Computing 19(10): 1079-1103(1993) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MD88]
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[MP80]
J. Ian Munro, Mike Paterson: Selection and Sorting with Limited Storage. Theor. Comput. Sci. 12: 315-323(1980) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[PIHS96]
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Ps84]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[SD77]
...
[Zip49]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Tue Mar 16 02:22:06 2010 by Michael Ley (ley@uni-trier.de)