
Parallel Algorithms for High-dimensional Similarity Joins for Data Mining Applications.

John C. Shafer, Rakesh Agrawal: Parallel Algorithms for High-dimensional Similarity Joins for Data Mining Applications. VLDB 1997: 176-185
@inproceedings{DBLP:conf/vldb/ShaferA97,
  author    = {John C. Shafer and
               Rakesh Agrawal},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {Parallel Algorithms for High-dimensional Similarity Joins for
               Data Mining Applications},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {176-185},
  ee        = {db/conf/vldb/ShaferA97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We consider the problem of parallelizing high-dimensional proximity joins. We present a parallel multidimensional join algorithm based on an epsilon-kdB tree and compare it with the more common approach of space partitioning. We evaluate the algorithm on an IBM SP2 shared-nothing multiprocessor using both synthetic and real-life datasets. We also examine the effectiveness of the algorithms in the context of a specific data-mining problem: finding similar time series. The empirical results show that our algorithm exhibits good performance and scalability, as well as the ability to handle data skew.
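The abstract names the epsilon-kdB tree without describing it. The following is a minimal, sequential sketch of an epsilon-kdB-style similarity self-join, based on our reading of the structure from [20]; it is not the paper's parallel algorithm. The assumptions are as follows: the distance is Euclidean, a node at depth i splits dimension i into stripes of width eps, a node becomes a leaf once it holds at most `leaf_size` points, and all function and parameter names (`build`, `similarity_self_join`, `leaf_size`) are hypothetical.

```python
import math
from itertools import combinations, product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]


class Node:
    """Node of a simplified epsilon-kdB-style tree (illustrative only)."""

    def __init__(self, ids: List[int], depth: int) -> None:
        self.ids = ids                          # indices of the points under this node
        self.depth = depth                      # a node at depth i splits on dimension i
        self.children: Dict[int, "Node"] = {}   # stripe number -> child node

    def is_leaf(self) -> bool:
        return not self.children


def build(points: Sequence[Point], ids: List[int], eps: float,
          depth: int, leaf_size: int) -> Node:
    """Recursively cut dimension `depth` into stripes of width eps."""
    node = Node(ids, depth)
    if len(ids) <= leaf_size or depth >= len(points[0]):
        return node                             # small enough, or no dimensions left
    stripes: Dict[int, List[int]] = {}
    for i in ids:
        stripes.setdefault(math.floor(points[i][depth] / eps), []).append(i)
    for s, sub in stripes.items():
        node.children[s] = build(points, sub, eps, depth + 1, leaf_size)
    return node


def _emit(points, i, j, eps, out: List[Pair]) -> None:
    # Euclidean distance is an assumption; other L_p norms prune the same way.
    if math.dist(points[i], points[j]) <= eps:
        out.append((min(i, j), max(i, j)))


def _self_join(points, node: Node, eps: float, out: List[Pair]) -> None:
    if node.is_leaf():
        for i, j in combinations(node.ids, 2):  # nested loop inside a leaf
            _emit(points, i, j, eps, out)
        return
    for s, child in node.children.items():
        _self_join(points, child, eps, out)
        neighbor = node.children.get(s + 1)
        if neighbor is not None:                # only adjacent stripes can hold close pairs
            _cross_join(points, child, neighbor, eps, out)


def _cross_join(points, a: Node, b: Node, eps: float, out: List[Pair]) -> None:
    if a.is_leaf() or b.is_leaf():
        for i, j in product(a.ids, b.ids):      # fall back to a nested loop
            _emit(points, i, j, eps, out)
        return
    for s, ca in a.children.items():
        for t in (s - 1, s, s + 1):             # coordinates within eps => stripes differ by <= 1
            cb = b.children.get(t)
            if cb is not None:
                _cross_join(points, ca, cb, eps, out)


def similarity_self_join(points: Sequence[Point], eps: float,
                         leaf_size: int = 4) -> List[Pair]:
    """Return every index pair (i, j), i < j, whose points lie within eps."""
    if not points:
        return []
    root = build(points, list(range(len(points))), eps, 0, leaf_size)
    out: List[Pair] = []
    _self_join(points, root, eps, out)
    return sorted(out)


if __name__ == "__main__":
    pts = [(0.10, 0.10), (0.12, 0.11), (0.50, 0.50), (0.52, 0.49), (0.90, 0.90)]
    print(similarity_self_join(pts, eps=0.05))  # prints [(0, 1), (2, 3)]
```

Two points within eps in Euclidean distance are also within eps in every single coordinate. Their stripe numbers therefore differ by at most one, so only the same or adjacent stripes ever need to be compared. This is the source of the pruning. The sketch does not model how the paper distributes this work across the processors of a shared-nothing machine.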

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.



Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7

References

[1]
Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84
[2]
Rakesh Agrawal, King-Ip Lin, Harpreet S. Sawhney, Kyuseok Shim: Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases. VLDB 1995: 490-501
[3]
Thomas Brinkhoff, Hans-Peter Kriegel, Bernhard Seeger: Parallel Processing of Spatial Joins Using R-trees. ICDE 1996: 258-265
[4]
Thomas Brinkhoff, Hans-Peter Kriegel, Bernhard Seeger: Efficient Processing of Spatial Joins Using R-Trees. SIGMOD Conference 1993: 237-246
[5]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990)
[6]
Christos Faloutsos: Multiattribute Hashing Using Gray Codes. SIGMOD Conference 1986: 227-238
[7]
Christos Faloutsos, M. Ranganathan, Yannis Manolopoulos: Fast Subsequence Matching in Time-Series Databases. SIGMOD Conference 1994: 419-429
[8]
...
[9]
...
[10]
...
[11]
H. V. Jagadish: Linear Clustering of Objects with Multiple Attributes. SIGMOD Conference 1990: 332-342
[12]
Nick Koudas, Kenneth C. Sevcik: Size Separation Spatial Join. SIGMOD Conference 1997: 324-335
[13]
Ming-Ling Lo, Chinya V. Ravishankar: Generating Seeded Trees from Data Sets. SSD 1995: 328-347
[14]
Ming-Ling Lo, Chinya V. Ravishankar: Spatial Hash-Joins. SIGMOD Conference 1996: 247-258
[15]
...
[16]
Jürg Nievergelt, Hans Hinterberger, Kenneth C. Sevcik: The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Trans. Database Syst. 9(1): 38-71(1984)
[17]
Jack A. Orenstein, T. H. Merrett: A Class of Data Structures for Associative Searching. PODS 1984: 181-190
[18]
Jignesh M. Patel, David J. DeWitt: Partition Based Spatial-Merge Join. SIGMOD Conference 1996: 259-270
[19]
...
[20]
Kyuseok Shim, Ramakrishnan Srikant, Rakesh Agrawal: High-Dimensional Similarity Joins. ICDE 1997: 301-311

Copyright © Tue Mar 16 02:22:06 2010 by Michael Ley (ley@uni-trier.de)