
FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets.

Christos Faloutsos, King-Ip Lin: FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. SIGMOD Conference 1995: 163-174
@inproceedings{DBLP:conf/sigmod/FaloutsosL95,
  author    = {Christos Faloutsos and
               King-Ip Lin},
  editor    = {Michael J. Carey and
               Donovan A. Schneider},
  title     = {FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization
               of Traditional and Multimedia Datasets},
  booktitle = {Proceedings of the 1995 ACM SIGMOD International Conference on
               Management of Data, San Jose, California, May 22-25, 1995},
  publisher = {ACM Press},
  year      = {1995},
  pages     = {163-174},
  ee        = {http://doi.acm.org/10.1145/223784.223812, db/conf/sigmod/sigmod95-12.html},
  crossref  = {DBLP:conf/sigmod/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions provided by a domain expert [Jagadish, SIGMOD Conf. 1991: 208-217]. We can then use highly fine-tuned spatial access methods (SAMs) to answer several types of queries, including the 'Query By Example' type (which translates to a range query), the 'all pairs' query (which translates to a spatial join [Brinkhoff, SIGMOD Conf. 1994: 197-208]), and the nearest-neighbor or best-match query.
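
As a rough illustration of how these query types look once objects are points in k-d space, the sketch below answers a range query and a nearest-neighbor query by linear scan. The function names (`range_query`, `nearest_neighbor`) and the Euclidean metric are assumptions made for the example; a real system would replace the scan with a SAM such as an R-tree.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def range_query(points, query, eps):
    """'Query By Example': indices of all points within eps of the query."""
    return [i for i, p in enumerate(points) if euclidean(p, query) <= eps]

def nearest_neighbor(points, query):
    """Best-match query: index of the point closest to the query."""
    return min(range(len(points)), key=lambda i: euclidean(points[i], query))

# Hypothetical 2-d feature vectors for three objects.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
near = range_query(points, (0.0, 0.0), 2.0)
best = nearest_neighbor(points, (4.0, 4.0))
```

An 'all pairs' query would, under the same assumptions, report every pair of points whose distance is at most eps, which is what a spatial join computes.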

However, designing feature-extraction functions can be hard. It is often easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information, though, it is not obvious how to map objects into points.

This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. This mapping has two benefits: (a) efficient retrieval, in conjunction with a SAM, as discussed above; and (b) visualization and data-mining: the objects can now be plotted as points in 2-d or 3-d space, revealing potential clusters, correlations among attributes, and other regularities that data-mining looks for.
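
The abstract does not spell out the algorithm. As a hedged sketch only, the code below follows the procedure commonly described for FastMap: choose two far-apart pivot objects, project every object onto the line through them using the cosine law, then repeat on the residual distances for the next coordinate. All names (`fastmap`, `dist`, `pivot_iters`, `seed`) and the clamping of negative residuals are assumptions of this example, not details taken from the paper.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance, used here only as an example distance function."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def fastmap(objects, dist, k, seed=0, pivot_iters=5):
    """Map each object to a k-d point using only the distance function dist."""
    rng = random.Random(seed)
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def residual_sq(i, j, dim):
        # Squared distance left after removing the first `dim` coordinates.
        d2 = dist(objects[i], objects[j]) ** 2
        for c in range(dim):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)  # assumption: clamp small negative values

    for dim in range(k):
        # Heuristic pivot choice: repeatedly jump to the farthest object.
        b = rng.randrange(n)
        a = b
        for _ in range(pivot_iters):
            a = max(range(n), key=lambda j: residual_sq(b, j, dim))
            b = max(range(n), key=lambda j: residual_sq(a, j, dim))
        d_ab_sq = residual_sq(a, b, dim)
        if d_ab_sq == 0.0:
            break  # no residual distance left; remaining coordinates stay 0
        d_ab = math.sqrt(d_ab_sq)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a and b.
            coords[i][dim] = (residual_sq(a, i, dim) + d_ab_sq
                              - residual_sq(b, i, dim)) / (2.0 * d_ab)
    return coords

# Usage: five planar points, re-embedded in 2-d from distances alone.
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0), (1.0, 1.0)]
emb = fastmap(pts, euclidean, 2)
```

Each coordinate costs a constant number of passes over the N objects, which is consistent with the linear dependence on database size claimed for the method. For input that is already Euclidean and of dimension at most k, as in the usage lines, the embedded distances match the original ones up to rounding; for general distance functions they are only approximated.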

We introduce an older method from pattern recognition, namely Multi-Dimensional Scaling (MDS); although it is unsuitable for indexing, we use it as a yardstick for our method. We then propose a much faster algorithm for the problem at hand, which, in addition, allows for indexing. Experiments on real and synthetic data show that the proposed algorithm is significantly faster than MDS (being linear, as opposed to quadratic, in the database size N), while it manages to preserve distances and the overall structure of the dataset.

Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Printed Edition

Michael J. Carey, Donovan A. Schneider (Eds.): Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, California, May 22-25, 1995. ACM Press 1995, SIGMOD Record 24(2), June 1995

References

[1]
Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84
[2]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
[3]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[4]
...
[5]
Manish Arya, William F. Cody, Christos Faloutsos, Joel E. Richardson, Arthur Toya: QBISM: A Prototype 3-D Medical Image Database System. IEEE Data Eng. Bull. 16(1): 38-42(1993)
[6]
Ricardo A. Baeza-Yates, Walter Cunto, Udi Manber, Sun Wu: Proximity Matching Using Fixed-Queries Trees. CPM 1994: 198-212
[7]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331
[8]
Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: Multi-Step Processing of Spatial Joins. SIGMOD Conference 1994: 197-208
[9]
Thomas Brinkhoff, Hans-Peter Kriegel, Bernhard Seeger: Efficient Processing of Spatial Joins Using R-Trees. SIGMOD Conference 1993: 237-246
[10]
Walter A. Burkhard, Robert M. Keller: Some Approaches to Best-Match File Searching. Commun. ACM 16(4): 230-236(1973)
[11]
...
[12]
...
[13]
Susan T. Dumais: Latent Semantic Indexing (LSI): TREC-3 Report. TREC 1994: 105-115
[14]
Christos Faloutsos, Shari Roseman: Fractals for Secondary Key Retrieval. PODS 1989: 247-252
[15]
...
[16]
Peter W. Foltz, Susan T. Dumais: Personalized Information Delivery: An Analysis of Information Filtering Methods. Commun. ACM 35(12): 51-60(1992)
[17]
...
[18]
Irene Gargantini: An Effective Way to Represent Quadtrees. Commun. ACM 25(12): 905-910(1982)
[19]
...
[20]
Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57
[21]
...
[22]
...
[23]
H. V. Jagadish: Linear Clustering of Objects with Multiple Attributes. SIGMOD Conference 1990: 332-342
[24]
H. V. Jagadish: Spatial Search with Polyhedra. ICDE 1990: 311-319
[25]
H. V. Jagadish: A Retrieval Technique for Similar Shapes. SIGMOD Conference 1991: 208-217
[26]
...
[27]
Ibrahim Kamel, Christos Faloutsos: Hilbert R-tree: An Improved R-tree using Fractals. VLDB 1994: 500-509
[28]
...
[29]
...
[30]
Karen Kukich: Techniques for Automatically Correcting Words in Text. ACM Comput. Surv. 24(4): 377-439(1992)
[31]
David B. Lomet, Betty Salzberg: The hB-Tree: A Multiattribute Indexing Method with Good Guaranteed Performance. ACM Trans. Database Syst. 15(4): 625-658(1990)
[32]
Fionn Murtagh: A Survey of Recent Advances in Hierarchical Clustering Algorithms. Comput. J. 26(4): 354-359(1983)
[33]
A. Desai Narasimhalu, Stavros Christodoulakis: Multimedia Information Systems: The Unfolding of a Reality (Guest Editors' Introduction). IEEE Computer 24(10): 6-8(1991)
[34]
Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155
[35]
Wayne Niblack, Ron Barber, William Equitz, Myron Flickner, Eduardo H. Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, Gabriel Taubin: The QBIC Project: Querying Images by Content, Using Color, Texture, and Shape. Storage and Retrieval for Image and Video Databases (SPIE) 1993: 173-187
[36]
Jürg Nievergelt, Hans Hinterberger, Kenneth C. Sevcik: The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Trans. Database Syst. 9(1): 38-71(1984)
[37]
Jack A. Orenstein: Spatial Query Processing in an Object-Oriented Database System. SIGMOD Conference 1986: 326-336
[38]
Jack A. Orenstein: A Comparison of Spatial Query Processing Techniques for Native and Parameter Spaces. SIGMOD Conference 1990: 343-352
[39]
...
[40]
...
[41]
...
[42]
Nick Roussopoulos, Stephen Kelley, Frédéric Vincent: Nearest Neighbor Queries. SIGMOD Conference 1995: 71-79
[43]
Gerard Salton, Michael McGill: Introduction to Modern Information Retrieval. McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
[44]
...
[45]
Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos: The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. VLDB 1987: 507-518
[46]
Marvin B. Shapiro: The Choice of Reference Points in Best-Match File Searching. Commun. ACM 20(5): 339-343(1977)
[47]
Dennis Shasha, Jason Tsong-Li Wang: New Techniques for Best-Match Retrieval. ACM Trans. Inf. Syst. 8(2): 140-158(1990)
[48]
...
[49]
...
[50]
...
[51]
...
[52]
...
[53]
...
[54]
...
[55]
...
