
Incremental Clustering for Mining in a Data Warehousing Environment.

Martin Ester, Hans-Peter Kriegel, Jörg Sander, Michael Wimmer, Xiaowei Xu: Incremental Clustering for Mining in a Data Warehousing Environment. VLDB 1998: 323-333
@inproceedings{DBLP:conf/vldb/EsterKSWX98,
  author    = {Martin Ester and
               Hans-Peter Kriegel and
               J{\"o}rg Sander and
               Michael Wimmer and
               Xiaowei Xu},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {Incremental Clustering for Mining in a Data Warehousing Environment},
  booktitle = {VLDB'98, Proceedings of 24th International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {323-333},
  ee        = {db/conf/vldb/EsterKSWX98.html},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Data warehouses provide many opportunities for performing data mining tasks such as classification and clustering. Typically, updates are collected and applied to the data warehouse periodically in a batch mode, e.g., during the night. Then, all patterns derived from the warehouse by some data mining algorithm have to be updated as well. Due to the very large size of the databases, it is highly desirable to perform these updates incrementally. In this paper, we present the first incremental clustering algorithm. Our algorithm is based on the clustering algorithm DBSCAN, which is applicable to any database containing data from a metric space, e.g., a spatial database or a WWW-log database. Due to the density-based nature of DBSCAN, the insertion or deletion of an object affects the current clustering only in the neighborhood of this object. Thus, efficient algorithms can be given for incremental insertions into and deletions from an existing clustering. Based on the formal definition of clusters, it can be proven that the incremental algorithm yields the same result as DBSCAN. A performance evaluation of Incremental DBSCAN on a spatial database as well as on a WWW-log database is presented, demonstrating the efficiency of the proposed algorithm. Incremental DBSCAN yields significant speed-up factors over DBSCAN even for large numbers of daily updates in a data warehouse.
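The locality argument above (an insertion changes the clustering only near the inserted object) can be illustrated with a small Python sketch. It is not the authors' Incremental DBSCAN procedure and covers insertions only, not deletions. Assumptions: points are coordinate tuples compared by Euclidean distance, neighborhoods are found by linear scan rather than a spatial index, and the names `region_query`, `dbscan`, `insert`, `eps`, and `min_pts` are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

# Hypothetical sketch: names and structure are illustrative, not from the paper.
Point = Tuple[float, ...]
NOISE = -1       # label for noise objects
UNVISITED = -2   # internal marker used by the batch pass


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Eps-neighborhood of points[i] by linear scan (includes i itself).
    Assumption: a real system would use a spatial or metric index instead."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def is_core(points: Sequence[Point], i: int, eps: float, min_pts: int) -> bool:
    return len(region_query(points, i, eps)) >= min_pts


def _expand(points: Sequence[Point], seeds: List[int], cid: int,
            labels: List[int], eps: float, min_pts: int) -> Set[int]:
    """Label every object density-reachable from the core objects in seeds
    with cid; return the set of objects reached."""
    visited: Set[int] = set()
    stack = list(seeds)
    while stack:
        j = stack.pop()
        if j in visited:
            continue
        visited.add(j)
        labels[j] = cid
        neighbors = region_query(points, j, eps)
        if len(neighbors) >= min_pts:  # only core objects propagate the cluster
            stack.extend(neighbors)
    return visited


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Batch DBSCAN: one cluster id per object, or NOISE."""
    labels = [UNVISITED] * len(points)
    next_id = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        if not is_core(points, i, eps, min_pts):
            labels[i] = NOISE  # may later be relabeled as a border object
            continue
        _expand(points, [i], next_id, labels, eps, min_pts)
        next_id += 1
    return labels


def insert(points: List[Point], labels: List[int], p: Point,
           eps: float, min_pts: int) -> None:
    """Append p and update labels in place without re-clustering everything."""
    points.append(p)
    labels.append(NOISE)
    n = len(points) - 1
    # Only objects within eps of p gain a neighbor, so only they can become core.
    near_p = region_query(points, n, eps)

    def count(q: int) -> int:
        return len(region_query(points, q, eps))

    # p is new, so it is "newly core" if dense enough; an old object is newly
    # core exactly when its neighborhood size has just reached min_pts.
    new_cores = [q for q in near_p
                 if (count(q) >= min_pts if q == n else count(q) == min_pts)]
    # Core objects next to a newly core object: the only places where
    # density-connectivity can change.
    seeds = {s for q in new_cores for s in region_query(points, q, eps)
             if is_core(points, s, eps, min_pts)}
    if not seeds:
        # No core status changed: p is a border object if it lies near a core
        # object, otherwise noise; no other label changes.
        for q in near_p:
            if q != n and is_core(points, q, eps, min_pts):
                labels[n] = labels[q]
                break
        return
    # Each connected group of seeds becomes one cluster (new, absorbed, or
    # merged); this sketch gives each group a fresh id and relabels it.
    done: Set[int] = set()
    next_id = max(labels) + 1
    for s in seeds:
        if s not in done:
            done |= _expand(points, [s], next_id, labels, eps, min_pts)
            next_id += 1
```

For example, with `eps=1.0` and `min_pts=3`, inserting `(2, 0)` between the clusters `(0, 0), (0.5, 0), (1, 0)` and `(3, 0), (3.5, 0), (4, 0)` merges them, and the resulting partition matches the one `dbscan` computes from scratch. For simplicity the sketch relabels every affected cluster wholesale; a tuned version would keep an existing id when a cluster merely absorbs the new object. As in batch DBSCAN, a border object reachable from two clusters may end up in either.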

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24th International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5

References

[AF 96]
...
[AS 94]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[BKSS 90]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331
[Bou 96]
Athman Bouguettaya: On-Line Clustering. IEEE Trans. Knowl. Data Eng. 8(2): 333-339(1996)
[CHNW 96]
David Wai-Lok Cheung, Jiawei Han, Vincent T. Y. Ng, C. Y. Wong: Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. ICDE 1996: 106-114
[CPZ 97]
Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426-435
[EKSX 96]
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996: 226-231
[EKX 95]
Martin Ester, Hans-Peter Kriegel, Xiaowei Xu: Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification. SSD 1995: 67-82
[EW 98]
Martin Ester, Rüdiger Wittmann: Incremental Generalization for Mining in a Data Warehousing Environment. EDBT 1998: 135-149
[FAAM 97]
Ronen Feldman, Yonatan Aumann, Amihood Amir, Heikki Mannila: Efficient Algorithms for Discovering Frequent Sets in Incremental Databases. DMKD 1997: 0-
[FPS 96]
Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth: Knowledge Discovery and Data Mining: Towards a Unifying Framework. KDD 1996: 82-88
[Gue 94]
Ralf Hartmut Güting: An Introduction to Spatial Database Systems. VLDB J. 3(4): 357-399(1994)
[HCC 93]
Jiawei Han, Yandong Cai, Nick Cercone: Data-Driven Discovery of Quantitative Rules in Relational Databases. IEEE Trans. Knowl. Data Eng. 5(1): 29-40(1993)
[Huy 97]
Nam Huyn: Multiple-View Self-Maintenance in Data Warehousing Environments. VLDB 1997: 26-35
[KR 90]
L. Kaufman, P. J. Rousseeuw: Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley 1990
[Luo 95]
...
[MJHS 96]
...
[MQM 97]
Inderpal Singh Mumick, Dallan Quass, Barinderpal Singh Mumick: Maintenance of Data Cubes and Summary Tables in a Warehouse. SIGMOD Conference 1997: 100-111
[NH 94]
Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155
[SEKX 98]
...
[Sib 73]
R. Sibson: SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method. Comput. J. 16(1): 30-34(1973)
[ZRL 96]
Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD Conference 1996: 103-114

Copyright © Tue Mar 16 02:22:07 2010 by Michael Ley (ley@uni-trier.de)