Duplicate Removal in Information System Dissemination.

Tak W. Yan, Hector Garcia-Molina: Duplicate Removal in Information System Dissemination. VLDB 1995: 66-77
@inproceedings{DBLP:conf/vldb/YanG95,
  author    = {Tak W. Yan and
               Hector Garcia-Molina},
  editor    = {Umeshwar Dayal and
               Peter M. D. Gray and
               Shojiro Nishio},
  title     = {Duplicate Removal in Information System Dissemination},
  booktitle = {VLDB'95, Proceedings of 21th International Conference on Very
               Large Data Bases, September 11-15, 1995, Zurich, Switzerland},
  publisher = {Morgan Kaufmann},
  year      = {1995},
  isbn      = {1-55860-379-4},
  pages     = {66-77},
  ee        = {db/conf/vldb/YanG95.html},
  crossref  = {DBLP:conf/vldb/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Our experience with the SIFT [YGM95] information dissemination system (in use by over 7,000 users daily) has identified an important and generic dissemination problem: duplicate information. In this paper we explain why duplicates arise, we quantify the problem, and we discuss why it impairs information dissemination. We then propose a Duplicate Removal Module (DRM) for an information dissemination system. The removal of duplicates operates on a per-user, per-document basis: each document read by a user generates a request, or a duplicate restraint. In wide-area environments, the number of restraints handled is very large. We consider the implementation of a DRM, examining alternative algorithms and data structures that may be used. We present a performance evaluation of the alternatives and answer important design questions such as: Which implementation is the best? With the "best" scheme, how expensive will duplicate removal be? How much memory is required? How fast can restraints be processed?

Copyright © 1995 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

Umeshwar Dayal, Peter M. D. Gray, Shojiro Nishio (Eds.): VLDB'95, Proceedings of 21th International Conference on Very Large Data Bases, September 11-15, 1995, Zurich, Switzerland. Morgan Kaufmann 1995, ISBN 1-55860-379-4

References

[BDGM95]
Sergey Brin, James Davis, Hector Garcia-Molina: Copy Detection Mechanisms for Digital Documents. SIGMOD Conference 1995: 398-409
[BLCGP92]
Tim Berners-Lee, Robert Cailliau, Jean-François Groff, Bernd Pollermann: World-Wide Web: The Information Universe. Electronic Networking: Research, Applications and Policy 1(2): 74-82(1992)
[Coh92]
...
[Goy87]
Pankaj Goyal: Duplicate record identification in bibliographic databases. Inf. Syst. 12(3): 239-242(1987)
[HR79]
...
[Kro92]
...
[LT92]
Shoshana Loeb, Douglas B. Terry: Information Filtering - Preface to the Special Section. Commun. ACM 35(12): 26-28(1992)
[ORO93]
...
[Rei93]
...
[Rid92]
...
[Sal68]
...
[SGM95]
Narayanan Shivakumar, Hector Garcia-Molina: SCAM: A Copy Detection Mechanism for Digital Documents. DL 1995: 0-
[YGM94a]
Tak W. Yan, Hector Garcia-Molina: Index Structures for Information Filtering Under the Vector Space Model. ICDE 1994: 337-347
[YGM94b]
Tak W. Yan, Hector Garcia-Molina: Index Structures for Selective Dissemination of Information Under the Boolean Model. ACM Trans. Database Syst. 19(2): 332-364(1994)
[YGM95]
Tak W. Yan, Hector Garcia-Molina: SIFT - a Tool for Wide-Area Information Dissemination. USENIX Winter 1995: 177-186

Copyright © Mon Mar 15 03:55:55 2010 by Michael Ley (ley@uni-trier.de)