Determining Text Databases to Search in the Internet.

Weiyi Meng, King-Lup Liu, Clement T. Yu, Xiaodong Wang, Yuhsi Chang, Naphtali Rishe: Determining Text Databases to Search in the Internet. VLDB 1998: 14-25
@inproceedings{DBLP:conf/vldb/MengLYWCR98,
  author    = {Weiyi Meng and
               King-Lup Liu and
               Clement T. Yu and
               Xiaodong Wang and
               Yuhsi Chang and
               Naphtali Rishe},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {Determining Text Databases to Search in the Internet},
  booktitle = {VLDB'98, Proceedings of 24th International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {14-25},
  ee        = {db/conf/vldb/MengLYWCR98.html},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Text data in the Internet can be partitioned into many databases naturally. Efficient retrieval of desired data can be achieved if we can accurately predict the usefulness of each database, because with such information, we only need to retrieve potentially useful documents from useful databases. In this paper, we propose two new methods for estimating the usefulness of text databases. For a given query, the usefulness of a text database in this paper is defined to be the number of documents in the database that are sufficiently similar to the query. Such a usefulness measure enables naive users to make informed decisions about which databases to search. We also consider the collection fusion problem. Because local databases may employ similarity functions that are different from that used by the global database, the threshold used by a local database to determine whether a document is potentially useful may be different from that used by the global database. We provide techniques that determine the best threshold for a given local database.
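The usefulness definition in the abstract can be illustrated with a minimal sketch. It computes the quantity exactly by scanning every document, which is the value the paper's methods aim to estimate; it is not the estimation methods themselves. The abstract does not name a similarity function, so cosine similarity over raw term counts, the strict "greater than" test, and the names `cosine_similarity` and `usefulness` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(query_terms, doc_terms):
    """Cosine similarity over raw term counts (an assumed similarity function)."""
    q = Counter(query_terms)
    d = Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def usefulness(database, query_terms, threshold):
    """Count the documents whose similarity to the query exceeds the threshold.

    `database` is a list of documents, each a list of terms (hypothetical layout).
    """
    return sum(
        1 for doc in database
        if cosine_similarity(query_terms, doc) > threshold
    )


# Hypothetical toy data: two small databases and one query.
databases = {
    "db_a": [
        ["text", "database", "search"],
        ["internet", "search", "engine"],
        ["cooking", "recipes"],
    ],
    "db_b": [
        ["cooking", "recipes"],
        ["travel", "guide"],
    ],
}
query = ["text", "search"]

# Rank the databases by exact usefulness for this query and threshold.
ranking = sorted(
    databases,
    key=lambda name: usefulness(databases[name], query, 0.5),
    reverse=True,
)
```

A user would search the databases at the head of `ranking` first. Changing the threshold changes the count, which is why a local database whose similarity function differs from the global one may need a different threshold.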

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...

ACM SIGMOD Anthology

DVD Version: Load "ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24th International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5

References

[ALSF97]
...
[BuSA93]
...
[CLBC95]
James P. Callan, Zhihong Lu, W. Bruce Croft: Searching Distributed Collections with Inference Networks. SIGIR 1995: 21-28
[DuHa73]
...
[Gass69]
...
[GrGM95a]
Luis Gravano, Hector Garcia-Molina: Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies. VLDB 1995: 78-89
[GrGM95b]
...
[GrGM97]
Luis Gravano, Hector Garcia-Molina: Merging Ranks from Heterogeneous Internet Sources. VLDB 1997: 196-205
[Harm93]
...
[HoDr97]
...
[KaMe91]
...
[Kost94]
Martijn Koster: ALIWEB - Archie-like Indexing in the WEB. Computer Networks and ISDN Systems 27(2): 175-182 (1994)
[Kow97]
...
[LaYu82]
K. Lam, Clement T. Yu: A Clustered Search Algorithm Incorporating Arbitrary Term Dependencies. ACM Trans. Database Syst. 7(3): 500-508 (1982)
[MaBi97]
...
[MLYW98]
...
[NCS]
...
[SaMc83]
Gerard Salton, Michael McGill: Introduction to Modern Information Retrieval. McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
[Salt89]
Gerard Salton: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley 1989, ISBN 0-201-12227-8
[SeEt95]
...
[SeEt97]
...
[TVGJ95]
...
[VGJL95]
Ellen M. Voorhees, Narendra Kumar Gupta, Ben Johnson-Laird: Learning Collection Fusion Strategies. SIGIR 1995: 172-179
[Widd89]
...
[YaGM95]
Tak W. Yan, Hector Garcia-Molina: SIFT - a Tool for Wide-Area Information Dissemination. USENIX Winter 1995: 177-186
[YuLS78]
Clement T. Yu, W. S. Luk, M. K. Siu: On the Estimation of the Number of Desired Records with Respect to a Given Query. ACM Trans. Database Syst. 3(1): 41-56 (1978)
[YuLe97]
Budi Yuwono, Dik Lun Lee: Server Ranking for Distributed Text Retrieval Systems on the Internet. DASFAA 1997: 41-50

Copyright © Tue Mar 16 02:22:07 2010 by Michael Ley (ley@uni-trier.de)