Extracting Large-Scale Knowledge Bases from the Web.

Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins: Extracting Large-Scale Knowledge Bases from the Web. VLDB 1999: 639-650
@inproceedings{DBLP:conf/vldb/KumarRRT99,
  author    = {Ravi Kumar and
               Prabhakar Raghavan and
               Sridhar Rajagopalan and
               Andrew Tomkins},
  editor    = {Malcolm P. Atkinson and
               Maria E. Orlowska and
               Patrick Valduriez and
               Stanley B. Zdonik and
               Michael L. Brodie},
  title     = {Extracting Large-Scale Knowledge Bases from the Web},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
               Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
               UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-7},
  pages     = {639-650},
  ee        = {db/conf/vldb/KumarRRT99.html},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

The subject of this paper is the creation of knowledge bases by enumerating and organizing all web occurrences of certain subgraphs. We focus on subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc. For instance, the signature of a webring is a central page with bidirectional links to a number of other pages. We develop novel algorithms for such enumeration problems. A key technical contribution is the development of a model for the evolution of the web graph, based on experimental observations derived from a snapshot of the web. We argue that our algorithms run efficiently in this model, and use the model to explain some statistical phenomena on the web that emerged during our experiments. Finally, we describe the design and implementation of Campfire, a knowledge base of over one hundred thousand web communities.
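The webring signature named above can be made concrete with a small, hedged sketch. It assumes the link graph fits in memory as a list of (source, target) hyperlinks, and that "a number of other pages" means at least a threshold we pick ourselves. The names `find_webring_centers` and `min_spokes` are hypothetical. This is an illustration of the signature only, not the paper's enumeration algorithm, which targets a web-scale snapshot.

```python
from collections import defaultdict


def find_webring_centers(edges, min_spokes):
    """Return {center: sorted spokes} for every page that has bidirectional
    links to at least `min_spokes` other pages.

    `edges` is an iterable of (src, dst) hyperlinks (a directed graph).
    `min_spokes` is a hypothetical threshold chosen for illustration.
    """
    # Build out-link sets so each "does t link back?" check is O(1).
    out_links = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-links
            out_links[src].add(dst)

    centers = {}
    for page, targets in out_links.items():
        # A spoke is a target that also links back to `page`.
        # dict.get does not insert keys, so iterating stays safe.
        spokes = sorted(t for t in targets if page in out_links.get(t, ()))
        if len(spokes) >= min_spokes:
            centers[page] = spokes
    return centers


if __name__ == "__main__":
    links = [
        ("hub", "a"), ("a", "hub"),
        ("hub", "b"), ("b", "hub"),
        ("hub", "c"), ("c", "hub"),
        ("hub", "d"),  # one-way link: not a spoke
        ("a", "b"),    # one-way link between two spokes
    ]
    print(find_webring_centers(links, 3))  # {'hub': ['a', 'b', 'c']}
```

With all out-link sets in memory, the scan costs time linear in the number of links. A snapshot of the whole web would not fit this way, so a real implementation would presumably need disk-based sorting or joins instead.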

Copyright © 1999 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.): VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK. Morgan Kaufmann 1999, ISBN 1-55860-615-7

References

[1]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[2]
...
[3]
...
[4]
Krishna Bharat, Andrei Z. Broder, Monika Rauch Henzinger, Puneet Kumar, Suresh Venkatasubramanian: The Connectivity Server: Fast Access to Linkage Information on the Web. Computer Networks 30(1-7): 469-477 (1998)
[5]
Krishna Bharat, Monika Rauch Henzinger: Improved Algorithms for Topic Distillation in a Hyperlinked Environment. SIGIR 1998: 104-111
[6]
Sergey Brin, Lawrence Page: The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks 30(1-7): 107-117 (1998)
[7]
...
[8]
...
[9]
Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, David Gibson, Jon M. Kleinberg: Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text. Computer Networks 30(1-7): 65-74 (1998)
[10]
...
[11]
...
[12]
Jeffrey Dean, Monika Rauch Henzinger: Finding Related Pages in the World Wide Web. Computer Networks 31(11-16): 1467-1479 (1999)
[13]
Daniela Florescu, Alon Y. Levy, Alberto O. Mendelzon: Database Techniques for the World-Wide Web: A Survey. SIGMOD Record 27(3): 59-74 (1998)
[14]
...
[15]
...
[16]
...
[17]
...
[18]
...
[19]
Jon M. Kleinberg: Authoritative Sources in a Hyperlinked Environment. SODA 1998: 668-677
[20]
...
[21]
Jon M. Kleinberg, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins: The Web as a Graph: Measurements, Models, and Methods. COCOON 1999: 1-17
[22]
...
[23]
...
[24]
...
[25]
Alberto O. Mendelzon, Peter T. Wood: Finding Regular Simple Paths in Graph Databases. SIAM J. Comput. 24(6): 1235-1258 (1995)
[26]
...
[27]
Ehud Rivlin, Rodrigo A. Botafogo, Ben Shneiderman: Navigating in Hyperspace: Designing a Structure-Based Toolbox. Commun. ACM 37(2): 87-96 (1994)
[28]
...
[29]
Shalom Tsur, Jeffrey D. Ullman, Serge Abiteboul, Chris Clifton, Rajeev Motwani, Svetlozar Nestorov, Arnon Rosenthal: Query Flocks: A Generalization of Association-Rule Mining. SIGMOD Conference 1998: 1-12
[30]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949

Copyright © Tue Mar 16 02:22:08 2010 by Michael Ley (ley@uni-trier.de)