Extracting Large-Scale Knowledge Bases from the Web.

Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins: Extracting Large-Scale Knowledge Bases from the Web. VLDB 1999: 639-650

@inproceedings{DBLP:conf/vldb/KumarRRT99,
  author    = {Ravi Kumar and
               Prabhakar Raghavan and
               Sridhar Rajagopalan and
               Andrew Tomkins},
  editor    = {Malcolm P. Atkinson and
               Maria E. Orlowska and
               Patrick Valduriez and
               Stanley B. Zdonik and
               Michael L. Brodie},
  title     = {Extracting Large-Scale Knowledge Bases from the Web},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
               Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
               UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-7},
  pages     = {639-650},
  ee        = {db/conf/vldb/KumarRRT99.html},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

The subject of this paper is the creation of knowledge bases by enumerating and organizing all web occurrences of certain subgraphs. We focus on subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc. For instance, the signature of a webring is a central page with bidirectional links to a number of other pages. We develop novel algorithms for such enumeration problems. A key technical contribution is the development of a model for the evolution of the web graph, based on experimental observations derived from a snapshot of the web. We argue that our algorithms run efficiently in this model, and use the model to explain some statistical phenomena on the web that emerged during our experiments. Finally, we describe the design and implementation of Campfire, a knowledge base of over one hundred thousand web communities.
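As an illustration of the webring signature described in the abstract (a central page with bidirectional links to a number of other pages), the following is a minimal sketch and not the paper's algorithm. It assumes the graph fits in memory as a list of directed (source, target) edges; the function name `find_webring_candidates`, the `min_members` threshold, and the sample page names are all hypothetical.

```python
from collections import defaultdict


def find_webring_candidates(edges, min_members=3):
    """Naive scan for the webring signature (illustrative only).

    Returns a dict mapping each candidate central page to the sorted list
    of pages it shares bidirectional links with, keeping only pages that
    have at least `min_members` such partners (an assumed threshold).
    """
    # Build the out-link set of every page, ignoring self-links.
    out_links = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            out_links[src].add(dst)

    candidates = {}
    for page, targets in list(out_links.items()):
        # A partner is a page that `page` links to and that links back.
        mutual = {t for t in targets if page in out_links.get(t, ())}
        if len(mutual) >= min_members:
            candidates[page] = sorted(mutual)
    return candidates


# Hypothetical toy graph: "hub" exchanges links with three member pages.
sample_edges = [
    ("hub", "a"), ("a", "hub"),
    ("hub", "b"), ("b", "hub"),
    ("hub", "c"), ("c", "hub"),
    ("a", "b"),  # one-way link, not part of the signature
]
print(find_webring_candidates(sample_edges))
```

A scan like this touches every edge once and holds the whole graph in memory, so it only conveys what the signature looks like; enumerating such subgraphs at web scale is the problem the paper's own algorithms address.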

Copyright © 1999 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Printed Edition

Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.): VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK. Morgan Kaufmann 1999, ISBN 1-55860-615-7

References

[1]: Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[2]: ...
[3]: ...
[4]: Krishna Bharat, Andrei Z. Broder, Monika Rauch Henzinger, Puneet Kumar, Suresh Venkatasubramanian: The Connectivity Server: Fast Access to Linkage Information on the Web. Computer Networks 30(1-7): 469-477(1998)
[5]: Krishna Bharat, Monika Rauch Henzinger: Improved Algorithms for Topic Distillation in a Hyperlinked Environment. SIGIR 1998: 104-111
[6]: Sergey Brin, Lawrence Page: The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks 30(1-7): 107-117(1998)
[7]: ...
[8]: ...
[9]: Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, David Gibson, Jon M. Kleinberg: Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text. Computer Networks 30(1-7): 65-74(1998)
[10]: ...
[11]: ...
[12]: Jeffrey Dean, Monika Rauch Henzinger: Finding Related Pages in the World Wide Web. Computer Networks 31(11-16): 1467-1479(1999)
[13]: Daniela Florescu, Alon Y. Levy, Alberto O. Mendelzon: Database Techniques for the World-Wide Web: A Survey. SIGMOD Record 27(3): 59-74(1998)
[14]: ...
[15]: ...
[16]: ...
[17]: ...
[18]: ...
[19]: Jon M. Kleinberg: Authoritative Sources in a Hyperlinked Environment. SODA 1998: 668-677
[20]: ...
[21]: Jon M. Kleinberg, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins: The Web as a Graph: Measurements, Models, and Methods. COCOON 1999: 1-17
[22]: ...
[23]: ...
[24]: ...
[25]: Alberto O. Mendelzon, Peter T. Wood: Finding Regular Simple Paths in Graph Databases. SIAM J. Comput. 24(6): 1235-1258(1995)
[26]: ...
[27]: Ehud Rivlin, Rodrigo A. Botafogo, Ben Shneiderman: Navigating in Hyperspace: Designing a Structure-Based Toolbox. Commun. ACM 37(2): 87-96(1994)
[28]: ...
[29]: Shalom Tsur, Jeffrey D. Ullman, Serge Abiteboul, Chris Clifton, Rajeev Motwani, Svetlozar Nestorov, Arnon Rosenthal: Query Flocks: A Generalization of Association-Rule Mining. SIGMOD Conference 1998: 1-12
[30]: George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949