Extracting Schema from Semistructured Data.
Svetlozar Nestorov, Serge Abiteboul, Rajeev Motwani:
Extracting Schema from Semistructured Data.
SIGMOD Conference 1998: 295-306
BibTeX
@inproceedings{DBLP:conf/sigmod/NestorovAM98,
  author    = {Svetlozar Nestorov and
               Serge Abiteboul and
               Rajeev Motwani},
  editor    = {Laura M. Haas and
               Ashutosh Tiwary},
  title     = {Extracting Schema from Semistructured Data},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-995-5},
  pages     = {295-306},
  ee        = {http://doi.acm.org/10.1145/276304.276331},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Semistructured data is characterized by the lack of any fixed and
rigid schema, although typically the data has some implicit
structure. While the lack of fixed schema makes extracting
semistructured data fairly easy and an attractive goal, presenting and
querying such data is greatly impaired. Thus, a critical problem is
the discovery of the structure implicit in semistructured data and,
subsequently, the recasting of the raw data in terms of this
structure. In this paper, we consider a very general form of
semistructured data based on labeled, directed graphs. We show that
such data can be typed using the greatest fixpoint semantics of
monadic datalog programs. We present an algorithm for approximate
typing of semistructured data. We establish that the general problem
of finding an optimal such typing is NP-hard, but present some
heuristics and techniques based on clustering that allow efficient and
near-optimal treatment of the problem. We also present some
preliminary experimental results.
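The abstract's point that graph-shaped data can be typed with greatest fixpoint semantics can be illustrated with a small sketch. It is not the paper's approximate-typing or clustering algorithm. It assumes a simplified, hypothetical encoding of a monadic datalog program in which each type lists required (edge label, target type) pairs on outgoing edges only; the names `gfp_typing`, `TypeDef`, and the `Obj` type are illustrative. The extents start with every object in every type and shrink until nothing changes, which yields the greatest fixpoint.

```python
from typing import Dict, List, Set, Tuple

# A labeled, directed graph is given as (source, label, target) edges.
Edge = Tuple[str, str, str]
# Hypothetical encoding of a monadic datalog type: an object belongs to the
# type if, for every (label, target_type) requirement, it has an outgoing edge
# with that label to an object currently in target_type.
TypeDef = Dict[str, List[Tuple[str, str]]]


def gfp_typing(objects: Set[str], edges: List[Edge], types: TypeDef) -> Dict[str, Set[str]]:
    """Compute greatest-fixpoint extents for the hypothetical type rules."""
    # Top of the lattice: every object starts in every type.
    extent = {t: set(objects) for t in types}
    out: Dict[str, List[Tuple[str, str]]] = {}
    for src, label, dst in edges:
        out.setdefault(src, []).append((label, dst))

    changed = True
    while changed:
        changed = False
        for t, reqs in types.items():
            # Keep only objects that still satisfy every requirement;
            # extents only shrink, so the loop terminates.
            keep = {
                o for o in extent[t]
                if all(
                    any(lbl == label and d in extent[tt] for lbl, d in out.get(o, []))
                    for label, tt in reqs
                )
            }
            if keep != extent[t]:
                extent[t] = keep
                changed = True
    return extent


# Cyclic example: p1 and p2 know each other, c1 has a name but knows nobody.
objects = {"p1", "p2", "c1", "s"}
edges = [
    ("p1", "name", "s"), ("p1", "knows", "p2"),
    ("p2", "name", "s"), ("p2", "knows", "p1"),
    ("c1", "name", "s"),
]
types = {"Person": [("name", "Obj"), ("knows", "Person")], "Obj": []}
print(sorted(gfp_typing(objects, edges, types)["Person"]))  # ['p1', 'p2']
```

Because `p1` and `p2` only justify each other through the `knows` cycle, a least fixpoint computed from empty extents would type neither of them as `Person`; starting from the full extents and removing violators keeps both, which is why the greatest fixpoint is the natural choice for cyclic graph data.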
Copyright © 1998 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
Printed Edition
Laura M. Haas, Ashutosh Tiwary (Eds.):
SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA.
ACM Press 1998, ISBN 0-89791-995-5
SIGMOD Record 27(2), June 1998
Long Version
http://www-db.stanford.edu/pub/papers/extract-schema.ps
References
- [1] Serge Abiteboul: Querying Semi-Structured Data. ICDT 1997: 1-18
- [2] Serge Abiteboul, Richard Hull, Victor Vianu: Foundations of Databases. Addison-Wesley 1995, ISBN 0-201-53771-0
- [3] Antonio Albano, Roberto Bergamini, Giorgio Ghelli, Renzo Orsini: An Object Data Model with Roles. VLDB 1993: 39-51
- [4] ...
- [5] ...
- [6] Peter Buneman: Semistructured Data. PODS 1997: 117-121
- [7] Peter Buneman, Susan B. Davidson, Mary F. Fernandez, Dan Suciu: Adding Structure to Unstructured Data. ICDT 1997: 336-350
- [8] Peter Buneman, Susan B. Davidson, Gerd G. Hillebrand, Dan Suciu: A Query Language and Optimization Techniques for Unstructured Data. SIGMOD Conference 1996: 505-516
- [9] R. G. G. Cattell: The Object Database Standard: ODMG-93 (Release 1.1). Morgan Kaufmann 1994
- [10] Roy Goldman, Jennifer Widom: DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. VLDB 1997: 436-445
- [11] ...
- [12] ...
- [13] ...
- [14] ...
- [15] Svetlozar Nestorov, Jeffrey D. Ullman, Janet L. Wiener, Sudarshan S. Chawathe: Representative Objects: Concise Representations of Semistructured, Hierarchial Data. ICDE 1997: 79-90
- [16] Dallan Quass, Anand Rajaraman, Yehoshua Sagiv, Jeffrey D. Ullman, Jennifer Widom: Querying Semistructured Heterogeneous Information. DOOD 1995: 319-344
- [17] Dan Suciu: Query Decomposition and View Maintenance for Query Languages for Unstructured Data. VLDB 1996: 227-238
- [18] Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press 1988, ISBN 0-7167-8158-1
- [18-2] Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press 1989, ISBN 0-7167-8162-X
- [19] Moshé M. Zloof: Query-by-Example: A Data Base Language. IBM Systems Journal 16(4): 324-343 (1977)
Referenced by
- Minos N. Garofalakis, Aristides Gionis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim:
XTRACT: A System for Extracting Document Type Descriptors from XML Documents.
SIGMOD Conference 2000: 165-176
- Yannis Papakonstantinou, Victor Vianu:
DTD Inference for Views of XML Data.
PODS 2000: 35-46
- Qiu Yue Wang, Jeffrey Xu Yu, Kam-Fai Wong:
Approximate Graph Schema Extraction for Semi-Structured Data.
EDBT 2000: 302-316
- Sihem Amer-Yahia, H. V. Jagadish, Laks V. S. Lakshmanan, Divesh Srivastava:
On Bounding-Schemas for LDAP Directories.
EDBT 2000: 287-301
- Minos N. Garofalakis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim:
Data Mining and the Web: Past, Present and Future.
Workshop on Web Information and Data Management 1999: 43-47
- Alin Deutsch, Mary F. Fernández, Dan Suciu:
Storing Semistructured Data with STORED.
SIGMOD Conference 1999: 431-442
- Yaron Kanza, Werner Nutt, Yehoshua Sagiv:
Queries with Incomplete Answers over Semistructured Data.
PODS 1999: 227-236
- Stéphane Grumbach, Giansalvatore Mecca:
In Search of the Lost Schema.
ICDT 1999: 314-331
- Silvana Castano, Valeria De Antonellis:
Building Views over Semistructured Data Sources.
ER 1999: 146-160
- Daniela Florescu, Alon Y. Levy, Alberto O. Mendelzon:
Database Techniques for the World-Wide Web: A Survey.
SIGMOD Record 27(3): 59-74(1998)
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Wed Jun 4 18:55:29 2008