
OODB Bulk Loading Revisited: The Partitioned-List Approach.

Janet L. Wiener, Jeffrey F. Naughton: OODB Bulk Loading Revisited: The Partitioned-List Approach. VLDB 1995: 30-41
@inproceedings{DBLP:conf/vldb/WienerN95,
  author    = {Janet L. Wiener and
               Jeffrey F. Naughton},
  editor    = {Umeshwar Dayal and
               Peter M. D. Gray and
               Shojiro Nishio},
  title     = {OODB Bulk Loading Revisited: The Partitioned-List Approach},
  booktitle = {VLDB'95, Proceedings of 21th International Conference on Very
               Large Data Bases, September 11-15, 1995, Zurich, Switzerland},
  publisher = {Morgan Kaufmann},
  year      = {1995},
  isbn      = {1-55860-379-4},
  pages     = {30-41},
  ee        = {db/conf/vldb/WienerN95.html},
  crossref  = {DBLP:conf/vldb/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Object-oriented and object-relational databases (OODB) need to be able to load the vast quantities of data that OODB users bring to them. Loading OODB data is significantly more complicated than loading relational data due to the presence of relationships, or references, in the data; the presence of these relationships means that naive loading algorithms are slow to the point of being unusable. In our previous work, we presented the late-invsort algorithm, which performed significantly better than naive algorithms on all the data sets we tested. Unfortunately, further experimentation with the late-invsort algorithm revealed that for large data sets (ones in which a critical data structure of the load algorithm does not fit in memory), the performance of late-invsort rapidly degrades to where it, too, is unusable. In this paper we propose a new algorithm, the partitioned-list algorithm, whose performance almost matches that of late-invsort for smaller data sets but does not degrade for large data sets. We present a performance study of an implementation within the Shore persistent object repository showing that the partitioned-list algorithm is at least an order of magnitude better than previous algorithms on large data sets. In addition, because loading gigabytes and terabytes of data can take hours, we describe how to checkpoint the partitioned-list algorithm and resume a long-running load after a system crash or other interruption.

Copyright © 1995 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.



Printed Edition

Umeshwar Dayal, Peter M. D. Gray, Shojiro Nishio (Eds.): VLDB'95, Proceedings of 21th International Conference on Very Large Data Bases, September 11-15, 1995, Zurich, Switzerland. Morgan Kaufmann 1995, ISBN 1-55860-379-4

References

[Cam95]
...
[Cat93]
R. G. G. Cattell: The Object Database Standard: ODMG-93. Morgan Kaufmann 1993, ISBN 1-55860-302-6
[CDF+94]
Michael J. Carey, David J. DeWitt, Michael J. Franklin, Nancy E. Hall, Mark L. McAuliffe, Jeffrey F. Naughton, Daniel T. Schuh, Marvin H. Solomon, C. K. Tan, Odysseas G. Tsatalos, Seth J. White, Michael J. Zwilling: Shoring Up Persistent Applications. SIGMOD Conference 1994: 383-394
[CMR92]
...
[CMR+94]
Judith Bayard Cushing, David Maier, Meenakshi Rao, Don Abel, David Feller, D. Michael DeVaney: Computational Proxies: Modeling Scientific Applications in Object Databases. SSDBM 1994: 196-206
[CPea93]
...
[Deu90]
O. Deux: The Story of O2. IEEE Trans. Knowl. Data Eng. 2(1): 91-108 (1990)
[DLP+93]
...
[Kea83]
Masaru Kitsuregawa, Hidehiko Tanaka, Tohru Moto-Oka: Application of Hash to Data Base Machine and Its Architecture. New Generation Comput. 1(1): 63-74 (1983)
[Kim94]
Won Kim: UniSQL/X Unified Relational and Object-Oriented Database System. SIGMOD Conference 1994: 481
[LLOW91]
Charles Lamb, Gordon Landis, Jack A. Orenstein, Daniel Weinreb: The ObjectStore Database System. Commun. ACM 34(10): 50-63 (1991)
[Mai94]
...
[MN92]
C. Mohan, Inderpal Narang: Algorithms for Creating Indexes for Very Large Tables Without Quiescing Updates. SIGMOD Conference 1992: 361-370
[MN93]
C. Mohan, Inderpal Narang: An Efficient and Flexible Method for Archiving a Data Base. SIGMOD Conference 1993: 139-146
[Moh93a]
C. Mohan: A Survey of DBMS Research Issues in Supporting Very Large Tables. FODO 1993: 279-300
[Moh93b]
C. Mohan: IBM's Relational DBMS Products: Features and Technologies. SIGMOD Conference 1993: 445-448
[MS90]
...
[Nel91]
...
[Obj92]
...
[Ont94]
...
[PG88]
Norman W. Paton, Peter M. D. Gray: Identification of Database Objects by Key. OODBS 1988: 280-285
[RZ89]
...
[Sha86]
Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 11(3): 239-264 (1986)
[Sno89]
...
[Ube94]
Michael Ubell: The Montage Extensible DataBlade Architecture. SIGMOD Conference 1994: 482
[Ver93]
...
[WCK93]
Andrew Witkowski, Felipe Cariño, Pekka Kostamaa: NCR 3700 - The Next-Generation Industrial Database Computer. VLDB 1993: 230-243
[WN94]
Janet L. Wiener, Jeffrey F. Naughton: Bulk Loading into an OODB: A Performance Study. VLDB 1994: 120-131

Copyright © Fri Mar 12 17:22:53 2010 by Michael Ley (ley@uni-trier.de)