Sanket Atal: The XPS Approach to Loading and Unloading Terabyte Databases. VLDB 1996: 589
@inproceedings{DBLP:conf/vldb/Atal96,
author    = {Sanket Atal},
editor    = {T. M. Vijayaraman and
Alejandro P. Buchmann and
C. Mohan and
Nandlal L. Sarda},
booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
publisher = {Morgan Kaufmann},
year      = {1996},
isbn      = {1-55860-382-4},
pages     = {589},
ee        = {db/conf/vldb/Atal96.html},
crossref  = {DBLP:conf/vldb/96},
bibsource = {DBLP, http://dblp.uni-trier.de}
}


## Abstract

XPS (eXtended Parallel Server), Informix's MPP solution, is designed for enterprise-wide database management and includes not only the DBMS but also scalable utilities. This talk focuses on our load/unload utility, which is fast, flexible, scalable, and easy to use.

## Architecture

The loader was developed on top of the existing Parallel Data Query (PDQ) iterator infrastructure. Both the loader and the converter are iterators, so the server treats the load iterator tree just like any other iterator tree and can reuse its existing algorithms for parallelization and resource allocation. This also gives the loader low-level access to SQL functions.
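The iterator composition described above can be sketched as follows. This is a minimal illustration only, not the XPS implementation: the names `scan_source`, `convert`, and `load`, the `|` delimiter, and the in-memory list standing in for a table are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, ...]


def scan_source(lines: Iterable[str]) -> Iterator[str]:
    """Leaf iterator (hypothetical): yields raw records from an external source."""
    for line in lines:
        yield line.rstrip("\n")


def convert(records: Iterator[str], delimiter: str = "|") -> Iterator[Row]:
    """Converter iterator (hypothetical): splits each raw record into fields."""
    for record in records:
        yield tuple(field.strip() for field in record.split(delimiter))


def load(rows: Iterator[Row], table: List[Row]) -> int:
    """Load iterator (hypothetical): consumes rows and appends them to the target."""
    count = 0
    for row in rows:
        table.append(row)
        count += 1
    return count


# The three iterators compose into a small tree: load <- convert <- scan.
source = ["1|alice\n", "2|bob\n"]
target: List[Row] = []
loaded = load(convert(scan_source(source)), target)
```

Because each stage only pulls rows from its child, an engine that already knows how to schedule iterator trees would not need special handling for a tree whose leaf happens to be an external source.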

The XPS loader design introduces the concept of an *external table*: a table that has a catalog entry in a database but does not reside in materialized form within that database. Any source for a load or target for an unload can be treated as an external table. That is, an external table can serve as an interface to an application program or system device that is external to the server. To create an external table, one uses our extended create table or select...into... statement syntax.

## Easy to Use

Although there is a GUI, its use is optional because SQL can be used to perform load, unload, and other operations on the external tables.
• For loading, one can create a catalog entry for an external table by using the create table statement. Then a set insert statement can be used to load the data from this table. The insert statement can contain complex filters on the columns of the external table, thus allowing filtering/scrubbing of the incoming data.
• For unloading, one can either create an external table with the create table statement and then unload into it with an insert statement, or use the Informix select...into... statement, which automatically creates the external table definition.
• External tables can be used in queries so one can analyze the data that is to be loaded.
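As a rough analogy for the load path above, the following sketch uses Python's `sqlite3` module rather than Informix. SQLite has no external tables, so a temporary staging table named `ext_customer` stands in for one; the table names, columns, sample data, and the filter are all assumptions made for illustration. The point it shows is the shape of the load: one set-oriented insert with a filter on the source columns, instead of a per-row cursor loop.

```python
import csv
import io
import sqlite3

# Hypothetical delimited input; in XPS this would stay outside the database.
raw = io.StringIO("1|alice|30\n2|bob|-5\n3|carol|41\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, age INTEGER)")

# Stand-in for the external table: a staging table holding the raw text fields.
conn.execute("CREATE TEMP TABLE ext_customer (id TEXT, name TEXT, age TEXT)")
conn.executemany(
    "INSERT INTO ext_customer VALUES (?, ?, ?)",
    csv.reader(raw, delimiter="|"),
)

# Set-oriented load: convert types and scrub bad rows in a single statement.
conn.execute(
    "INSERT INTO customer (id, name, age) "
    "SELECT CAST(id AS INTEGER), name, CAST(age AS INTEGER) "
    "FROM ext_customer WHERE CAST(age AS INTEGER) >= 0"
)

loaded = conn.execute("SELECT id, name, age FROM customer ORDER BY id").fetchall()
```

The staging table can also be queried on its own before the insert, which mirrors the idea of analyzing the data that is about to be loaded.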

## Flexible

External tables provide flexibility in load processing:
• They support a variety of formats (delimited, fixed, ASCII, Informix row, etc.).
• They support a variety of input devices either directly or by the use of named pipes. An external table can also consolidate multiple input sources in parallel.
• Data conversion can be parallelized independent of the layout of the source. This means that incoming data does not need to be manually split prior to loading to achieve parallel conversions.
• External table data can be manipulated in several ways:
1. Operations (aggregation, trimming, etc.) can be performed on columns.
2. Certain types of data scrubbing can be done in the server, in parallel, by the loader.
3. Columns may be omitted, duplicated, remapped, have their types converted, etc.
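One way to picture conversion that is parallel regardless of how the source is laid out is to cut a single input into chunks that end on record boundaries and convert the chunks concurrently. The sketch below is an assumption-laden illustration, not the XPS algorithm: `split_on_records`, `convert_chunk`, and `parallel_convert` are hypothetical names, records are assumed to be newline-terminated `id|name` pairs, and a thread pool is used only to show the structure (CPython threads do not give true CPU parallelism).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_on_records(data: bytes, parts: int) -> List[bytes]:
    """Cut data into at most `parts` chunks, each ending on a newline."""
    chunks: List[bytes] = []
    target = max(1, len(data) // parts)
    start = 0
    while start < len(data):
        end = min(len(data), start + target)
        # Extend the chunk to the end of the record it currently cuts through.
        newline = data.find(b"\n", end - 1)
        end = len(data) if newline == -1 else newline + 1
        chunks.append(data[start:end])
        start = end
    return chunks


def convert_chunk(chunk: bytes) -> List[Tuple[int, str]]:
    """Convert every delimited record in one chunk into a typed row."""
    rows = []
    for line in chunk.decode("ascii").splitlines():
        ident, name = line.split("|")
        rows.append((int(ident), name))
    return rows


def parallel_convert(data: bytes, parts: int) -> List[Tuple[int, str]]:
    """Convert chunks concurrently; map() returns results in input order."""
    chunks = split_on_records(data, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = pool.map(convert_chunk, chunks)
        return [row for rows in results for row in rows]


data = b"1|a\n2|b\n3|c\n4|d\n"
rows = parallel_convert(data, parts=3)
```

The caller hands over one unsplit input and the splitting happens internally, which is the property the list above describes: no manual pre-splitting of the incoming data is needed to get parallel conversion.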

## Benefits

• Data that resides in applications, devices, or files external to the server can be read or written in parallel with the full power and generality of SQL.
• Application level data can be loaded in bulk via set-oriented SQL statements instead of using slow, per-tuple cursor mechanisms.
• Loads are parallelized across all available nodes, resulting in a very fast and scalable loader. Load performance numbers collected so far have been very encouraging: 2+ GB/hr on a single SP2 Thin 1 node (66.7 MHz processor, SPECint rating of 114.3) and 100 GB/hr on 48 such nodes.
• External tables provide a good infrastructure for future development towards more abstract types where the input methods can be invoked in the server.
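The reported load rates can be checked against ideal linear scaling with a few lines of arithmetic. The sketch below takes the single-node figure as exactly 2 GB/hr, which is only a lower bound since the text reports "2+", and assumes 1 TB = 1000 GB; both are simplifications for illustration.

```python
single_node_gb_per_hr = 2.0   # lower bound; the text reports "2+ GB/hr"
nodes = 48
cluster_gb_per_hr = 100.0     # reported rate on 48 nodes

# Throughput if the single-node rate scaled perfectly linearly.
ideal_gb_per_hr = single_node_gb_per_hr * nodes

# Ratio of reported to ideal throughput; slightly above 1 here because the
# single-node figure used is only a lower bound.
scaling_ratio = cluster_gb_per_hr / ideal_gb_per_hr

# Time to load one terabyte at the reported cluster rate (1 TB = 1000 GB assumed).
hours_per_tb = 1000.0 / cluster_gb_per_hr
```

Under these assumptions the ideal rate is 96 GB/hr, so the reported 100 GB/hr is consistent with close to linear scaling, and a terabyte would load in about ten hours.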

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


## Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4