ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning.

Kien A. Hua, Chiang Lee: Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning. VLDB 1991: 525-535
@inproceedings{DBLP:conf/vldb/HuaL91,
  author    = {Kien A. Hua and
               Chiang Lee},
  editor    = {Guy M. Lohman and
               Am\'{\i}lcar Sernadas and
               Rafael Camps},
  title     = {Handling Data Skew in Multiprocessor Database Computers Using
               Partition Tuning},
  booktitle = {17th International Conference on Very Large Data Bases, September
               3-6, 1991, Barcelona, Catalonia, Spain, Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1991},
  isbn      = {1-55860-150-3},
  pages     = {525-535},
  ee        = {db/conf/vldb/HuaL91.html},
  crossref  = {DBLP:conf/vldb/91},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Shared nothing multiprocessor architecture is known to be more scalable to support very large databases. Compared to other join strategies, a hash-based join algorithm is particularly efficient and easily parallelized for this computation model. However, this hardware structure is very sensitive to the data skew problem. Unless the parallel hash join algorithm includes some load balancing mechanism,skew effect can deteriorate the system performance severely.

In this paper, we propose two skew avoidance techniques and one skew resolution method. In particular, three new parallel hash join algorithms are presented. We developed an analytical model to study the effectiveness of these algorithms. The performance study indicates that the proposed techniques offer substantial improvement over the conventional strategies in the presence of data skew. It is also interesting to observe that the skew avoidance techniques provide join strategies that are robust against data skew; whereas the skew resolution method offers an adaptive join strategy that outperforms the conventional algorithms for any skew condition.

Copyright © 1991 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Guy M. Lohman, Amílcar Sernadas, Rafael Camps (Eds.): 17th International Conference on Very Large Data Bases, September 3-6, 1991, Barcelona, Catalonia, Spain, Proceedings. Morgan Kaufmann 1991, ISBN 1-55860-150-3
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

References

[1]
...
[2]
Haran Boral, William Alexander, Larry Clay, George P. Copeland, Scott Danforth, Michael J. Franklin, Brian E. Hart, Marc G. Smith, Patrick Valduriez: Prototyping Bubba, A Highly Parallel Database System. IEEE Trans. Knowl. Data Eng. 2(1): 4-24(1990) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[3]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
Susanne Englert, Jim Gray, Terrye Kocher, Praful Shah: A Benchmark of NonStop SQL Release 2 Demonstrating Near-Linear Speedup and Scaleup on Large Databases. SIGMETRICS 1990: 245-246 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[5]
...
[6]
...
[7]
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9(1986) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[8]
M. Seetha Lakshmi, Philip S. Yu: Effect of Skew on Join Performance in Parallel Architectures. DPDS 1988: 107-120 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[9]
...
[10]
Donovan A. Schneider, David J. DeWitt: A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment. SIGMOD Conference 1989: 110-121 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[11]
Masaru Kitsuregawa, Yasushi Ogawa: Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC). VLDB 1990: 210-221 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[12]
...
[13]
Kien A. Hua, Chiang Lee: An Adaptive Data Placement Scheme for Parallel Database Computer Systems. VLDB 1990: 493-506 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[14]
...
[15]
M. Seetha Lakshmi, Philip S. Yu: Limiting Factors of Join Performance on Parallel Processors. ICDE 1989: 488-496 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[16]
...
[17]
...
[18]
...
[19]
...
[20]
David J. DeWitt, Randy H. Katz, Frank Olken, Leonard D. Shapiro, Michael Stonebraker, David A. Wood: Implementation Techniques for Main Memory Database Systems. SIGMOD Conference 1984: 1-8 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Tue Mar 16 02:22:02 2010 by Michael Ley (ley@uni-trier.de)