ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC).

Masaru Kitsuregawa, Yasushi Ogawa: Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC). VLDB 1990: 210-221
@inproceedings{DBLP:conf/vldb/KitsuregawaO90,
  author    = {Masaru Kitsuregawa and
               Yasushi Ogawa},
  editor    = {Dennis McLeod and
               Ron Sacks-Davis and
               Hans-J{\"o}rg Schek},
  title     = {Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash
               Join Method for Data Skew in the Super Database Computer (SDC)},
  booktitle = {16th International Conference on Very Large Data Bases, August
               13-16, 1990, Brisbane, Queensland, Australia, Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1990},
  isbn      = {1-55860-149-X},
  pages     = {210-221},
  ee        = {db/conf/vldb/KitsuregawaO90.html},
  crossref  = {DBLP:conf/vldb/90},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

The Super Database Computer (SDC) is a high- performance relational database server for a join- intensive environment under development at Universityof Tokyo. SDC is designed to execute a join in a highly parallel way. Compared to other join algorithms, a hash-based algorithm is quite efficient and easily parallelized, and has been employed by many database machines. However, in the presence of data skew, it's hard to distribute load equally among processing modules (PMs) by statically allocating buckets to PMs, as in the conventional parallelizing strategy. Thus, performance is severly degraded.

In this paper, we propose a new parallel hash join method, the bucket spreading strategy, which is robust for data skew. During partitioning relations, each bucket is again divided into fragments of the same size and these fragments are temporarily placed on PMs one by one. Then each bucket is dynamically allocated to a PM which actually carries out the join of the bucket, and all fragments of the bucket are collected inthe corresponding PM. In this way, the bucket spreading strategy evenly distributes the load among the PMs and parallelism is always fully exploited. The architecture of SDC is designed to support the bucket spreading strategy; a mechanism which distributes the buckets flatly among the PMs is embedded in the hardware of the interconnection network. Simulation results confirm that the bucket spreading strategy is robust for data skew and attains very good scalability.

Copyright © 1990 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Dennis McLeod, Ron Sacks-Davis, Hans-Jörg Schek (Eds.): 16th International Conference on Very Large Data Bases, August 13-16, 1990, Brisbane, Queensland, Australia, Proceedings. Morgan Kaufmann 1990, ISBN 1-55860-149-X
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

References

[Bra84]
Kjell Bratbergsengen: Hashing Methods and Relational Algebra Operations. VLDB 1984: 323-333 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Bra89]
Kjell Bratbergsengen, Torgrim Gjelsvik: The Development of the CROSS8 and HC16-186 Parallel (Database) Computers. IWDM 1989: 359-372 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[DeW84]
David J. DeWitt, Randy H. Katz, Frank Olken, Leonard D. Shapiro, Michael Stonebraker, David A. Wood: Implementation Techniques for Main Memory Database Systems. SIGMOD Conference 1984: 1-8 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[DeW85]
David J. DeWitt, Robert H. Gerber: Multiprocessor Hash-Based Join Algorithms. VLDB 1985: 151-164 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[DeW86]
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Ger86]
...
[Hir90]
...
[Kit83]
Masaru Kitsuregawa, Hidehiko Tanaka, Tohru Moto-Oka: Application of Hash to Data Base Machine and Its Architecture. New Generation Comput. 1(1): 63-74(1983) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Kit84]
...
[Kit87]
Masaru Kitsuregawa, Miyuki Nakano, Lilian Harada, Mikio Takagi: Functional Disk System for Relational Database. ICDE 1987: 88-95 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Kit87b]
Masaru Kitsuregawa, Weikang Yang, T. Suzuki, Mikio Takagi: Design and Implementation of High Speed Pipeline Merge Sorter with Run Length Tuning Mechanism. IWDM 1987: 89-102 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Kit89]
Masaru Kitsuregawa, Miyuki Nakano, Mikio Takagi: Query Execution for Large Relations on Functional Disk Systems. ICDE 1989: 159-167 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Kit89b]
...
[Kit89c]
Masaru Kitsuregawa, Masaya Nakayama, Mikio Takagi: The Effect of Bucket Size Tuning in the Dynamic Hybrid GRACE Hash Join Method. VLDB 1989: 257-266 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Lak88]
M. Seetha Lakshmi, Philip S. Yu: Effect of Skew on Join Performance in Parallel Architectures. DPDS 1988: 107-120 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Lak89]
M. Seetha Lakshmi, Philip S. Yu: Limiting Factors of Join Performance on Parallel Processors. ICDE 1989: 488-496 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Law75]
...
[Mit89]
...
[Nak88]
Masaya Nakayama, Masaru Kitsuregawa, Mikio Takagi: Hash-Partitioned Join Method Using Dynamic Destaging Strategy. VLDB 1988: 468-478 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Nec88]
...
[Omi89]
Edward Omiecinski, Eileen Tien Lin: Hash-Based and Index-Based Join Algorithms for Cube and Ring Connected Multicomputers. IEEE Trans. Knowl. Data Eng. 1(3): 329-343(1989) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Rie78]
...
[Sch89]
Donovan A. Schneider, David J. DeWitt: A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment. SIGMOD Conference 1989: 110-121 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Sha86]
Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 11(3): 239-264(1986) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Sto86]
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9(1986) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Ter88]
...
[Tan88]
The Tandem Performance Group: A Benchmark of NonStop SQL on the Debit Credit Transaction (Invited Paper). SIGMOD Conference 1988: 337-341 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[Tur87]
...
[Val84]
Patrick Valduriez, Georges Gardarin: Join and Semijoin Algorithms for a Multiprocessor Database Machine. ACM Trans. Database Syst. 9(1): 133-161(1984) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Tue Mar 16 02:22:00 2010 by Michael Ley (ley@uni-trier.de)