ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Data Placement In Bubba.

George P. Copeland, William Alexander, Ellen E. Boughter, Tom W. Keller: Data Placement In Bubba. SIGMOD Conference 1988: 99-108
@inproceedings{DBLP:conf/sigmod/CopelandABK88,
  author    = {George P. Copeland and
               William Alexander and
               Ellen E. Boughter and
               Tom W. Keller},
  editor    = {Haran Boral and
               Per-{\AA}ke Larson},
  title     = {Data Placement In Bubba},
  booktitle = {Proceedings of the 1988 ACM SIGMOD International Conference on
               Management of Data, Chicago, Illinois, June 1-3, 1988},
  publisher = {ACM Press},
  year      = {1988},
  pages     = {99-108},
  ee        = {http://doi.acm.org/10.1145/50202.50213, db/conf/sigmod/CopelandABK88.html},
  crossref  = {DBLP:conf/sigmod/88},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. "Highly-parallel" implies that load balancing is a critical performance issue. "Data-intensive" means data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue.

In general, determining the optimal placement of data across processing nodes for performance is a difficult problem. We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued the benefits of declustering (i. e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communications, startup and termination overhead.

We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.

Copyright © 1988 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 2, SIGMOD '75-'92" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Haran Boral, Per-Åke Larson (Eds.): Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, June 1-3, 1988. ACM Press 1988 BibTeX , SIGMOD Record 17(2), June 1988
Contents

Online Edition: ACM Digital Library


References

[Ale87]
William Alexander, Tom W. Keller, Ellen E. Boughter: A Workload Characterization Pipeline for Models of Parallel Systems. SIGMETRICS 1987: 186-194 BibTeX
[Ale88]
William Alexander, George P. Copeland: Comparison of Dataflow Control Techniques In Distributed Data-Intensive Systems. SIGMETRICS 1988: 157-166 BibTeX
[AlC88]
William Alexander, George P. Copeland: Process And Dataflow Control In Distributed Data-Intensive Systems. SIGMOD Conference 1988: 90-98 BibTeX
[Ano85]
...
[Att84]
Rony Attar, Philip A. Bernstein, Nathan Goodman: Site Initialization, Recovery, and Backup in a Distributed Database System. IEEE Trans. Software Eng. 10(6): 645-650(1984) BibTeX
[Bat82]
Don S. Batory: Optimal File Designs and Reorganization Points. ACM Trans. Database Syst. 7(1): 60-81(1982) BibTeX
[Bou87]
...
[Bun84]
Richard B. Bunt, Jennifer M. Murphy, Shikharesh Majumdar: A Measure of Program Locality and Its Application. SIGMETRICS 1984: 28-40 BibTeX
[Chu69]
...
[Cve87]
Zarka Cvetanovic: The Effects of Problem Partitioning, Allocation, and Granularity on the Performance of Multiple-Processor Systems. IEEE Trans. Computers 36(4): 421-432(1987) BibTeX
[Den78]
Peter J. Denning, Jeffrey P. Buzen: The Operational Analysis of Queueing Network Models. ACM Comput. Surv. 10(3): 225-261(1978) BibTeX
[DeW86]
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237 BibTeX
[DeW87]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Rajiv Jauhari, M. Muralikrishna, Anoop Sharma: A Single User Evaluation of the Gamma Database Machine. IWDM 1987: 370-386 BibTeX
[Eas74]
Kapali P. Eswaran: Placement of Records in a File and File Allocation in a Computer. IFIP Congress 1974: 304-307 BibTeX
[Flo78]
André Flory, J. Gunther, Jacques Kouloumdjian: Data Base Reorganization by Clustering Methods. Inf. Syst. 3(1): 59-62(1978) BibTeX
[Gra78]
Jim Gray: Notes on Data Base Operating Systems. Advanced Course: Operating Systems 1978: 393-481 BibTeX
[Gra87]
Jim Gray, Gianfranco R. Putzolu: The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD Conference 1987: 395-398 BibTeX
[Hwa84]
...
[Jak80]
Matti Jakobsson: Reducing block accesses in inverted files by partial clustering. Inf. Syst. 5(1): 1-5(1980) BibTeX
[Kat78]
...
[Laz84]
...
[Liv87]
Miron Livny, Setrag Khoshafian, Haran Boral: Multi-Disk Management Algorithms. SIGMETRICS 1987: 69-77 BibTeX
[Mah76]
Samy A. Mahmoud, J. Spruce Riordon: Optimal Allocation of Resources in Distributed Information Networks. ACM Trans. Database Syst. 1(1): 66-78(1976) BibTeX
[Mar76]
K. Maruyama, S. E. Smith: Optimal Reorganization of Distributed Space Disk Files. Commun. ACM 19(11): 634-642(1976) BibTeX
[Muk87]
Ravi Mukkamala, Steven C. Bruell, Roger K. Shultz: Design of Partially Replicated Distributed Database Systems: An Integrated Methodology. SIGMETRICS 1988: 187-196 BibTeX
[Omi83]
...
[Sam87]
...
[Shn73]
Ben Shneiderman: Optimum Data Base Reorganization Points. Commun. ACM 16(6): 362-365(1973) BibTeX
[Soc79]
Gary H. Sockut, Robert P. Goldberg: Database Reorganization - Principles and Practice. ACM Comput. Surv. 11(4): 371-395(1979) BibTeX
[Sto86]
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9(1986) BibTeX
[Tan87]
Tandem Database Group - NonStop SQL: A Distributed, High-Performance, High-Availability Implementation of SQL. HPTS 1987: 60-104 BibTeX
[Ter85]
...
[Tue78]
William G. Tuel Jr.: Optimum Reorganization Points for Linearly Growing Files. ACM Trans. Database Syst. 3(1): 32-40(1978) BibTeX
[Vrs85]
...
[Yao76]
S. Bing Yao, K. Sundar Das, Toby J. Teorey: A Dynamic Database Reorganization Algorithm. ACM Trans. Database Syst. 1(2): 159-174(1976) BibTeX
[Yu85]
Clement T. Yu, Cheing-Mei Suen, K. Lam, M. K. Siu: Adaptive Record Clustering. ACM Trans. Database Syst. 10(2): 180-204(1985) BibTeX

Referenced by

  1. Gerhard Weikum: Review - Data Placement In Bubba. ACM SIGMOD Digital Review 2: (2000)
  2. Haruo Yokota, Yasuhiko Kanemasa, Jun Miyazaki: Fat-Btree: An Update-Conscious Parallel Directory Structure. ICDE 1999: 448-457
  3. Peter Scheuermann, Gerhard Weikum, Peter Zabback: Data Partitioning and Load Balancing in Parallel Disk Systems. VLDB J. 7(1): 48-66(1998)
  4. Achim Kraiss, Gerhard Weikum: Integrated Document Caching and Prefetching in Storage Hierarchies Based on Markov-Chain Predictions. VLDB J. 7(3): 141-162(1998)
  5. Bongki Moon, Joel H. Saltz: Scalability Analysis of Declustering Methods for Multidimensional Range Queries. IEEE Trans. Knowl. Data Eng. 10(2): 310-327(1998)
  6. Manish Mehta, David J. DeWitt: Data Placement in Shared-Nothing Parallel Database Systems. VLDB J. 6(1): 53-72(1997)
  7. Anant Jhingran, Timothy Malkemus, Sriram Padmanabhan: Query Optimization in DB2 Parallel Edition. IEEE Data Eng. Bull. 20(2): 27-34(1997)
  8. Achim Kraiss, Gerhard Weikum: Vertical Data Migration in Large Near-Line Document Archives Based on Markov-Chain Predictions. VLDB 1997: 246-255
  9. Markus Sinnwell, Gerhard Weikum: A Cost-Model-Based Online Method for Ditributed Caching. ICDE 1997: 532-541
  10. Toshihiro Nemoto, Masaru Kitsuregawa, Mikio Takagi: Analysis of Cassette Migration Activities in Scalable Tape Archiver. DASFAA 1997: 461-470
  11. Javam Machado, Christine Collet: A Parallel Execution Model for Database Transactions. DASFAA 1997: 511-520
  12. Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu: Mariposa: A Wide-Area Distributed Database System. VLDB J. 5(1): 48-63(1996)
  13. Dennis Shasha: Tuning Databases for High Performance. ACM Comput. Surv. 28(1): 113-115(1996)
  14. Peter Scheuermann, Junho Shim, Radek Vingralek: WATCHMAN : A Data Warehouse Intelligent Cache Manager. VLDB 1996: 51-62
  15. Kurt P. Brown, Michael J. Carey, Miron Livny: Goal-Oriented Buffer Management Revisited. SIGMOD Conference 1996: 353-364
  16. Nick Bassiliades, Ioannis P. Vlahavas: A Non-Uniform Data Fragmentation Strategy for Parallel Main-Menory Database Systems. VLDB 1995: 370-381
  17. Gerhard Weikum: Tutorial on Parallel Database Systems. ICDT 1995: 33-37
  18. Shivakumar Venkataraman, Miron Livny, Jeffrey F. Naughton: The Impact of Data Placement on Memory Management for Multi-Server OODBMS. ICDE 1995: 355-364
  19. Gabriel Matsliach, Oded Shmueli: A Combined Method for Maintaining Large Indices in Multiprocessor Multidisk Environments. IEEE Trans. Knowl. Data Eng. 6(3): 479-496(1994)
  20. Peter Scheuermann, Gerhard Weikum, Peter Zabback: ``Disk Cooling'' in Parallel Disk Systems. IEEE Data Eng. Bull. 17(3): 29-40(1994)
  21. Peter M. Chen, Edward L. Lee, Garth A. Gibson, Randy H. Katz, David A. Patterson: RAID: High-Performance, Reliable Secondary Storage. ACM Comput. Surv. 26(2): 145-185(1994)
  22. Erik G. Hoel, Hanan Samet: Performance of Data-Parallel Spatial Operations. VLDB 1994: 156-167
  23. Hasanat M. Dewan, Salvatore J. Stolfo, Mauricio A. Hernández, Jae-Jun Hwang: Predictive Dynamic Load Balancing of Parallel and Distributed Rule and Query Processing. SIGMOD Conference 1994: 277-288
  24. Shahram Ghandeharizadeh, David Wilhite, Kai-Ming Lin, Xiaoming Zhao: Object Placement in Parallel Object-Oriented Database Systems. ICDE 1994: 253-262
  25. Doron Rotem, Gerhard A. Schloss, Arie Segev: Data Allocation for Multi-Disk Databases. IEEE Trans. Knowl. Data Eng. 5(5): 882-887(1993)
  26. Shahram Ghandeharizadeh, Luis Ramos: Continuous Retrieval of Multimedia Data Using Parallelism. IEEE Trans. Knowl. Data Eng. 5(4): 658-669(1993)
  27. Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170(1993)
  28. Erhard Rahm, Robert Marek: Analysis of Dynamic Load Balancing Strategies for Parallel Shared Nothing Database Systems. VLDB 1993: 182-193
  29. Kurt P. Brown, Michael J. Carey, Miron Livny: Managing Memory to Meet Multiclass Workload Response Time Goals. VLDB 1993: 328-341
  30. Cyril U. Orji, Jon A. Solworth: Doubly Distorted Mirrors. SIGMOD Conference 1993: 307-316
  31. Patrick Valduriez: Parallel Database Systems: the case for shared-something. ICDE 1993: 460-465
  32. Stan Danforth, Patrick Valduriez: A FAD for Data Intensive Applications. IEEE Trans. Knowl. Data Eng. 4(1): 34-51(1992)
  33. David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 35(6): 85-98(1992)
  34. Annita N. Wilschut, Jan Flokstra, Peter M. G. Apers: Parallelism in a Main-Memory DBMS: The Performance of PRISMA/DB. VLDB 1992: 521-532
  35. Jianzhong Li, Jaideep Srivastava, Doron Rotem: CMD: A Multidimensional Declustering Method for Parallel Data Systems. VLDB 1992: 3-14
  36. Wei Hong: Exploiting Inter-Operation Parallelism in XPRS. SIGMOD Conference 1992: 19-28
  37. Shahram Ghandeharizadeh, David J. DeWitt, Waheed Qureshi: A Performance Analysis of Alternative Multi-Attribute Declustering Strategies. SIGMOD Conference 1992: 29-38
  38. Anupam Bhide, Ambuj Goyal, Hui-I Hsiao, Anant Jhingran: An Efficient Scheme for Providing High Availability. SIGMOD Conference 1992: 236-245
  39. Dennis Shasha, Jason Tsong-Li Wang: Optimizing Equijoin Queries In Distributed Databases Where Relations Are Hash Partitioned. ACM Trans. Database Syst. 16(2): 279-308(1991)
  40. Christopher B. Walton, Alfred G. Dale, Roy M. Jenevein: A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins. VLDB 1991: 537-548
  41. Shahram Ghandeharizadeh, Luis Ramos, Zubair Asad, Waheed Qureshi: Object Placement in Parallel Hypermedia Systems. VLDB 1991: 243-254
  42. Gerhard Weikum, Peter Zabback, Peter Scheuermann: Dynamic File Allocation in Disk Arrays. SIGMOD Conference 1991: 406-415
  43. Bernhard Seeger, Per-Åke Larson: Multi-Disk B-trees. SIGMOD Conference 1991: 436-445
  44. Tadashi Ohmori, Masaru Kitsuregawa, Hidehiko Tanaka: Scheduling Batch Transactions on Shared-Nothing Parallel Database Machines: Effects of Concurrency and Parallelism. ICDE 1991: 210-219
  45. Marguerite C. Murphy, Ming-Chien Shan: Execution Plan Balancing. ICDE 1991: 698-706
  46. David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990)
  47. Haran Boral, William Alexander, Larry Clay, George P. Copeland, Scott Danforth, Michael J. Franklin, Brian E. Hart, Marc G. Smith, Patrick Valduriez: Prototyping Bubba, A Highly Parallel Database System. IEEE Trans. Knowl. Data Eng. 2(1): 4-24(1990)
  48. Donovan A. Schneider, David J. DeWitt: Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines. VLDB 1990: 469-480
  49. Kien A. Hua, Chiang Lee: An Adaptive Data Placement Scheme for Parallel Database Computer Systems. VLDB 1990: 493-506
  50. Shahram Ghandeharizadeh, David J. DeWitt: Hybrid-Range Partitioning Strategy: A New Declustering Strategy for Multiprocessor Database Machines. VLDB 1990: 481-492
  51. Tadashi Ohmori, Masaru Kitsuregawa, Hidehiko Tanaka: Concurrency Control of Bulk Access Transactions on Shared Nothing Parallel Database Machines. ICDE 1990: 476-485
  52. Hui-I Hsiao, David J. DeWitt: Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines. ICDE 1990: 456-465
  53. B. Paul Jenq, Brian C. Twichell, Tom W. Keller: Locking Performance in a Shared-Nothing Parallel Database Machine. IEEE Trans. Knowl. Data Eng. 1(4): 530-543(1989)
  54. George P. Copeland, Tom W. Keller: A Comparison Of High-Availability Media Recovery Techniques. SIGMOD Conference 1989: 98-109
  55. Christos Faloutsos, Dimitris N. Metaxas: Declustering Using Error Correcting Codes. PODS 1989: 253-258
  56. Gerhard Weikum: Set-Oriented Disk Access to Large Complex Objects. ICDE 1989: 426-433
  57. Patrick Valduriez, Scott Danforth, Brian E. Hart, Ted Briggs, Munir Cochinwala: Compiling FAD, a Database Programming Language. DBPL 1989: 375-393
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Wed Jun 4 18:54:44 2008