ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Quickly Generating Billion-Record Synthetic Databases.

Jim Gray, Prakash Sundaresan, Susanne Englert, Kenneth Baclawski, Peter J. Weinberger: Quickly Generating Billion-Record Synthetic Databases. SIGMOD Conference 1994: 243-252
@inproceedings{DBLP:conf/sigmod/GraySEBW94,
  author    = {Jim Gray and
               Prakash Sundaresan and
               Susanne Englert and
               Kenneth Baclawski and
               Peter J. Weinberger},
  editor    = {Richard T. Snodgrass and
               Marianne Winslett},
  title     = {Quickly Generating Billion-Record Synthetic Databases},
  booktitle = {Proceedings of the 1994 ACM SIGMOD International Conference on
               Management of Data, Minneapolis, Minnesota, May 24-27, 1994},
  publisher = {ACM Press},
  year      = {1994},
  pages     = {243-252},
  ee        = {http://doi.acm.org/10.1145/191839.191886, db/conf/sigmod/GraySEBW94.html},
  crossref  = {DBLP:conf/sigmod/94},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Evaluating database system performance often requires generating synthetic databases - ones having certain statistical properties but filled with dummy information. When evaluating different database designs, it is often necessary to generate several databases and evaluate each design. As database sizes grow to terabytes, generation often takes longer than evaluation. This paper presents several database generation techniques. In particular it discusses: (1) Parallelism to get generation speedup and scaleup. (2) Congruential generators to get dense unique uniform distributions. (3) Special-case discrete logarithms to generate indices concurrent to the base table generation. (4) Modification of (2) to get exponential, normal, and self-similar distributions. The discussion is in terms of generating billion-record SQL databases using C programs running on a shared-nothing computer system consisting of a hundred processors, with a thousand discs. The ideas apply to smaller databases, but large databases present the more difficult problems.

Copyright © 1994 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.
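
Item (2) of the abstract, congruential generators for dense unique uniform values, can be illustrated with a small sketch. This is not the paper's code: the function names (`next_prime`, `is_primitive_root`, `dense_unique`) are hypothetical, the language is Python rather than the C mentioned in the abstract, and the approach shown (a multiplicative generator modulo a prime just above N, discarding values above N) is an assumption about one standard way to obtain such a sequence.

```python
import math


def next_prime(n):
    """Smallest prime strictly greater than n (trial division; demo sizes only)."""
    c = max(n + 1, 2)
    while any(c % d == 0 for d in range(2, math.isqrt(c) + 1)):
        c += 1
    return c


def is_primitive_root(g, p):
    """True if g generates every value in 1..p-1 under multiplication mod prime p."""
    order = p - 1
    factors, m, d = set(), order, 2
    while d * d <= m:               # collect the prime factors of p - 1
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    # g is a primitive root iff g^((p-1)/q) != 1 for every prime factor q
    return all(pow(g, order // q, p) != 1 for q in factors)


def dense_unique(n, seed=1):
    """Yield each integer in 1..n exactly once, in a scrambled order (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    p = next_prime(n)
    g = next(c for c in range(2, p) if is_primitive_root(c, p))
    x = seed
    for _ in range(p - 1):          # one full cycle of the generator
        x = (g * x) % p
        if x <= n:                  # skip the few values that fall in n+1..p-1
            yield x


print(list(dense_unique(10)))       # [2, 4, 8, 5, 10, 9, 7, 3, 6, 1]
```

Because the k-th value of such a sequence is `seed * pow(g, k, p) % p`, independent workers could each start at their own offset without coordinating, which is one way the recurrence lends itself to the parallel generation mentioned in item (1).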



Printed Edition

Richard T. Snodgrass, Marianne Winslett (Eds.): Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, May 24-27, 1994. ACM Press 1994, SIGMOD Record 23(2), June 1994

References

[Bitton 1]
...
[Bitton 2]
Dina Bitton, David J. DeWitt, Carolyn Turbyfill: Benchmarking Database Systems: A Systematic Approach. VLDB 1983: 8-19
[Coppersmith]
Don Coppersmith, Andrew M. Odlyzko, Richard Schroeppel: Discrete Logarithms in GF(p). Algorithmica 1(1): 1-15 (1986)
[DeWitt 1]
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237
[DeWitt 2]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62 (1990)
[DeWitt 3]
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting. PDIS 1991: 280-291
[Englert]
Susanne Englert, Jim Gray, Terrye Kocher, Praful Shah: A Benchmark of NonStop SQL Release 2 Demonstrating Near-Linear Speedup and Scaleup on Large Databases. SIGMETRICS 1990: 245-246
[Gerber]
...
[Hobbs]
...
[Horst]
...
[Jain]
...
[Kim]
Michelle Y. Kim: Synchronized Disk Interleaving. IEEE Trans. Computers 35(11): 978-988 (1986)
[Knuth]
Donald E. Knuth: The Art of Computer Programming, Volume II: Seminumerical Algorithms, 2nd Edition. Addison-Wesley 1981, ISBN 0-201-03822-6
[Kronenberg]
...
[Nyberg]
Chris Nyberg, Tom Barclay, Zarka Cvetanovic, Jim Gray, David B. Lomet: AlphaSort: A RISC Machine Sort. SIGMOD Conference 1994: 233-242
[Press]
William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery: Numerical Recipes in C, 2nd Edition. Cambridge University Press 1992
[Ripley]
...
[Schrage]
...
[Smith]
Marc G. Smith, William Alexander, Haran Boral, George P. Copeland, Tom W. Keller, Herbert D. Schwetman, Chii-Ren Young: An Experiment on Response Time Scalability in Bubba. IWDM 1989: 34-57
[Stonebraker]
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9 (1986)
[Tanenbaum]
...
[Teradata]
...
[Thekkath]
Chandramohan A. Thekkath, Henry M. Levy: Limits to Low-Latency Communication on High-Speed Networks. ACM Trans. Comput. Syst. 11(2): 179-203 (1993)
[TPC]
...
[Uren]
...

Copyright © Fri Mar 12 17:21:31 2010 by Michael Ley (ley@uni-trier.de)