Using Referential Integrity To Easily Define Consistent Subset Replicas.

Brad Hammond: Using Referential Integrity To Easily Define Consistent Subset Replicas. VLDB 1996: 591

@inproceedings{DBLP:conf/vldb/Hammond96,
  author    = {Brad Hammond},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Using Referential Integrity To Easily Define Consistent Subset
               Replicas},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {591},
  ee        = {db/conf/vldb/Hammond96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Microsoft Access 7.0 has a powerful "update anywhere" replication feature. It allows replicas to be refreshed on demand and at scheduled times, and replicas can be disconnected for long periods of time. One of its limitations is that all replicas must contain the complete set of replicated data. Users want to create "partial replicas" that contain only a subset of the data, so that they can have smaller replicas for smaller machines, and so that different customers can have replicas which exclude the data for other customers. The Office 97 version of Access will have a "partial replica" feature that creates and maintains replicas which contain only a subset of the rows. In designing this feature, several alternatives were considered for defining the replicated view:

I. Allow only single-table queries to define a subset of a table
II. Allow a combination of single-table queries and "relationship filters", which allow joins on foreign keys to define which rows belong in the subset
III. Allow any view definition, including joins and subqueries, to define the replicated view

The key criteria in our choice were the efficiency of incremental replication and how easily users could create useful partial replicas.

I. Allow Only Single Variable Queries

This alternative allows efficient incremental replication, since only rows that have been recently updated or inserted need to be sent to the partial replica, and whether a row belongs at a partial replica can be determined simply by evaluating the filter predicate. However, since the filter predicate can only refer to columns in the table being filtered, this restriction often requires denormalizing the database. For example, consider a simple database that has Customer, Order, OrderDetail, and Product tables. In order to create a partial replica containing the Washington customers, it would be necessary to add Customer_state as a column to both the Order and OrderDetail tables. This alternative was rejected as being overly restrictive.
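The single-table case can be illustrated with a minimal Python sketch. It is not the Access implementation; the row layout, the `rows_to_send` helper, and the sample data are assumptions made for illustration. It shows that only recently changed rows need to be tested against the filter predicate, and that the same predicate cannot be evaluated on a table that lacks the filtered column.

```python
# Hypothetical change log: Customer rows modified since the last synchronization.
changed_customers = [
    {"id": 1, "name": "Ann", "state": "WA"},
    {"id": 2, "name": "Bob", "state": "OR"},
]

def rows_to_send(changed_rows, predicate):
    """Return the changed rows that satisfy a single-table filter predicate."""
    return [row for row in changed_rows if predicate(row)]

def in_washington(row):
    """Filter predicate that refers only to columns of the filtered table."""
    return row["state"] == "WA"

to_send = rows_to_send(changed_customers, in_washington)

# An Order row (assumed layout) has no state column, so the Washington filter
# cannot be evaluated on it unless the table is denormalized with a copied
# Customer_state column.
changed_orders = [{"id": 10, "customer_id": 1}]
order_has_state = "state" in changed_orders[0]

print([row["id"] for row in to_send])
print(order_has_state)
```

Each changed row is examined once and no other table is consulted, which is what makes this alternative cheap to replicate incrementally.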

II. Single Variable Queries And Relationship Filters

This alternative allows single-variable queries on any table. For tables which contain foreign keys, it also allows the automatic selection of all rows containing foreign keys which correspond to primary keys that belong at the partial replica. The partial replicas enforce the same referential integrity constraints as full replicas, so if there is a relationship for Table A (fkey) that references Table B (unique key), and rows from Table A are in the partial replica, then the rows in Table B with the corresponding unique keys are automatically included in the partial replica. However, the converse is not true: the partial replica can contain rows that have a unique key, yet exclude the rows with the foreign keys that reference them. If the rows with foreign keys are to be included at the partial replica, a relationship filter is added for the referencing table.

For example, to create a partial replica consisting of the Washington customers and their related information, the partial replica definition would have a simple predicate "State = 'WA'" on the Customer table, and relationship filters on the Order and OrderDetail tables. There is an Access Wizard that automatically generates these filters, which is easy to use but restricts one's choices. The relationship filters and boolean filters can also be added programmatically using Visual Basic.
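The Washington example can be sketched in Python as follows. This is an illustrative model only, not the Access replication API: the helper names (`boolean_filter`, `relationship_filter`, `referenced_rows`), the column names, and the assumption that OrderDetail carries a `product_id` foreign key to Product are all hypothetical.

```python
# Hypothetical sample data for the four tables named in the abstract.
customers = [{"id": 1, "state": "WA"}, {"id": 2, "state": "OR"}]
orders = [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 2}]
order_details = [
    {"id": 100, "order_id": 10, "product_id": 7},
    {"id": 101, "order_id": 11, "product_id": 8},
]
products = [{"id": 7}, {"id": 8}]

def boolean_filter(rows, predicate):
    """Single-table predicate, such as State = 'WA' on Customer."""
    return [row for row in rows if predicate(row)]

def relationship_filter(child_rows, fkey, parent_rows):
    """Keep child rows whose foreign key matches a parent row in the replica."""
    parent_keys = {parent["id"] for parent in parent_rows}
    return [child for child in child_rows if child[fkey] in parent_keys]

def referenced_rows(child_rows, fkey, parent_rows):
    """Referential integrity: include parent rows referenced by replica rows."""
    needed = {child[fkey] for child in child_rows}
    return [parent for parent in parent_rows if parent["id"] in needed]

replica_customers = boolean_filter(customers, lambda row: row["state"] == "WA")
replica_orders = relationship_filter(orders, "customer_id", replica_customers)
replica_details = relationship_filter(order_details, "order_id", replica_orders)
# Product needs no filter of its own: its rows follow from the foreign keys
# held by the OrderDetail rows already in the replica.
replica_products = referenced_rows(replica_details, "product_id", products)

print([row["id"] for row in replica_orders])
print([row["id"] for row in replica_products])
```

In this sketch the relationship filters walk from referenced tables down to referencing tables, while the referential integrity step walks the other way, so no table needs a copied Customer_state column.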

This alternative eliminates the need to denormalize data, yet there is still enough restriction on the partial replica definition to allow fairly efficient incremental replication.

III. Allow Any Predicate

Allowing any predicate, including joins and nested subqueries, eliminates the need to denormalize the database. However, it makes incremental replication difficult or impossible. The reason is that any data change may affect a subquery's result, and thus cause rows which have not changed recently to now meet the filter criteria. For example, suppose a partial replica is defined to contain the customers (and related orders, etc.) whose average order is more than $1000. A customer could become a member of the replicated view as the result of an insert or update to an OrderDetail row, or the deletion of an Order row. It is easy to think of predicates where the database system would have to examine virtually the entire database to see what belongs at the partial replica after a data change. While this would not be the case for all sets of filter predicates, it would be a very difficult problem to always propagate changes to the partial replica in the most efficient way.
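The average-order example can be made concrete with a small Python sketch. It is an assumption-laden illustration: each Order row is given a precomputed `total` column (in the abstract's schema the amounts would come from OrderDetail rows), and the `members` function is hypothetical. It shows a customer entering the replicated view although the Customer row itself never changed.

```python
from statistics import mean

def members(orders):
    """Customers whose average order total exceeds 1000 (full recomputation)."""
    totals_by_customer = {}
    for order in orders:
        totals_by_customer.setdefault(order["customer_id"], []).append(order["total"])
    return {cid for cid, totals in totals_by_customer.items() if mean(totals) > 1000}

# Hypothetical Order rows with an assumed precomputed total.
orders = [
    {"id": 10, "customer_id": 1, "total": 1500},
    {"id": 11, "customer_id": 1, "total": 200},
    {"id": 12, "customer_id": 2, "total": 3000},
]

before = members(orders)

# Deleting one small order raises customer 1's average above the threshold.
orders_after_delete = [order for order in orders if order["id"] != 11]
after = members(orders_after_delete)

# Customer 1 now belongs at the replica, yet its Customer row was not modified,
# so a change log restricted to the Customer table would never report it.
newly_qualified = after - before
print(sorted(newly_qualified))
```

The sketch recomputes membership over every order after the change, which is the cost the abstract warns about; doing better would require predicate-specific bookkeeping.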

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4