20. VLDB 1994: Santiago de Chile, Chile - Tutorials

1. Geographic Information Systems

Geographic Information Systems (GIS) are systems that manage a special class of data, georeferenced data. This term refers to data that deal with geographic phenomena associated with their location, spatially referenced to the Earth. GIS support a wide range of application domains, such as urban planning, natural resources administration, agriculture, public utility network management, route optimization, demography, cartography, coastal monitoring, and fire and epidemic control. In most domains, GIS play a major role as a decision support tool for planning activities. GIS also present a challenge to database researchers. Data to be integrated into GIS come in distinct formats and from different sources and geographic locations, and are captured over varying periods of time by several types of devices. Their processing involves considerable amounts of space and requires specialized operations that are not available in commercial database systems. In order to efficiently support GIS applications, database systems must be built to provide users with new storage, management and presentation facilities.

The purpose of the tutorial is to review the state-of-the-art in database support for GIS and outline some of the research issues currently being addressed in the development of these systems, both from end-users and from database designers' standpoints. The content of the tutorial includes data models, I/O processing, data management and retrieval, and data storage and spatial access methods. The tutorial will also discuss various approaches to developing applications for GIS, analyzing current tendencies and some open issues.
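
To make the idea of a spatial access method concrete, here is a minimal Python sketch of a fixed-grid index answering a rectangular range query over georeferenced points. It is an illustration only and is not taken from the tutorial material; the class name GridIndex, its methods, and the cell size are all hypothetical.

```python
from collections import defaultdict


class GridIndex:
    """Hypothetical fixed-grid index over 2-d points (illustration only)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        # Maps a (column, row) cell to the points stored in that cell.
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Floor division also places negative coordinates in the right cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x, y, item):
        self.cells[self._cell(x, y)].append((x, y, item))

    def range_query(self, xmin, ymin, xmax, ymax):
        """Return the items whose point lies inside the closed rectangle."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        result = []
        # Visit only the cells that overlap the query rectangle.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for x, y, item in self.cells.get((cx, cy), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        result.append(item)
        return result
```

The point of such a structure is that a query touches only the cells overlapping the search rectangle instead of scanning every stored object. Hierarchical structures refine the same idea for skewed data.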

Presentation: The course will be presented in Spanish. The transparencies will be in English, which will help the simultaneous translation process.

Instructor: Claudia Bauzer Medeiros, UNICAMP, Brazil

Dr. Medeiros is senior assistant professor of CS at the Universidade Estadual de Campinas (UNICAMP), Brazil. She is currently the principal investigator of a research project on developing GIS for environmental control applications for an object oriented platform. Dr. Medeiros took her electronic engineering degree in 1976 and her MSc degree in Informatics in 1979 from the Pontificia Universidade Catolica, PUC/RJ, Brazil. Her PhD degree was obtained from the University of Waterloo, Canada, in 1985 and her Livre Docencia (in databases) from UNICAMP in 1992. She has held visiting appointments at INRIA (Rocquencourt), France, and at the Universite Paris-Dauphine, Paris, France. She is an author or co-author of about 30 papers on databases and software engineering methodologies. Presently she is the Editor of the Journal of the Brazilian CS Society.

2. Persistent Programming Systems: The Future of Databases?

The successful use of databases has been based on the notion that there is a strict methodology for their construction. First, the form of the data (the schema) is defined, then the database is populated (with values), and then programs (queries) are written to access and manipulate the data. The combination of modern applications and the need to store highly structured and complex data in the database calls into question the wisdom of constructing database systems in such an inflexible manner. The longevity of the data means that the accretion of meta-data, data and programs is almost uniform and that mechanisms are required to consistently control their evolution. The notion that the meta-data is more fixed than the data, which in turn is more fixed than the queries, is being challenged. The complexity of the data also means that abstraction mechanisms are required to control the modelling and uses of all the information in the database. Persistent programming research has for some time concentrated on integrating the concepts of both programming languages and databases. This tutorial will review the state of persistent programming systems in relation to the manner in which they control the complexity of building long-lived, data-intensive application systems, taking the approach that meta-data, data and programs have equal status. Two principles must be combined to control complexity: uniformity and incrementality. For applications to attain significant longevity they must avoid ossification. Incremental evolution and accretion of meta-data, data and program must be an integral part of the system design and operating specification. There are always a number of trade-offs to be made between the safety and performance of early binding and the flexibility of dynamic binding. The economics of change dominate the design of these modern application systems.
The tutorial will review the approaches to uniformity and incrementality available to persistent application system designers. Examples will be used liberally to illustrate the concepts. It will also include a new programming technique called hyper-programming, which is only possible in integrated persistent systems.

Instructor: Malcolm P. Atkinson, U. of Glasgow, UK

Malcolm Atkinson obtained his first degree from the University of Cambridge in 1966, followed by a Diploma in CS in 1967. After three years of research and teaching at Lancaster University he returned to Cambridge and was awarded his PhD in 1974. He then held academic posts in Burma, Cambridge, East Anglia and Edinburgh, being appointed to a senior lectureship at Edinburgh in 1983. He was a visiting professor at the Univ. of Pennsylvania during 1983-84 and was appointed to a professorship in CS at Glasgow Univ. in 1984. He was head of the Dept. of Computing Science from 1986 to 1990, following which he spent nine months on sabbatical at INRIA near Paris working with the O2 group. He has extensive experience of industrial consultancy including a long association with ICL and, more recently, with Perihelion Software. Atkinson's main research interest is in persistent programming languages, investigating the relationship between programming languages and database systems. He has held a number of research grants awarded by the UK SERC, in particular a project on Bulk Data Types, and has led several research projects (see below).

Instructor: Ronald Morrison, U. of St. Andrews, UK

Ron Morrison is Professor of Software Engineering at the Univ. of St Andrews. He gained a BSc in Mathematics from the Univ. of Strathclyde in 1967, a Diploma and an MSc in CS from the Univ. of Glasgow in 1968 and 1970 respectively, and a PhD from the Univ. of St Andrews in 1979. His special interests are programming language design, persistent object systems and operating systems. Over the past 14 years he has worked extensively with Professor Atkinson of Glasgow University on the integrating technology called Persistent Programming. The work has been funded by STC Technology Ltd., SERC and, more recently, ESPRIT. He was one of the major designers and implementors of the persistent programming language PS-algol and led the team that designed and implemented Napier88. Ron Morrison has also co-chaired workshops in the Database Programming Language Series. Professors Atkinson and Morrison were leaders of the Alvey/SERC funded PISA project (Persistent Information Space Architectures) and are main researchers of the ESPRIT III BRA 6903 Fide2 project on Database Programming Languages in collaboration with colleagues in France, Germany and Italy. Together with Professor Buneman, they also founded the series of International Workshops on Persistent Object Systems in 1983.

3. Parallelism in Database Systems

Manipulating large (terabyte and petabyte) databases requires the database system to execute the operation in parallel using multiple processors, disks, and tapes concurrently. Many commercial systems offer mechanisms to do this. The first lecture explores the concepts and algorithms inside most parallel database systems. The second lecture describes the specific techniques used by commercially available (or promised) systems.

Concepts and techniques: The technology imperative for parallelism. Kinds of parallelism (pipeline, partition). Success metrics: speedup, batch scaleup, transaction scaleup. Data parallelism: partitioning schemes. Operation parallelism: streams and rivers. Specific operators: scan, aggregate, sort, join. Utility operations (load, backup/restore/recover, index, reorganize, verify). Optimization of parallel operations. Techniques used by specific systems (based on public information): Teradata, Tandem, Informix, Rdb/DBI, DB2, Oracle, Sybase.
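
As a rough illustration of the data partitioning schemes and success metrics listed above, the following Python sketch shows round-robin, hash, and range partitioning of records across nodes, together with the usual ratio definitions of speedup and scaleup. It is not taken from the lectures; all function names are hypothetical, and the CRC-32 hash is simply a convenient deterministic choice.

```python
import zlib
from bisect import bisect_right


def round_robin_partition(records, n):
    """Send record i to partition i mod n; spreads load evenly."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def hash_partition(records, n, key):
    """Send each record to the partition chosen by a hash of its key."""
    parts = [[] for _ in range(n)]
    for rec in records:
        # CRC-32 of the key's repr: deterministic across runs (assumed choice).
        h = zlib.crc32(repr(key(rec)).encode("utf-8"))
        parts[h % n].append(rec)
    return parts


def range_partition(records, boundaries, key):
    """Partition by sorted split points; keys below boundaries[0] go to
    partition 0, keys in [boundaries[0], boundaries[1]) to partition 1, etc."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        parts[bisect_right(boundaries, key(rec))].append(rec)
    return parts


def speedup(small_system_time, big_system_time):
    """Same problem on a larger system; N-fold hardware ideally gives N."""
    return small_system_time / big_system_time


def scaleup(small_time_small_problem, big_time_big_problem):
    """Problem and system grown together; the ideal value is 1.0."""
    return small_time_small_problem / big_time_big_problem
```

Round-robin balances load but gives no help in locating a key, hashing supports exact-match lookups on the partitioning key, and range partitioning keeps nearby key values together at the risk of skew.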

Instructor: Jim Gray, Digital Equipment Corporation, USA

Dr. Gray is a specialist in database, transaction processing, and dependable computer systems. He founded Digital's San Francisco Systems Center where he is working on enhancements to Digital's commercial systems. These efforts center on the use of parallelism to process very large databases. He worked on Tandem's NonStop SQL and IBM's System R, SQL/DS, DB2, and IMS-Fast Path. He is an editor-in-chief of the VLDB journal and editor of the Performance Handbook for Database and Transaction Processing Systems, coauthor of Transaction Processing Concepts and Techniques, and editor of Morgan Kaufmann's Data Management Series. He is active in the National Research Council, and holds doctorates from Berkeley and Stuttgart.

4. Interoperability in Multidatabase Systems

One of the most important and challenging problems of the 1990s is to provide techniques and mechanisms to support the interoperation and networking of database and knowledge base systems. Such systems have proliferated throughout organizations, based upon a variety of general-purpose database management technology, or constructed as data-intensive systems tailored to different application domains. It is critical to support the sharing and exchange of information among these database systems, while retaining as much as possible of the investment in the individual existing systems and their associated application software. This tutorial examines the problems, principles, techniques, and mechanisms to support the controlled sharing and exchange of information among a collection of data/knowledge base systems. We specifically examine the problem of database system interoperability from both data and application viewpoints. This balanced view should benefit both industrial practitioners (including strategic planners and decision makers, systems analysts/integrators, data modelers) and applied researchers in the area of database systems and application interoperability.

The first part presents a framework for database system interoperability. The key problems and issues in the networking and interoperation of database systems are described. Approaches to interoperation are reviewed, including: (enterprise-wide) integration; logically centralized, physically distributed databases; multi-model database systems, and federated database systems. A historical perspective is provided, stressing key research and development achievements as well as open problems. A viewpoint on the sharing and exchange of information among a collection of heterogeneous, autonomous database systems is presented. Federated databases and related architectural approaches are described. Relationships between sharing and exchange at the database system level with those at the network and operating system levels are considered.

The second part presents an application perspective. First, Multisystem Workflow Management. Many activities involve performing operations (workflows) on multiple independent systems. Workflows involve specification and support of dependencies among operations performed by different systems and databases. We will discuss interoperability requirements and applications of recent relaxed transaction models with an example industrial telecommunication multisystem application and prototype. Second, Multidatabase Consistency Constraints Management. Consistency of corporate data, even when it is managed by multiple systems, is an important requirement and a major business problem. We will discuss issues of specifying and enforcing consistency of interrelated data stored in multiple databases. Examples from industrial application systems will be discussed.

Instructor: Dennis McLeod, University of Southern California, USA

Dennis McLeod received his B.S., M.S., and Ph.D. degrees in CS from MIT in 1974, 1976, and 1978, respectively. He joined the faculty of the Univ. of Southern California in 1978, where he is currently full professor in CS. His main research interests include: database system modeling, design, and evolution; database system interoperability and networking; information protection and security; knowledge management; applied machine learning; personal information management systems; and information management environments for digital libraries, scientific and engineering data, computer-integrated manufacturing, and computer-supported cooperative work. Dr. McLeod has over ninety refereed publications in the above areas and he is particularly noted for his work on semantic data modeling and federated databases. He has lectured widely on an international basis, and has served as an advisor and consultant to a variety of private and public sector organizations. Dr. McLeod has served as chair and member of program and organizational committees for numerous conferences and workshops, and is currently an editor of the Int. Journal on Very Large Databases, Int. Journal on Intelligent and Cooperative Information Systems, Comm. of the ACM, as well as other publications.

Instructor: Amit P. Sheth, Bell Communications Research, USA

Dr. Amit P. Sheth has led projects on developing a heterogeneous distributed database system, a factory information system, integration of AI-database systems, transactional workflows, federated database tools, multidatabase consistency, and data quality. His current interests also include semantic heterogeneity and information brokering in the emerging Infocosm. He is an ACM lecturer, has presented eleven tutorials and participated in several panels at major conferences, given over forty invited talks in many countries, and has authored over sixty publications. He is serving on the editorial boards of four journals, and has served as the general chair of the First Int. Conf. on Parallel and Distributed Information Systems (PDIS) and a program (co-)chair of the International Workshop on Interoperability in Multidatabase Systems, and currently is a program (co-)chair of the Third PDIS. Prior to joining Bellcore in 1989, he was a Principal/Staff Scientist at Honeywell and Unisys research centers.

5. Object Database Management: Database Design and Open Systems

Object database management systems are providing the data management solution for applications in computer integrated manufacturing, office information systems, and multimedia. This tutorial presents the concepts involved in designing object-oriented database applications, and a discussion of issues involved in the design of object-oriented database management systems. In particular, the first part is an overview of OO data modeling concepts and shows how they can be used to design database applications. Emphasis will be on designing both the database structure and the operations (methods) of the application. The OO approach will be compared with traditional conceptual database design that uses Extended Entity-Relationship Modeling. We will also show how a conceptual object-oriented design can be implemented on relational and object-oriented database systems.

The final part is an overview of the concepts and issues involved in the design of open object DBMSs including object-relational DBMSs. Topics to be covered include a comparison of object DBMS approaches including persistent programming languages, extended relational DBMSs, and object-relational DBMSs; selected system design issues including persistence models, object query processing, storage management, version models, and schema evolution; and a discussion of some of the challenges in building object-relational DBMSs.

Presentation: The course will be presented in Spanish. The transparencies will be in English, which will help the simultaneous translation process.

Instructor: José A. Blakeley, Texas Instruments, USA

José A. Blakeley is a member of the technical staff at Texas Instruments' Systems and Information Sciences Laboratory, where he is co-principal investigator of the Open OODB project (Phase II). He received a computer systems engineering degree from Instituto Tecnologico y de Estudios Superiores de Monterrey, Mexico, in 1978, and his M.Math. and Ph.D. in CS from the University of Waterloo, Canada, in 1983 and 1987, respectively. Blakeley's research interests include extensible and object-oriented database management systems; object services architectures; query language design, optimization, and execution; and materialized view support. Blakeley is an Associate Editor of the ACM SIGMOD Record.

Instructor: Ramez Elmasri, U. of Texas at Arlington, USA

Ramez Elmasri has been a faculty member in the CS and Engineering Department at the University of Texas at Arlington since 1990. He has been involved with database research and teaching for over 15 years. He is well known for his research on conceptual data modeling, query languages and user interfaces, schema integration for multi-database systems, and distributed databases. His recent research is on object-oriented modeling and temporal databases. Elmasri is co-author with S. Navathe of the bestselling textbook "Fundamentals of Database Systems" (Second Edition 1994, Benjamin/Cummings Publishing Company and Addison-Wesley International). He holds M.S. and Ph.D. degrees in CS from Stanford University. He worked for Honeywell and the University of Houston prior to his current position, and is a consultant for numerous organizations. He has published over 50 research papers.

6. Indexing Multimedia Databases

The tutorial surveys state-of-the-art methods for storing and retrieving multimedia data from large databases. Records (= documents) may consist of formatted fields, text, images, voice, animation, etc. A sample query that we would like to support is `in a collection of 2-d color images, find images that are similar to a sunset photograph'. For text and formatted fields, several methods have been proposed and studied; the tutorial classifies these methods systematically, examines in detail the main representatives of each class, and highlights the environment that each method is most suitable for. Indexing for images and other media is a new, active area of research; the tutorial will present recent approaches and prototype systems for 2-d and 3-d medical image databases, 2-d color image databases, and 1-d time series databases. The content of the tutorial includes access methods for multi-dimensional points, access methods for text, and indexing methods for images, time series, and signals in general.
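
One way to approach a query like the sunset example is to map each image to a feature vector and rank stored images by distance to the query vector. The Python sketch below assumes a coarse color histogram as the feature and Euclidean distance as the similarity measure; both are illustrative assumptions rather than the specific methods covered in the tutorial, and all function names are hypothetical. The linear scan stands in for a multi-dimensional point access method.

```python
import math


def color_histogram(pixels, bins_per_channel=2):
    """Normalized RGB histogram of (r, g, b) pixels with values 0..255."""
    b = bins_per_channel
    hist = [0.0] * (b ** 3)
    count = 0
    for red, green, blue in pixels:
        # c * b // 256 maps 0..255 onto bin numbers 0..b-1.
        idx = ((red * b // 256) * b + (green * b // 256)) * b + (blue * b // 256)
        hist[idx] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def most_similar(query_vector, database, k=1):
    """Return the names of the k stored vectors closest to the query.

    database maps an image name to its feature vector. A real system would
    replace this linear scan with a multi-dimensional index.
    """
    ranked = sorted(database.items(),
                    key=lambda kv: euclidean(query_vector, kv[1]))
    return [name for name, _ in ranked[:k]]
```

Because every image becomes a point in a feature space, the similarity query reduces to a nearest-neighbor search over multi-dimensional points.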

Instructor: Christos Faloutsos, U. of Maryland at College Park, USA

Christos Faloutsos received the B.Sc. degree in Electrical Engineering (1981) from the National Technical University of Athens, Greece, and the M.Sc. and Ph.D. degrees in CS from the University of Toronto, Canada. Since 1985 he has been with the department of CS at the University of Maryland, College Park, where he is currently an associate professor. In 1989 he received the Presidential Young Investigator Award from the NSF. His research interests include physical database design, searching methods for text, geographic information systems, and indexing methods for medical and multimedia databases.