Main Page
 
Tutorial Program
 
Tutorial 1: Tuesday, September 9, 2003, 11:30-13:00 & 14:30-16:00
 

Privacy-Enhanced Data Management for Next-Generation e-Commerce

 

Speakers: Chris Clifton (Purdue University, USA), Irini Fundulaki, Arnaud Sahuguet (Bell Labs, Lucent Technologies)

(PDF Presentation Slides)

Abstract

The goal of this tutorial is to give developers, IT professionals, and researchers an overview of the issues and techniques involved in maintaining user privacy preferences for e-services.

Electronic commerce is becoming pervasive. Convergence of wireless, wireline, and telephony networks enables a new level of web services. Will these services be a benefit, or a new avenue for spam? Extensive profiling and information sharing are needed to ensure that people get the services they want, and only those they want. We must ensure that this personal, private information is used properly -- to deliver desired web services -- and that pervasive doesn't become invasive. Meeting this goal requires advances in several technologies: profile data management, preference and policy management, personalized and privacy-conscious data sharing, and privacy-conscious data mining.

This tutorial will survey techniques (e.g., data obfuscation, privacy-conscious computation of association rules), standards (e.g., XACML, P3P/APPEL), and systems (e.g., Hippocratic databases, Houdini) that are used to preserve user privacy while allowing companies to extract value. It will cover both queries about individuals (e.g., a person's address) and queries about large groups of people (i.e., data mining). It will describe the types of privacy constraints and their sources, including legal regulations (telecom regulations, EU Directive 95/46, etc.), contractual obligations, and others.
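As a flavor of the data obfuscation techniques the tutorial surveys, the classic randomized-response idea perturbs each individual's answer so that no single record is trustworthy, while an aggregate statistic can still be recovered. The sketch below is illustrative only (parameter names and the 0.75 truth probability are our own choices, not from any surveyed system):

```python
import random

def randomized_response(truth: bool, p_truth: float = 0.75) -> bool:
    """Report the true value with probability p_truth; otherwise
    report the outcome of a fair coin flip."""
    if random.random() < p_truth:
        return truth
    return random.random() < 0.5

def estimate_true_fraction(responses, p_truth: float = 0.75) -> float:
    """Invert the perturbation: E[observed] = p*f + (1-p)*0.5,
    so f = (observed - (1-p)*0.5) / p."""
    observed = sum(responses) / len(responses)
    return (observed - (1 - p_truth) * 0.5) / p_truth
```

Any individual response is deniable, yet with enough respondents the estimated fraction converges to the true population fraction.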

IT professionals, developers, and researchers interested in privacy should attend this tutorial.


About the speakers

Chris Clifton is an Associate Professor of Computer Science at Purdue University. He has a Ph.D. from Princeton University, and Bachelor's and Master's degrees from the Massachusetts Institute of Technology. Prior to joining Purdue in 2001, Chris had served as a Principal Scientist at The MITRE Corporation and as an Assistant Professor of Computer Science at Northwestern University. His research interests include data mining, data security, database support for text, and heterogeneous databases.

Arnaud Sahuguet is a Member of the Technical Staff in the Network Data and Services Department at Bell Laboratories, Lucent Technologies. He graduated from the École Polytechnique in 1994 and from the École Nationale des Ponts et Chaussées in 1996. He received his Ph.D. in computer science from the University of Pennsylvania in December 2001. His research interests include cryptography and electronic commerce, information retrieval/extraction, and database technology, including XML. He is the co-developer of W4F, a rule-based HTML-to-XML screen scraper, and the main architect of Kweelt, an open-source Java implementation of the Quilt query language (now XQuery). He is a leading figure in the 3GPP Generic User Profile (GUP) standards work.

Irini Fundulaki holds a postdoctoral position in the Network Data and Services Department at Bell Laboratories, Lucent Technologies. She graduated from the University of Crete, Greece, in 1994 and received her M.Sc. from the same university in 1996. She received her Ph.D. in computer science from the Conservatoire National des Arts et Métiers in Paris in January 2003. During her Ph.D. she was a member of the Verso group at INRIA-Rocquencourt. Her research interests include semantic data integration, database technology, XML and related technologies, and more recently user profile data management.

Tutorial 2: Tuesday, September 9, 2003, 16:30-18:00 and Wednesday, September 10, 2003, 11:00-12:30
 

The Semantic Web: Semantics for Data on the Web

 

Speakers: Stefan Decker (USC Information Sciences Institute), Vipul Kashyap (LHNCBC, US National Library of Medicine, USA)

(PDF Presentation Slides)

Abstract

 


About the speakers

Stefan Decker currently works as a Computer Scientist and Research Assistant Professor at the Information Sciences Institute of the University of Southern California. Previously, he was awarded a Ph.D. from the University of Karlsruhe in Germany and worked in the database group at Stanford University. His interests include the next-generation WWW, adding semantics to the Web, knowledge representation, inferencing with RDF, Web service composition, P2P technology using HyperCup and Edutella, adding semantics to the Grid, and science and the Semantic Web.


Vipul Kashyap is a Research Fellow at the National Library of Medicine, currently working at the Lister Hill National Center for Biomedical Communications on issues relating to the Semantic Web and medical ontologies. Though the concept of the Semantic Web is only now gaining steam, various researchers (Vipul included) have been working on closely related problems for a long time. In particular, his thesis, Information Brokering across Heterogeneous Multimedia Data: A Metadata-based Approach, investigates the use of ontologies for capturing, querying, and reasoning about information stored in structured and text databases; most of this work remains relevant today in the context of the Semantic Web. Vipul completed his thesis in the Computer Science Department at Rutgers University. He has also worked at the Applied Research division of Telcordia Technologies (formerly known as Bellcore) and at the Microelectronics and Computer Technology Corporation (MCC) on topics related to information brokering and the Semantic Web. Over the years, he has built various prototypes and systems for information brokering, gathering, and analysis over distributed networks such as the Internet.

Tutorial 3: Wednesday, September 10, 2003, 14:00-15:30 and 16:00-17:30
 

Data Stream Query Processing: A Tutorial

 

Speakers: Nick Koudas (AT&T Labs-Research), Divesh Srivastava (AT&T Labs-Research)

(PDF Presentation Slides)

Abstract

Stream data are generated naturally during the measurement and monitoring of complex, dynamic phenomena (such as traffic evolution in internet and telephone communication infrastructures, usage of the web, email and newsgroups, movement of financial markets, atmospheric conditions, etc.), and also by (message-based) web services, in which loosely coupled systems interact by exchanging high volumes of business data (e.g., purchase orders, retail transactions) tagged in XML (the lingua franca of web services). The applications that operate on modern data streams require sophisticated queries to continuously match, correlate, extract and transform parts of the data stream.

Manipulating stream data presents many technical challenges and is an active research area in the database community, involving new stream operators, SQL extensions, query optimization methods, operator scheduling techniques, etc., with the goal of developing general-purpose (e.g., NiagaraCQ, Stanford Stream, Telegraph, Aurora) and specialized (e.g., Gigascope) data stream management systems.

The objective of this tutorial is to provide a comprehensive and cohesive overview of the key research results in the area of data stream query processing, both for SQL-like and XML query languages.

The tutorial is example driven, and organized as follows.

  • Applications, Query Processing Architectures: Data stream applications, data and query characteristics, query processing architectures of commercial and prototype systems.
  • Stream SQL Query Processing: Filters, simple and complex joins, aggregation, SQL extensions, approximate answers, query optimization methods, operator scheduling techniques.
  • Stream XML Query Processing: Automata- and navigation-based techniques for single and multiple XPath queries, connections with stream SQL query processing.
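As a flavor of the automata- and navigation-based techniques listed above, the sketch below matches a single child-axis XPath query (e.g. /a/b/c) against a SAX-like stream of start/end element events, using only a depth counter and a pointer into the path. This is a deliberately minimal illustration, not the algorithm of any particular surveyed system:

```python
def match_xpath(events, path):
    """Count matches of a child-axis-only XPath (given as a list of tag
    names) over a stream of ('start', tag) / ('end', tag) events."""
    depth = 0    # current depth in the document
    matched = 0  # number of leading path steps currently satisfied
    hits = 0
    for kind, tag in events:
        if kind == 'start':
            # extend the match only if every ancestor step already matched
            if matched == depth and matched < len(path) and tag == path[matched]:
                matched += 1
                if matched == len(path):
                    hits += 1  # full path satisfied at this element
            depth += 1
        else:  # 'end' event: pop, and retract the match if we left it
            depth -= 1
            if matched > depth:
                matched = depth
    return hits
```

Handling multiple queries, descendant axes, and predicates is where the automata-based machinery (shared NFAs/DFAs over the query workload) earns its keep.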

The target audience of this tutorial includes researchers in database systems, database and Web application developers, and the XML community.


About the speakers

Nick Koudas is a Principal Technical Staff Member at AT&T Labs-Research. He holds a Ph.D. from the University of Toronto, an M.Sc. from the University of Maryland at College Park, and a B.Tech. from the University of Patras in Greece. He serves as an associate editor for the Information Systems journal and the IEEE TKDE journal. He is the recipient of the 1998 ICDE Best Paper Award. His research interests include core database management, metadata management, and their applications to networking.

Divesh Srivastava is the head of the Database Research Department at AT&T Labs-Research. He received his Ph.D. from the University of Wisconsin, Madison, and his B.Tech. from the Indian Institute of Technology, Bombay, India. He was a vice-chair of ICDE 2002, and is on the editorial board of the ACM SIGMOD Digital Review. His current research interests include XML databases, IP network data management, and data quality.

Tutorial 4: Thursday, September 11, 2003, 14:00-15:30 and 16:00-17:30
 

Grid Data Management Systems & Services

 

Speakers: Arun Jagatheesan, Reagan Moore (San Diego Supercomputer Center, USA), Norman W. Paton (University of Manchester, UK), Paul Watson (University of Newcastle-upon-Tyne, UK)

(PDF Presentation Slides)

Abstract

The Grid is an emerging infrastructure for providing coordinated and consistent access to distributed, heterogeneous computational and information storage resources amongst autonomous organizations. Data grids are being built across the world as the next generation data handling systems for sharing access to data and storage systems within multiple administrative domains. A data grid provides logical name spaces for digital entities and storage resources to create global identifiers that are location independent. Data grid systems provide services on the logical name space for the manipulation, management, and organization of digital entities. Databases are increasingly being used within Grid applications for data and metadata management, and several groups are now developing services for the access and integration of structured data on the Grid. The service-based approach to making data available on the Grid is being encouraged by the adoption of the Open Grid Services Architecture (OGSA), which is bringing about the integration of the Grid with Web Service technologies.
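The logical name space described above can be pictured as a replica catalog: clients address data by a location-independent logical identifier, and the grid resolves it to whatever physical replicas currently exist. The toy sketch below (the identifiers and URL schemes are invented for illustration; this is not the SRB's actual API) shows the core indirection:

```python
class ReplicaCatalog:
    """Toy logical name space: each logical identifier maps to one or
    more physical replicas, so clients never hard-code storage locations."""

    def __init__(self):
        self._replicas = {}  # logical name -> list of physical URLs

    def register(self, logical_name, physical_url):
        """Record one more physical replica of a logical entity."""
        self._replicas.setdefault(logical_name, []).append(physical_url)

    def resolve(self, logical_name):
        """Return all known physical locations for a logical name."""
        return list(self._replicas.get(logical_name, []))
```

Because callers only ever see the logical name, replicas can be added, moved, or retired across administrative domains without breaking any reference to the data.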

The tutorial will introduce the Grid, and examine the requirements, issues and possible solutions for integrating data into the Grid. It will take examples from current systems, in particular the SDSC Storage Resource Broker and the OGSA-Database Access and Integration project.


About the speakers

Arun Jagatheesan is an Adjunct Researcher at the Institute for High Energy Physics and Astrophysics at the University of Florida and a visiting scholar at the San Diego Supercomputer Center (SDSC). His research interests include data grid management, Internet computing, and workflow systems. He leads the SDSC Matrix team and is involved in the research and development of multiple data grid projects at SDSC.

Reagan Moore is a Distinguished Scientist and Co-Program Director of the Data and Knowledge Systems Group at the San Diego Supercomputer Center. His research interests include data grids, digital libraries, and persistent archives. Dr. Moore manages multiple research projects, including work for the NSF National Science Digital Library, NARA, NASA, the Library of Congress, the DOE Particle Physics Data Grid, the NSF National Virtual Observatory, and the NSF NPACI program.

Norman Paton is a Professor of Computer Science at the University of Manchester, where he co-leads the Information Management Group. He works principally on distributed information management, spatiotemporal databases, and genome data management. He is Co-Chair of the Database Access and Integration Services Working Group of the Global Grid Forum.

Paul Watson is a Professor of Computer Science at the University of Newcastle and Director of the North-East Regional e-Science Centre. His research has mainly been in the area of high-performance database systems, including the design of a number of parallel database servers in both academia and industry. He and Paton co-lead a number of projects on databases and the Grid, with a focus on distributed query processing.

Tutorial 5: Friday, September 12, 2003, 10:15-11:45 and 12:00-13:30
 

Constructing and Integrating Data-Centric Web Applications: Methods, Tools, and Techniques

 

Speakers: Stefano Ceri (Politecnico di Milano, Italy), Ioana Manolescu (INRIA Futurs, France)

(PDF Presentation Slides)

Abstract

This tutorial deals with the construction of data-centric Web applications, focusing on the modeling of processes and on integration with Web services. Its objective is to clarify, at a high level of abstraction, the problems to be solved and the required modeling concepts and solutions. The notation used is compatible with the most popular standards (such as UML, ER, WSDL, and BPEL4WS), but it is both visual - because every concept has an associated graphical representation - and highly declarative - omitting the syntactic details that pollute the aforementioned standards.

We target data-centered Web applications, i.e., those applications whose main mission is to enable the browsing of complex data collections, and which are therefore directly relevant to a forum such as VLDB. These applications are characterized by four orthogonal design dimensions: the data schemas, the business logic, the hypertexts for navigating the client's Web interface, and the styles of presentation. First, we show how data and hypertexts can be modeled by means of WebML (Web Modeling Language); we then address capturing the workflows (business logic) embedded in Web applications, and introduce Web services, which are becoming the dominant technology for building distributed computations. Finally, we concentrate on workflow-style composition of Web services. For each design step, the tutorial illustrates the model, method, and best practice applied to a single, progressively developed running example.

The tutorial is intended for anyone wishing to understand in depth a single conceptual framework for building Web information systems by "gluing" together data extracted from repositories, workflows, and Web services. It is recommended for application developers, and also for researchers and instructors in the database community.


About the speakers

Stefano Ceri is a full professor of Database Systems at the Dipartimento di Elettronica e Informazione, Politecnico di Milano. His research interests focus on extending database technology to incorporate distribution and rules, and on design methods for databases and data-intensive Web sites. He is responsible for several Esprit projects at Politecnico di Milano, including "WebSI: Data-Centric Web Services Integrator" (2002-04). He is a member of the VLDB Endowment and of the EDBT Foundation, and the author of about 150 articles in international journals and conference proceedings and of several books, including Designing Data-Intensive Web Applications (Morgan Kaufmann, 2002). He is Editor-in-Chief of the new Springer-Verlag book series 'Data-Centric Systems and Applications (DCSA)'. He won the '10 Years Award' at VLDB 2000 for his research on active databases and was the Coordinating Program Chair of VLDB 2001 in Rome.

Ioana Manolescu is a researcher in the GEMO group of the INRIA Futurs research institute in Orsay, France. Her current research focuses on data and web service integration in peer-to-peer systems. Ioana is also interested in issues related to XML query processing; she is currently involved in the development of XQueC, a compressed XML data management prototype, and is participating in the design of a distributed peer-to-peer XML query processing system in the framework of the ActiveXML project. Ioana's work on Web application design started in 2002, during a six-month visit to Politecnico di Milano, where she worked with the WebML team on the integration of workflow and Web service primitives within the WebML design framework.
