ACM SIGMOD Conference 2008: Vancouver, BC, Canada

Jason Tsong-Li Wang (Ed.): Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10-12, 2008. ACM 2008, ISBN 978-1-60558-102-6

Keynote Talks

Sridhar Ramaswamy:
Extreme data mining. 1-2
Ben Shneiderman:
Extreme visualization: squeezing a billion records into a million pixels. 3-12
William O'Connell:
Extreme streaming: business optimization driving algorithmic challenges. 13-14

Research Session 1: Tracking Data in Space

Leong Hou U, Man Lung Yiu, Kyriakos Mouratidis, Nikos Mamoulis:
Capacity constrained assignment in spatial databases. 15-28
Su Chen, Beng Chin Ooi, Kian-Lee Tan, Mario A. Nascimento:
ST²B-tree: a self-tunable spatio-temporal B⁺-tree index for moving objects. 29-42
Hanan Samet, Jagan Sankaranarayanan, Houman Alborzi:
Scalable network distance browsing in spatial databases. 43-54

Research Session 2: Ranking

Jianlin Feng, Qiong Fang, Wilfred Ng:
Discovering bucket orders from full rankings. 55-66
Nilesh Bansal, Sudipto Guha, Nick Koudas:
Ad-hoc aggregations of ranked lists in the presence of hierarchies. 67-78
Tianyi Wu, Dong Xin, Jiawei Han:
ARCube: supporting ranking aggregate queries in partially materialized data cubes. 79-92

Research Session 3: Privacy & Anonymization

Kun Liu, Evimaria Terzi:
Towards identity anonymization on graphs. 93-106
Xiaokui Xiao, Yufei Tao:
Dynamic anonymization: accurate statistical analysis with privacy preservation. 107-120
Gabriel Ghinita, Panos Kalnis, Ali Khoshgozaran, Cyrus Shahabi, Kian-Lee Tan:
Private queries in location based services: anonymizers are not necessary. 121-132

Research Session 4: Streaming Filters

Zhen Liu, Srinivasan Parthasarathy, Anand Ranganathan, Hao Yang:
Near-optimal algorithms for shared filter evaluation in data stream systems. 133-146
Jagrati Agrawal, Yanlei Diao, Daniel Gyllstrom, Neil Immerman:
Efficient pattern matching over event streams. 147-160
Anirban Majumder, Rajeev Rastogi, Sriram Vanama:
Scalable regular expression matching on data streams. 161-172

Research Session 5: Clustering in High Dimensions

Feng Pan, Xiang Zhang, Wei Wang:
CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition. 173-184
Christian Böhm, Christos Faloutsos, Claudia Plant:
Outlier-robust clustering using independent components. 185-198
Marc Wichterich, Ira Assent, Philipp Kranen, Thomas Seidl:
Efficient EMD-based similarity search in multimedia databases via flexible dimensionality reduction. 199-212

Research Session 6: Skylines

Xiang Lian, Lei Chen:
Monochromatic and bichromatic reverse skyline search over uncertain databases. 213-226
Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis:
Angle-based space partitioning for efficient parallel skyline computation. 227-238
Nikos Sarkas, Gautam Das, Nick Koudas, Anthony K. H. Tung:
Categorical skylines for streaming data. 239-250

Research Session 7: Special Platforms

Matthias Brantner, Daniela Florescu, David A. Graf, Donald Kossmann, Tim Kraska:
Building a database on S3. 251-264
Mihai Lupu, Beng Chin Ooi, Y. C. Tay:
Paths to stardom: calibrating the potential of a peer-based data management system. 265-278
Sai Wu, Jianzhong Li, Beng Chin Ooi, Kian-Lee Tan:
Just-in-time query retrieval over partially indexed data on structured P2P overlays. 279-290
Chun-Hee Lee, Chin-Wan Chung:
Efficient storage scheme and query processing for supply chain management using RFID. 291-302

Research Session 8: XML Query Processing

Taro L. Saito, Shinichi Morishita:
Relational-style XML query. 303-314
Yu Huang, Ziyang Liu, Yi Chen:
Query biased snippet generation in XML search. 315-326
Kostas Lillis, Evaggelia Pitoura:
Cooperative XPath caching. 327-338
Giorgio Ghelli, Nicola Onose, Kristoffer Høgsbro Rose, Jérôme Siméon:
XML query optimization in the presence of side effects. 339-352

Research Session 9: Strings and Time

Xiaochun Yang, Bin Wang, Chen Li:
Cost-based variable-length-gram selection for string collections to support approximate queries efficiently. 353-364
Vassilis Athitsos, Panagiotis Papapetrou, Michalis Potamias, George Kollios, Dimitrios Gunopulos:
Approximate embedding-based subsequence matching of time series. 365-378
Rainer Gemulla, Wolfgang Lehner:
Sampling time-based sliding windows in bounded space. 379-392
Dhaval Patel, Wynne Hsu, Mong-Li Lee:
Mining relationships among interval-based events for classification. 393-404

Research Session 10: Graphs I

Huahai He, Ambuj K. Singh:
Graphs-at-a-time: query language and access methods for graph databases. 405-418
Saket Navlakha, Rajeev Rastogi, Nisheeth Shrivastava:
Graph summarization with bounded error. 419-432
Xifeng Yan, Hong Cheng, Jiawei Han, Philip S. Yu:
Mining significant graph patterns by leap search. 433-444
Nan Wang, Srinivasan Parthasarathy, Kian-Lee Tan, Anthony K. H. Tung:
CSV: visualizing and mining cohesive subgraphs. 445-458

Research Session 11: Privacy and Testing

Wenliang Du, Zhouxuan Teng, Zutao Zhu:
Privacy-MaxEnt: integrating background knowledge in privacy quantification. 459-472
Jiexing Li, Yufei Tao, Xiaokui Xiao:
Preservation of proximity privacy in publishing numerical sensitive data. 473-486
Michael Benedikt, Alan Jeffrey, Ruy Ley-Wild:
Stream firewalling of XML constraints. 487-498
Chaitanya Mishra, Nick Koudas, Calisto Zuzarte:
Generating targeted queries for database testing. 499-510

Research Session 12: Query Optimization

Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga K. Govindaraju, Qiong Luo, Pedro V. Sander:
Relational joins on graphics processors. 511-524
Yu Cao, Gopal C. Das, Chee Yong Chan, Kian-Lee Tan:
Optimizing complex queries with multiple relation instances. 525-538
Guido Moerkotte, Thomas Neumann:
Dynamic programming strikes back. 539-552
Damien Sereni, Pavel Avgustinov, Oege de Moor:
Adding magic to an optimising Datalog compiler. 553-566

Research Session 13: Graphs II

Yuanyuan Tian, Richard A. Hankins, Jignesh M. Patel:
Efficient aggregation for graph summarization. 567-580
Gang Gou, Rada Chirkova:
Efficient algorithms for exact ranked twig-pattern matching over graphs. 581-594
Ruoming Jin, Yang Xiang, Ning Ruan, Haixun Wang:
Efficiently answering reachability queries on very large directed graphs. 595-608
Ding Chen, Chee Yong Chan:
Minimization of tree pattern queries with constraints. 609-622

Research Session 14: Ordered Data

Soumyadeb Mitra, Marianne Winslett, Windsor W. Hsu:
Query-based partitioning of documents and indexes for information lifecycle management. 623-636
Ross Shaull, Liuba Shrira, Hao Xu:
Skippy: a new snapshot indexing method for time travel in the storage manager. 637-648
Eric Lo, Ben Kao, Wai-Shing Ho, Sau Dan Lee, Chun Kit Chui, David W. Cheung:
OLAP on sequence data. 649-660
Ranjan Sinha, Simon J. Puglisi, Alistair Moffat, Andrew Turpin:
Improving suffix array locality for fast pattern matching on disk. 661-672

Research Session 15: Probabilistic I

Ming Hua, Jian Pei, Wenjie Zhang, Xuemin Lin:
Ranking queries on uncertain data: a probabilistic threshold approach. 673-686
Ravi Jampani, Fei Xu, Mingxi Wu, Luis Leopoldo Perez, Christopher M. Jermaine, Peter J. Haas:
MCDB: a Monte Carlo approach to managing uncertain data. 687-700
Benny Kimelfeld, Yuri Kosharovsky, Yehoshua Sagiv:
Query efficiency in probabilistic XML models. 701-714
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu:
Event queries on correlated probabilistic streams. 715-728

Research Session 16: Transactions and Distribution

Michael J. Cahill, Uwe Röhm, Alan David Fekete:
Serializable isolation for snapshot databases. 729-738
Emmanuel Cecchet, George Candea, Anastasia Ailamaki:
Middleware-based database replication: the gaps between theory and practice. 739-752
Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Michalis Vazirgiannis:
On efficient top-k query processing in highly distributed environments. 753-764
Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan:
Efficient bulk insertion into a distributed ordered table. 765-778

Research Session 17: Probabilistic II

Xiaolei Li, Jiawei Han, Zhijun Yin, Jae-Gil Lee, Yizhou Sun:
Sampling cube: a framework for statistical OLAP over sampling data. 779-790
Arvind Thiagarajan, Samuel Madden:
Querying continuous functions in a database system. 791-804
Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin:
An efficient filter for approximate membership checking. 805-818
Qin Zhang, Feifei Li, Ke Yi:
Finding frequent items in probabilistic data. 819-832

Research Session 18: Database Integration As You Go

Laura Chiticariu, Phokion G. Kolaitis, Lucian Popa:
Interactive generation of integrated schemas. 833-846
Shawn R. Jeffery, Michael J. Franklin, Alon Y. Halevy:
Pay-as-you-go user feedback for dataspace systems. 847-860
Anish Das Sarma, Xin Dong, Alon Y. Halevy:
Bootstrapping pay-as-you-go data integration systems. 861-874
Yan Qi, K. Selçuk Candan, Jun'ichi Tatemura, Songting Chen, Fenglin Liao:
Supporting OLAP operations over imperfectly integrated taxonomies. 875-888

Research Session 19: Keywords on Structure

Sandeep Tata, Guy M. Lohman:
SQAK: doing more with keywords. 889-902
Guoliang Li, Beng Chin Ooi, Jianhua Feng, Jianyong Wang, Lizhu Zhou:
EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. 903-914
Quang Hieu Vu, Beng Chin Ooi, Dimitris Papadias, Anthony K. H. Tung:
A graph method for keyword-based selection of the top-K databases. 915-926
Konstantin Golenberg, Benny Kimelfeld, Yehoshua Sagiv:
Keyword proximity search in complex data graphs. 927-940

Research Session 20: Tuning and Probing

Nicolas Bruno, Rimma V. Nehme:
Configuration-parametric query optimization for physical design tuning. 941-952
Ahmed A. Soror, Umar Farooq Minhas, Ashraf Aboulnaga, Kenneth Salem, Peter Kokosielis, Sunil Kamath:
Automatic virtual machine configuration for database workloads. 953-966
Daniel J. Abadi, Samuel Madden, Nabil Hachem:
Column-stores vs. row-stores: how different are they really? 967-980
Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, Michael Stonebraker:
OLTP through the looking glass, and what we found there. 981-992

Research Session 21: Provenance, Integration and Extraction

Adriane Chapman, H. V. Jagadish, Prakash Ramanan:
Efficient provenance storage. 993-1006
Thomas Heinis, Gustavo Alonso:
Efficient lineage tracking for scientific workflows. 1007-1018
Wensheng Wu, Berthold Reinwald, Yannis Sismanis, Rajesh Manjrekar:
Discovering topical structures of databases. 1019-1030
Warren Shen, Pedro DeRose, Robert McCann, AnHai Doan, Raghu Ramakrishnan:
Toward best-effort information extraction. 1031-1042

Industrial Session 1: Query Optimization and Performance

Yu Xu, Pekka Kostamaa, Xin Zhou, Liang Chen:
Handling data skew in parallel joins in shared-nothing systems. 1043-1052
Sunil Chakkappen, Thierry Cruanes, Benoît Dageville, Linan Jiang, Uri Shaft, Hong Su, Mohamed Zaït:
Efficient and scalable statistics gathering for large databases in Oracle 11g. 1053-1064
Andrey Balmin, Fatma Özcan, Ashutosh Singh, Edison Ting:
Grouping and optimization of XPath expressions in DB2 pureXML. 1065-1074
Sang-Won Lee, Bongki Moon, Chanik Park, Jae-Myung Kim, Sang-Woo Kim:
A case for flash memory SSD in enterprise database applications. 1075-1086

Industrial Session 2: Database Programming and Performance

José A. Blakeley, Vineet Rao, Isaac Kunen, Adam Prout, Mat Henaire, Christian Kleinerman:
.NET database programmability and extensibility in Microsoft SQL Server. 1087-1098
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins:
Pig Latin: a not-so-foreign language for data processing. 1099-1110
George Eadon, Eugene Inseok Chong, Shrikanth Shankar, Ananth Raghavan, Jagannathan Srinivasan, Souripriya Das:
Supporting table partitioning by reference in Oracle. 1111-1122

Industrial Session 3: Streams, Conversations and Verification

Bugra Gedik, Henrique Andrade, Kun-Lung Wu, Philip S. Yu, Myungcheol Doo:
SPADE: the System S declarative stream processing engine. 1123-1134
Theodore Johnson, S. Muthu Muthukrishnan, Vladislav Shkapenyuk, Oliver Spatscheck:
Query-aware partitioning for monitoring massive network data streams. 1135-1146
Ullas Nambiar, Himanshu Gupta, Raju Balakrishnan, Mukesh K. Mohania:
Helping satisfy multiple objectives during a service desk conversation. 1147-1158
Leonidas Galanis, Supiti Buranawatanachoke, Romain Colle, Benoît Dageville, Karl Dias, Jonathan Klein, Stratos Papadomanolakis, Leng Leng Tan, Venkateshwaran Venkataramani, Yujun Wang, Graham Wood:
Oracle database replay. 1159-1170

Industrial Session 4: Data and Application Integration, Spatial Data

David E. Simmen, Mehmet Altinel, Volker Markl, Sriram Padmanabhan, Ashutosh Singh:
Damia: data mashups for intranet applications. 1171-1182
Li Ma, Chen Wang, Jing Lu, Feng Cao, Yue Pan, Yong Yu:
Effective and efficient semantic web data management over DB2. 1183-1194
Stefan Aulbach, Torsten Grust, Dean Jacobs, Alfons Kemper, Jan Rittinger:
Multi-tenant databases for software as a service: schema-mapping techniques. 1195-1206
Yi Fang, Marc Friedman, Giri Nair, Michael Rys, Ana-Elisa Schmid:
Spatial indexing in Microsoft SQL Server 2008. 1207-1216

Demonstration Session: Group 1

Robert Albright, Alan J. Demers, Johannes Gehrke, Nitin Gupta, Hooyeon Lee, Rick Keilty, Gregory Sadowski, Ben Sowell, Walker M. White:
SGL: a scalable language for data-driven games. 1217-1222
Florin Rusu, Fei Xu, Luis Leopoldo Perez, Mingxi Wu, Ravi Jampani, Chris Jermaine, Alin Dobra:
The DBO database system. 1223-1226
Chaitanya Mishra, Nick Koudas:
Stretch 'n' shrink: resizing queries to user preferences. 1227-1230
Arvind Arasu, Surajit Chaudhuri, Kris Ganjam, Raghav Kaushik:
Incorporating string transformations in record matching. 1231-1234
Nitin Gupta, Alan J. Demers, Johannes Gehrke:
SEMMO: a scalable engine for massively multiplayer online games. 1235-1238
Sarvjeet Singh, Chris Mayfield, Sagar Mittal, Sunil Prabhakar, Susanne E. Hambrusch, Rahul Shah:
Orion 2.0: native support for uncertain data. 1239-1242
Vibhuti S. Sengar, Tanuja Joshi, Joseph M. Joy, Samarth Prakash:
Building a global location search service. 1243-1246
Kurt D. Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor:
Freebase: a collaboratively created graph database for structuring human knowledge. 1247-1250
Carlos Eduardo Scheidegger, Huy T. Vo, David Koop, Juliana Freire, Cláudio T. Silva:
Querying and re-using workflows with VisTrails. 1251-1254
Nikos Pelekis, Elias Frentzos, Nikos Giatrakos, Yannis Theodoridis:
HERMES: aggregative LBS via a trajectory DB engine. 1255-1258

Demonstration Session: Group 2

Geert Jan Bex, Frank Neven, Stijn Vansummeren:
SchemaScope: a system for inferring and cleaning XML schemas. 1259-1262
Ming Hua, Jian Pei:
DiMaC: a system for cleaning disguised missing data. 1263-1266
Iman Elghandour, Ashraf Aboulnaga, Daniel C. Zilio, Fei Chiang, Andrey Balmin, Kevin S. Beyer, Calisto Zuzarte:
An XML index advisor for DB2. 1267-1270
Alessandro Raffio, Daniele Braga, Stefano Ceri, Paolo Papotti, Mauricio A. Hernández:
Clip: a tool for mapping hierarchical schemas. 1271-1274
Jun'ichi Tatemura, Songting Chen, Fenglin Liao, Oliver Po, K. Selçuk Candan, Divyakant Agrawal:
UQBE: uncertain query by example for web service mashup. 1275-1280
Bogdan Alexe, Laura Chiticariu, Renée J. Miller, Daniel Pepper, Wang Chiew Tan:
Muse: a system for understanding and designing mappings. 1281-1284
Gjergji Kasneci, Fabian M. Suchanek, Georgiana Ifrim, Shady Elbassuoni, Maya Ramanath, Gerhard Weikum:
NAGA: harvesting, searching and ranking knowledge. 1285-1288
Angela Bonifati, Giansalvatore Mecca, Alessandro Pappalardo, Salvatore Raunich, Gianvito Summa:
The Spicy system: towards a notion of mapping quality. 1289-1294
Heiko Müller, Peter Buneman, Ioannis Koltsidas:
XArch: archiving scientific and reference data. 1295-1298
Kathleen Fisher, David Walker, Kenny Qili Zhu:
LearnPADS: automatic tool generation from ad hoc data. 1299-1302

Demonstration Session: Group 3

Jeong-Hyon Hwang, Sanghoon Cha, Ugur Çetintemel, Stanley B. Zdonik:
Borealis-R: a replication-transparent stream processing system for wide-area monitoring applications. 1303-1306
Chi-Yin Chow, Mohamed F. Mokbel, Tian He:
TinyCasper: a privacy-preserving aggregate location monitoring system in wireless sensor networks. 1307-1310
Alexander Böhm, Erich Marth, Carl-Christian Kanne:
The Demaq system: declarative development of distributed applications. 1311-1314
Badrish Chandramouli, Jun Yang, Pankaj K. Agarwal, Albert Yu, Ying Zheng:
ProSem: scalable wide-area publish/subscribe. 1315-1318
Nodira Khoussainova, Evan Welbourne, Magdalena Balazinska, Gaetano Borriello, Garrett Cole, Julie Letchner, Yang Li, Christopher Ré, Dan Suciu, Jordan Walke:
A demonstration of Cascadia through a digital diary application. 1319-1322
Sihem Amer-Yahia, Alban Galland, Julia Stoyanovich, Cong Yu:
From del.icio.us to x.qui.site: recommendations in social tagging sites. 1323-1326
Rubi Boim, Tova Milo:
Enriching topic-based publish-subscribe systems with related content. 1327-1330
Ying Zhang, Peter A. Boncz:
XRPC: distributed XQuery and update processing with heterogeneous XQuery engines. 1331-1336
Ghislain Fourny, Donald Kossmann, Tim Kraska, Markus Pilman, Daniela Florescu:
XQuery in the browser. 1337-1340
Yizhou Sun, Tianyi Wu, Zhijun Yin, Hong Cheng, Jiawei Han, Xiaoxin Yin, Peixiang Zhao:
BibNetMiner: mining bibliographic information networks. 1341-1344

Tutorials

Susan B. Davidson, Juliana Freire:
Provenance and scientific workflows: challenges and opportunities. 1345-1350
Elizabeth J. O'Neil:
Object/relational mapping 2008: Hibernate and the Entity Data Model (EDM). 1351-1356
Jian Pei, Ming Hua, Yufei Tao, Xuemin Lin:
Query answering techniques on uncertain and probabilistic data: tutorial summary. 1357-1364
Eduardo Freire Nakamura, Antonio Alfredo Ferreira Loureiro:
Information fusion in wireless sensor networks. 1365-1372
Joseph A. Konstan:
Introduction to recommender systems. 1373-1374

Corrigendum

Alexandr Andoni, Ronald Fagin, Ravi Kumar, Mihai Patrascu, D. Sivakumar:
Corrigendum to "Efficient similarity search and classification via rank aggregation" by Ronald Fagin, Ravi Kumar and D. Sivakumar (Proc. SIGMOD'03). 1375-1376