VLDB 2013 Logo, Design by Sakis Palpana

Keynotes

Data Infrastructure at Web Scale
Jay Parikh, VP of Infrastructure Engineering, Facebook

Abstract: Nearly every team at Facebook depends on the company's custom-built data infrastructure for warehousing and analytics, with roughly 1,000 people across the company, technical and non-technical, using these technologies every day. Given Facebook's unique scalability challenges (its data warehouse is more than 250 PB in size and grows by 600 TB of new data every day) and processing needs (it crunches more than 10 PB of data a day), the company's data infrastructure team has to ensure that its systems are prepared to handle not just today's challenges, but tomorrow's as well. In this session, Facebook's Jay Parikh will provide an overview of the company's data infrastructure, focusing on the custom-built technologies it has developed to meet the scale challenges it faces, including Corona, Presto, Morse, and Giraph.

Bio: Jay Parikh is the VP of infrastructure engineering at Facebook. In that role, he leads the engineering and operations teams responsible for building and maintaining an infrastructure that serves more than a billion users, developers, and partners worldwide. Prior to Facebook, Jay was senior vice president of engineering and operations at Ning, where he oversaw the scaling of the company’s social networking platform from 50,000 social networks to more than 1.5 million social networks. Before Ning, Jay was the vice president of engineering at Akamai Technologies, where he helped build the world’s largest and most globally distributed computing platform.

The DataHub: A Collaborative Data Analytics and Visualization Platform
Samuel Madden, Professor of Electrical Engineering and Computer Science in MIT's Computer Science and Artificial Intelligence Laboratory

Abstract: In this talk, I will describe a new system we are building at MIT, called DataHub. DataHub is a hosted interactive data processing, sharing, and visualization system for large-scale data analytics. Key features of DataHub include: (i) Flexible ingest and data cleaning tools that help massage data into a form that users' programs can operate on. This includes both removing irregularities and exposing structure in unstructured data such as text files and images. (ii) A scalable, parallel, SQL-based analytic data processing engine optimized for extremely low-latency operation on large data sets, exploiting the massive parallelism available in modern GPUs and upcoming "manycore" CPUs. (iii) An interactive visualization system that is tightly coupled to the data processing and lineage engine. Specifically, DataHub provides a workflow-based visualization engine where users can choose from a library of pre-built visualizations or define their own visualizations via a simple API. Analysis and visualization steps may run on either CPUs or manycore/GPU devices. (iv) Finally, DataHub is a hosted data platform, designed to eliminate the need for users to manage their own database. It includes features that allow users to selectively share their data with other users, using complex context-sensitive predicates (e.g., that data about particular times or locations should be visible only to particular users).

Bio: Samuel Madden is a Professor of Electrical Engineering and Computer Science in MIT's Computer Science and Artificial Intelligence Laboratory. His research interests include databases, distributed computing, and networking. Research projects include the C-Store column-oriented database system, the CarTel mobile sensor network system, and the Relational Cloud "database-as-a-service". Madden is a leader in the emerging field of "Big Data", heading the Intel Science and Technology Center (ISTC) for Big Data, a multi-university collaboration on developing new tools for processing massive quantities of data. He also leads BigData@CSAIL, an industry-backed initiative to unite researchers at MIT and leaders from industry to investigate the issues related to systems and algorithms for data that is high rate, massive, or very complex. Madden received his Ph.D. from the University of California at Berkeley in 2003, where he worked on the TinyDB system for data collection from sensor networks. Madden was named one of Technology Review's Top 35 Under 35 in 2005, and is the recipient of several awards, including an NSF CAREER Award in 2004, a Sloan Foundation Fellowship in 2007, best paper awards at VLDB 2004 and 2007, MobiCom 2006, CIDR 2013, and EuroSys 2013, and a SIGMOD Test of Time Award for his 2003 paper "The Design of an Acquisitional Query Processor for Sensor Networks."

Privacy-Preserving Data Analysis: From Fallacious to Felicitous ... and to Fruition
Cynthia Dwork, Distinguished Scientist at Microsoft Research

Abstract: Privacy-preserving data analysis, also known as statistical disclosure control, has a large literature that spans several disciplines. Many early attempts have proved problematic either in practice or on paper. A new approach, based on the definitional concept of "differential privacy," has provided a theoretically sound and powerful framework that has given rise to an explosion of research. This talk motivates and explains the definition of differential privacy, describes some basic techniques for achieving it, and discusses some of the technical and cultural obstacles to bringing this approach to fruition.

Bio: Cynthia Dwork, Distinguished Scientist at Microsoft Research, is renowned for placing privacy-preserving data analysis on a mathematically rigorous foundation. A cornerstone of this work is differential privacy, a strong privacy guarantee frequently permitting highly accurate data analysis. Dr. Dwork has also made seminal contributions in cryptography and distributed computing, and is a recipient of the Edsger W. Dijkstra Prize, recognizing some of her earliest work establishing the pillars on which every fault-tolerant system has been built for decades. She is a member of the US National Academy of Engineering and a Fellow of the American Academy of Arts and Sciences.


© VLDB 2013