Proceedings of Workshops at the 51st International Conference on Very Large Data Bases (VLDB 2025)

VLDBW 2025

VLDB 2025 Workshop Chairs

Norman Paton, John Paparrizos

VLDB 2025 Workshop Proceedings Chair

Jiuqi Wei

Accepted Workshops

ADMS: 16th Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architecture

  • Workshop Chairs: Rajesh Bordawekar, Tirthankar Lahiri

AIDB: 6th Applied AI for Database Systems and Applications

  • Workshop Chairs: Umar Farooq Minhas, Subru Krishnan, Thaleia Doudali

CDMS: 3rd International Workshop on Composable Data Management Systems

  • Workshop Chairs: Satyanarayana R Valluri, Mohamed Zait

DaSH: 5th Data Science with Human-in-the-Loop

  • Workshop Chairs: Eduard Dragut, Yunyao Li, Lucian Popa, Kun Qian, Sherry Tongshuang Wu

DATAI: 2nd International Workshop on Data-Centric AI

  • Workshop Chairs: Hongzhi Wang, Nan Tang

GuideAI: 2nd Governance, Understanding and Integration of Data for Effective and Responsible AI

  • Workshop Chairs: Sainyam Galhotra, Babak Salimi

LS-NSL: 1st Workshop on New Ideas for Large-Scale Neurosymbolic Learning Systems

  • Workshop Chairs: Efthymia Tsamoura, Pablo Barceló, Jacopo Urbani

LSGDA: 4th International Workshop on Large-Scale Graph Data Analytics

  • Workshop Chairs: Wenjie Zhang, Ying Zhang, Zhengyi Yang, Dong Wen, Wentao Li

QDB: 14th International Workshop on Quality in Databases

  • Workshop Chairs: Lisa Ehrlinger, Sourav S Bhowmick, Lorena Etcheverry, Hazar Harmouch

TaDA: 3rd International Workshop on Tabular Data Analysis

  • Workshop Chairs: Vasilis Efthymiou, Oktie Hassanzadeh, Sainyam Galhotra, Ernesto Jiménez-Ruiz, Chuan Lei

DEC: 3rd Data EConomy Workshop

  • Workshop Chairs: Santiago Andrés Azcoitia, George Konstantinidis

LLM+Graph: 2nd International Workshop on Data Management Opportunities in Bringing LLMs with Graph Data

  • Workshop Chairs: Yixiang Fang, Arijit Khan, Tianxing Wu, Da Yan

LLM+Spatial: 1st Large Language Models for Spatial-rich Data Management

  • Workshop Chairs: Jianqiu Xu, Cheng Long, Bernhard Seeger, Yongxin Tong

PhD: VLDB 2025 PhD Workshop

  • Workshop Chairs: Sonia Bergamaschi, Raul Castro Fernandez

ADMS

16th Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architecture

High Throughput GPU-Accelerated FSST String Compression

Tim Anema, Delft University of Technology, Joost Hoozemans, Voltron Data, Zaid Al-Ars, Delft University of Technology, and H. Peter Hofstee, IBM

Demystifying CXL Memory Bandwidth Expansion for Analytical Workloads

Georgiy Lebedev, Hamish Nicholson, Musa Ünal, Sanidhya Kashyap and Anastasia Ailamaki, EPFL

GPU-Accelerated Stochastic Gradient Descent for Scalable Operator Placement in Geo-Distributed Streaming Systems

Tristan Joel Terhaag, Technische Universität Berlin, Xenofon Chatziliadis, Technische Universität Berlin, Eleni Tzirita Zacharatou, Hasso Plattner Institute, University of Potsdam, and Volker Markl, Technische Universität Berlin

A Data Aggregation Visualization System supported by Processing-in-Memory

Junyoung Kim, Madhulika Balakumar and Kenneth Ross, Columbia University

A Hot Take on the Intel Analytics Accelerator for Database Management Systems

Christos Laspias, Andrew Pavlo and Jignesh Patel, Carnegie Mellon University

RISC-V Meets RDBMS: An Experimental Study of Database Performance on an Open Instruction Set Architecture

Yizhe Zhang, Zhengyi Yang, Bocheng Han, University of New South Wales, Haoran Ning, Macquarie University, Xin Cao, John Shepherd, University of New South Wales, and Guanfeng Liu, Macquarie University

CXL-Bench: Benchmarking Shared CXL Memory Access

Marcel Weisgut, Hasso Plattner Institute, University of Potsdam, Daniel Ritter, SAP, Florian Schmeller, Hasso Plattner Institute, University of Potsdam, Pınar Tözün, IT University of Copenhagen, and Tilmann Rabl, Hasso Plattner Institute, University of Potsdam

Micro-architectural Exploration of the Relational Memory Engine (RME) in RISC-V and FireSim

Cole Strickler, University of Kansas, Ju Hyoung Mun, Brandeis University, Connor Sullivan, University of Kansas, Denis Hoornaert, Technical University of Munich, Renato Mancuso, Manos Athanassoulis, Boston University, and Heechul Yun, University of Kansas

AIDB

6th Applied AI for Database Systems and Applications

TailorSQL: A NL2SQL System Tailored for Your Query Workload

Kapil Vaidya, Jialin Ding, Sebastian Kosak, David Kernert, Chuan Lei, Xiao Qin, Abhinav Tripathy, Ramesh Balan, Balakrishnan Narayanaswamy, Tim Kraska

AutoDebugger: Efficient Root Cause Analysis for Anomaly Jobs

Fathelrahman Ali, Yiwen Zhu, Lie Jiang, Zhen Li, Manting Li, Kun Huang, Lijing Lin, Long Tian, Xiaolei Liu, Subru Krishnan

Grounding LLMs for Database Exploration: Intent Scoping and Paraphrasing for Robust NL2SQL

Catalina Dragusin, Katsiaryna Mirylenka, Christoph Miksovic, Michael Glass, Nahuel Defosse, Paolo Scotton, Thomas Gschwind

Instance-Optimized String Fingerprints

Mihail Stoian, Johannes Thürauf, Andreas Zimmerer, Alexander van Renen, Andreas Kipf

JOB-Complex: A Challenging Benchmark for Traditional & Learned Query Optimization

Johannes Wehrstein, Timo Eckmann, Roman Heinrich, Carsten Binnig

Bootstrapping Learned Cost Models with Synthetic SQL Queries

Michael Nidd, Christoph Miksovic, Thomas Gschwind, Francesco Fusco, Andrea Giovannini, Ioana Giurgiu

Exploring Wavelet Trees as Space-Efficient Physical-to-Sorted Mapping for Learned Indexes

Anwesha Saha, Aneesh Raman, Ryan Marcus, Manos Athanassoulis

Learning to Accelerate: Tuning Data Transfer Parameters

Benedikt Didrich, Haralampos Gavriilidis, Vasilis Gkolemis, Matthias Boehm, Volker Markl

Research Challenges in Relational Database Management Systems for LLM Queries

Kerem Akillioglu, Anurag Chakraborty, Sairaj Voruganti, M. Tamer Özsu

CDMS

3rd International Workshop on Composable Data Management Systems

A Learned Cost Model-based Cross-engine Optimizer for SQL Workloads

András Strausz, Niels Pardon, Ioana Giurgiu

Eudoxia: a FaaS scheduling simulator for the composable lakehouse

Tapan Srivastava, Jacopo Tagliabue, Ciro Greco

Composability and Interoperability for Federated Data Systems

Haralampos Gavriilidis, Leonhard Rose, Joel Ziegler, Jonathan Gerloff, Benedikt Didrich, Midhun Kaippillil Venugopalan, Kaustubh Beedkar, Matthias Boehm, Volker Markl

DAG lakehouse planning with an ephemeral and embedded graph database

Luca Bigon, Jacopo Tagliabue, Semih Salihoğlu

GranPipe: Composable Hierarchical Pipelines for Near-Data Processing

Johannes Pietrzyk, Wolfgang Lehner, Dirk Habich, Philippe Bonnet

LanceDB - Embracing Composability in the Storage Layer

Weston Pace, Chang She, Lei Xu, Will Jones, Rob Meng, Yang Cen

DaSH

5th Data Science with Human-in-the-Loop

Reducing Human Effort in Evaluating Small and Medium Language Models as Students and as Teachers

Oleh Prostakov, Viacheslav Hodlevskyi, Nassim Bouarour, Adam Sanchez-Ayte, Noha Ibrahim, Sihem Amer-Yahia

Human + AI: Large scale Data Curation For Multilingual Guardrails

Harshit Rajgarhia, Abhishek Mukherji, Fen Yik, Dominika Borek, Nicole Warren, Prithiviraj Pradeep

HierTOD: A Task-Oriented Dialogue System Driven by Hierarchical Goals

Lingbo Mo, Shun Jiang, Akash V Maharaj, J. Bernard Hishamunda, Yunyao Li

Adobe Summit Concierge Evaluation with human in the loop

Yiru Chen, Sally Fang, Sai Sree Harsha, Dan Luo, Vaishnavi Muppala, Fei Wu, Shun Jiang, Kun Qian, Yunyao Li

DATAI

2nd International Workshop on Data-Centric AI

SQL-ML: A SQL-Centric Framework for Building Efficient Feature Store

Ahmad Ghazal, Hanumath Maduri and Pekka Kostamaa

A Low Latency Cache for Cloud RDBMs

Guohai Zhang, Xin Tang, Qingchen Chang, Huanchen Zhang, Kai Hwang, Yuesen Li, Runhuai Huang, Teng Wang, Wusheng Zhang, Ming Zhang, Qingchun Chen, Xiaodong Hou and Qian Wang

The Case for Intent-Based Query Rewriting

Gianna Lisa Nicolai, Patrick Hansert and Sebastian Michel

TabAgent: A Multi-Agent Table Extraction Framework for Unstructured Documents

Jingfei Wu, Chaoyuan Shen, Qiyan Deng, Yuping Wang, Jiajun Li, Yuhao Deng and Minghe Yu

Lightweight Pipelines: Good Enough is Sometimes Better

Camilla Sancricca and Cinzia Cappiello

CleanAgent: Automating Data Standardization with LLM-based Agents

Danrui Qi, Zhengjie Miao and Jiannan Wang

DeepSearch: LLM-powered Data Acquisition for Machine Learning

Kaiyu Li, Zhongxin Hu, Yuxin Gao and Yuyang Wu

Detecting and Cleaning Errors in Personal Contact Information with Large Language Models

Anna-Christina Glock, Christine Dominka-Kiss, Philipp Korom and Lisa Ehrlinger

GuideAI

2nd Governance, Understanding and Integration of Data for Effective and Responsible AI

Model Slicing for Responsible AI

Parke Godfrey, Lukasz Golab, Divesh Srivastava, Jarek Szlichta

Towards Identifying Intent of Data Errors

Mohamed Ahmed Abdelmaksoud Mohamed, Konrad Rieck, Ziawasch Abedjan

ExperimentLens: Interactive Visual Analytics and Explainability for ML Experiment Management

Stavros Maroulis, Vassilis Stamatopoulos, Panagiotis Gidarakos, Konstantinos Tsopelas, Nikolas Masouras, Konstantinos Kozanis, Nikolas Theologitis, George Papastefanatos, Giorgos Giannopoulos, Erik Nilsson

LightUL: An Efficient Recommendation Unlearning Framework

Wentao Ning, Haorui He, Reynold Cheng, Nur Al Hasan Haldar, Ben Kao, Nan Huo, Bo Tang, Yupeng Li

LS-NSL

1st Workshop on New Ideas for Large-Scale Neurosymbolic Learning Systems

Modular Neuro-Symbolic Knowledge Graph Completion

Abelardo Carlos Martinez Lorenzo, Alexander Perfilyev, Volker Markl, Martha Clokie, Thomas Sicheritz-Pontén, Zoi Kaoudi

Constraint-aware Learning of Probabilistic Sequential Models for Multi-Label Classification

Mykhailo Buleshnyi, Anna Polova, Zsolt Zombori, Michael Benedikt

Graph Consistency Rule Mining with LLMs: an Exploratory Study

Hoa Le Thi, Angela Bonifati, Andrea Mauri

LSGDA

4th International Workshop on Large-Scale Graph Data Analytics

Top-r Influential Community Search in Bipartite Graphs

Yanxin Zhang, Zhengyu Hua, Long Yuan

GAL: Topology-Aware Serialization for Graph Traversals

Zeynep Korkmaz, Tamer Özsu, Khuzaima Daudjee

EnGraph: Ensemble-Based Augmentation for Graph Anomaly Detection

Andrew Shields, Robert Sheehy, Pat Doody

Growing Up HAL: Historic and Property Graph Queries

Muhammad Khan, Ioana Manolescu, Angelos-Christos Anadiotis

Efficient Betweenness Maximization in Temporal Networks

Xijuan Liu, Kejia Xu, Lele Zhang, Haiyang Hu, Ying Zhang

Single-Source Regular Path Querying in Terms of Linear Algebra

Semyon Grigorev, Georgiy Belyanin, Rodion Suvorov

QDB

14th International Workshop on Quality in Databases

14th International Workshop on Quality in Databases: Preface

Lisa Ehrlinger, Lorena Etcheverry, and Hazar Harmouch

Dynamic Knowledge Graph-based Measurement of Data Quality

Johannes Schrott, Rainer Meindl, Christian Lettner, Stefan Hammer, and Magdalena Leitner

Label Flipping For Group Fairness

Shashank Thandri and Romila Pradhan

TaDA

3rd International Workshop on Tabular Data Analysis

Improving Column Type Annotation Using Large Language Models

Amir Babamahmoudi, Davood Rafiei, Mario A. Nascimento

A Vision for SQL-Based Relational Deep Learning

Fahim Shahriar Khan, Ashraf Aboulnaga

Optimizing Source Selection for Tuple-Value Discovery

Ahmad Fares, Georgia Troullinou, Silviu Maniu, Sihem Amer-Yahia

Universal Embeddings of Tabular Data

Astrid Franz, Frederik Hoppe, Marianne Michaelis, Udo Göbel

SemForest: Semantic-Aware Ontology Generation with Foundation Models

Guohui Guan, Sachin Konan, Larry Rudolph, Chang Ge

Query Plan Generation for Table Question Answering

Ivan A. Poddubnyi, Nikita O. Dorodnykh

Relationship Detection on Tabular Data Using Statistical Analysis and Large Language Models

Panagiotis Koletsis, Christos Panagiotopoulos, Georgios Th. Papadopoulos, Vasilis Efthymiou

StructText: A Synthetic Table-to-Text Approach for Benchmark Generation with Multi-Dimensional Evaluation

Satyananda Kashyap, Sola Shirai, Nandana Mihindukulasooriya, Horst Samulowitz

Table Header Recognition Based on Large Language Models

Ilia I. Okhotin, Nikita O. Dorodnykh

TOPJoin: A Context-Aware Multi-Criteria Approach for Joinable Column Search

Harsha Kokel, Aamod Khatiwada, Tejaswini Pedapati, Haritha Ananthakrishnan, Oktie Hassanzadeh, Horst Samulowitz, and Kavitha Srinivas

Towards Fine-Grained Extraction of Scientific Claims from Heterogeneous Tables Using Large Language Models

Daniele Bertillo, Laks V.S. Lakshmanan, Paolo Merialdo, Divesh Srivastava

DEC

3rd Data EConomy Workshop

An Interpretable Market-based Data Price Prediction Tool

Santiago Andrés Azcoitia and Alicia Cabrero Jiménez

Mixture-of-Experts based Model Market

Yizhou Ma, Xikun Jiang, Wenbo Wu, Zhuoqin Yang, and Luis-Daniel Ibáñez

MINiDM: Multi-Issue Negotiation in Decentralised Data Marketplaces

Soulmaz Gheisari, Jaime Osvaldo Salas, Semih Yumusak, and George Konstantinidis

LLMDap: LLM-based Data Profiling and Sharing

Shanshan Jiang, Sondre Sørbø, Phil Tinn, Shang Ferheng Karim, and Dumitru Roman

LLM+Graph

2nd International Workshop on Data Management Opportunities in Bringing LLMs with Graph Data

Scalable Graph-based Retrieval-Augmented Generation via Locality-Sensitive Hashing

Fangyuan Zhang, Zhengjun Huang, Yingli Zhou, Qingtian Guo, Wensheng Luo, Xiaofang Zhou

LLM-Hype: A Targeted Evaluation Framework for Hypernym-Hyponym Identification in Large Language Models

Qiu Ji, Pengfei Zhu, Haolei Zhu, Yang Sheng, Guilin Qi, Lianlong Wu, Kang Xu, Yuan Meng

Graph-Enhanced Large Language Models for Spatial Search [Vision]

Nicole Schneider, Kent O'Sullivan, Hanan Samet

Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study

Nandana Mihindukulasooriya, Niharika DSouza, Faisal Chowdhury, Horst Samulowitz

LLM+Spatial

1st Large Language Models for Spatial-rich Data Management

PhD

Author: Jiuqi Wei

Created: 2025-07-28 Mon 11:58