Proceedings of Workshops at the 51st International Conference on Very Large Data Bases (VLDB 2025)
VLDBW 2025
VLDB 2025 Workshop Chairs
Norman Paton, John Paparrizos
VLDB 2025 Workshop Proceedings Chair
Jiuqi Wei
Accepted Workshops
ADMS: 16th Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architecture
- Workshop Chairs: Rajesh Bordawekar, Tirthankar Lahiri
AIDB: 6th Applied AI for Database Systems and Applications
- Workshop Chairs: Umar Farooq Minhas, Subru Krishnan, Thaleia Doudali
CDMS: 3rd International Workshop on Composable Data Management Systems
- Workshop Chairs: Satyanarayana R Valluri, Mohamed Zait
DaSH: 5th Data Science with Human-in-the-Loop
- Workshop Chairs: Eduard Dragut, Yunyao Li, Lucian Popa, Kun Qian, Sherry Tongshuang Wu
DATAI: 2nd International Workshop on Data-Centric AI
- Workshop Chairs: Hongzhi Wang, Nan Tang
GuideAI: 2nd Governance, Understanding and Integration of Data for Effective and Responsible AI
- Workshop Chairs: Sainyam Galhotra, Babak Salimi
LS-NSL: 1st Workshop on New Ideas for Large-Scale Neurosymbolic Learning Systems
- Workshop Chairs: Efthymia Tsamoura, Pablo Barceló, Jacopo Urbani
LSGDA: 4th International Workshop on Large-Scale Graph Data Analytics
- Workshop Chairs: Wenjie Zhang, Ying Zhang, Zhengyi Yang, Dong Wen, Wentao Li
QDB: 14th International Workshop on Quality in Databases
- Workshop Chairs: Lisa Ehrlinger, Sourav S Bhowmick, Lorena Etcheverry, Hazar Harmouch
TaDA: 3rd International Workshop on Tabular Data Analysis
- Workshop Chairs: Vasilis Efthymiou, Oktie Hassanzadeh, Sainyam Galhotra, Ernesto Jiménez-Ruiz, Chuan Lei
DEC: 3rd Data EConomy Workshop
- Workshop Chairs: Santiago Andrés Azcoitia, George Konstantinidis
LLM+Graph: 2nd International Workshop on Data Management Opportunities in Bringing LLMs with Graph Data
- Workshop Chairs: Yixiang Fang, Arijit Khan, Tianxing Wu, Da Yan
LLM+Spatial: 1st Large Language Models for Spatial-rich Data Management
- Workshop Chairs: Jianqiu Xu, Cheng Long, Bernhard Seeger, Yongxin Tong
PhD: VLDB 2025 PhD Workshop
- Workshop Chairs: Sonia Bergamaschi, Raul Castro Fernandez
ADMS
16th Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architecture
High Throughput GPU-Accelerated FSST String Compression
Tim Anema, Delft University of Technology, Joost Hoozemans, Voltron Data, Zaid Al-Ars, Delft University of Technology, and H. Peter Hofstee, IBM
Demystifying CXL Memory Bandwidth Expansion for Analytical Workloads
Georgiy Lebedev, Hamish Nicholson, Musa Ünal, Sanidhya Kashyap and Anastasia Ailamaki, EPFL
GPU-Accelerated Stochastic Gradient Descent for Scalable Operator Placement in Geo-Distributed Streaming Systems
Tristan Joel Terhaag, Technische Universität Berlin, Xenofon Chatziliadis, Technische Universität Berlin, Eleni Tzirita Zacharatou, Hasso Plattner Institute, University of Potsdam, and Volker Markl, Technische Universität Berlin
A Data Aggregation Visualization System supported by Processing-in-Memory
Junyoung Kim, Madhulika Balakumar and Kenneth Ross, Columbia University
A Hot Take on the Intel Analytics Accelerator for Database Management Systems
Christos Laspias, Andrew Pavlo and Jignesh Patel, Carnegie Mellon University
RISC-V Meets RDBMS: An Experimental Study of Database Performance on an Open Instruction Set Architecture
Yizhe Zhang, Zhengyi Yang, Bocheng Han, University of New South Wales, Haoran Ning, Macquarie University, Xin Cao, John Shepherd, University of New South Wales, and Guanfeng Liu, Macquarie University
CXL-Bench: Benchmarking Shared CXL Memory Access
Marcel Weisgut, Hasso Plattner Institute, University of Potsdam, Daniel Ritter, SAP, Florian Schmeller, Hasso Plattner Institute, University of Potsdam, Pınar Tözün, IT University of Copenhagen, and Tilmann Rabl, Hasso Plattner Institute, University of Potsdam
Micro-architectural Exploration of the Relational Memory Engine (RME) in RISC-V and FireSim
Cole Strickler, University of Kansas, Ju Hyoung Mun, Brandeis University, Connor Sullivan, University of Kansas, Denis Hoornaert, Technical University of Munich, Renato Mancuso, Manos Athanassoulis, Boston University, and Heechul Yun, University of Kansas
AIDB
6th Applied AI for Database Systems and Applications
Inferring Missing Data Lineage Links from Schema Metadata Using Transformer-Based Models
Maciej Brzeski, Adam Roman
TailorSQL: A NL2SQL System Tailored for Your Query Workload
Kapil Vaidya, Jialin Ding, Sebastian Kosak, David Kernert, Chuan Lei, Xiao Qin, Abhinav Tripathy, Ramesh Balan, Balakrishnan Narayanaswamy, Tim Kraska
Learning What Matters: Automated Feature Selection for Learned Cost Model in Parallel Stream Processing
Pratyush Agnihotri, Carsten Binnig, Manisha Luthra
AutoDebugger: Efficient Root Cause Analysis for Anomaly Jobs
Fathelrahman Ali, Yiwen Zhu, Lie Jiang, Zhen Li, Manting Li, Kun Huang, Lijing Lin, Long Tian, Xiaolei Liu, Subru Krishnan
Grounding LLMs for Database Exploration: Intent Scoping and Paraphrasing for Robust NL2SQL
Catalina Dragusin, Katsiaryna Mirylenka, Christoph Miksovic, Michael Glass, Nahuel Defosse, Paolo Scotton, Thomas Gschwind
Instance-Optimized String Fingerprints
Mihail Stoian, Johannes Thürauf, Andreas Zimmerer, Alexander van Renen, Andreas Kipf
MageSQL: Enhancing In-context Learning for Text-to-SQL Applications with Large Language Models
Chen Shen, Jin Wang, Sajjadur Rahman, Eser Kandogan
JOB-Complex: A Challenging Benchmark for Traditional & Learned Query Optimization
Johannes Wehrstein, Timo Eckmann, Roman Heinrich, Carsten Binnig
Bootstrapping Learned Cost Models with Synthetic SQL Queries
Michael Nidd, Christoph Miksovic, Thomas Gschwind, Francesco Fusco, Andrea Giovannini, Ioana Giurgiu
Exploring Wavelet Trees as Space-Efficient Physical-to-Sorted Mapping for Learned Indexes
Anwesha Saha, Aneesh Raman, Ryan Marcus, Manos Athanassoulis
Learning to Accelerate: Tuning Data Transfer Parameters
Benedikt Didrich, Haralampos Gavriilidis, Vasilis Gkolemis, Matthias Boehm, Volker Markl
Research Challenges in Relational Database Management Systems for LLM Queries
Kerem Akillioglu, Anurag Chakraborty, Sairaj Voruganti, M. Tamer Özsu
CDMS
3rd International Workshop on Composable Data Management Systems
Speedrunning a lakehouse: a composable FaaS over object storage
Jacopo Tagliabue
Theseus a Composable distributed execution runtime: Performance across GPUs, Networks, and Storage
Felipe Aramburu
A Learned Cost Model-based Cross-engine Optimizer for SQL Workloads
András Strausz, Niels Pardon, Ioana Giurgiu
Rethinking Pluggable Federated Query Optimization: From Laptops to Data Warehouses
Victor Giannakouris, Immanuel Trummer
Eudoxia: a FaaS scheduling simulator for the composable lakehouse
Tapan Srivastava, Jacopo Tagliabue, Ciro Greco
Composability and Interoperability for Federated Data Systems
Haralampos Gavriilidis, Leonhard Rose, Joel Ziegler, Jonathan Gerloff, Benedikt Didrich, Midhun Kaippillil Venugopalan, Kaustubh Beedkar, Matthias Boehm, Volker Markl
Composing XGBoost UDFs with Arrow Flight
Hussain Sultan
Building IBM watsonx.data from Composable Parts
Aditi Pandit
The Deconstructed Warehouse: An Ephemeral Query Engine Design for Apache Iceberg
Ryan Curtin, Jacopo Tagliabue
DAG lakehouse planning with an ephemeral and embedded graph database
Luca Bigon, Jacopo Tagliabue, Semih Salihoğlu
GranPipe: Composable Hierarchical Pipelines for Near-Data Processing
Johannes Pietrzyk, Wolfgang Lehner, Dirk Habich, Philippe Bonnet
LanceDB - Embracing Composability in the Storage Layer
Weston Pace, Chang She, Lei Xu, Will Jones, Rob Meng, Yang Cen
DaSH
5th Data Science with Human-in-the-Loop
Reducing Human Effort in Evaluating Small and Medium Language Models as Students and as Teachers
Oleh Prostakov, Viacheslav Hodlevskyi, Nassim Bouarour, Adam Sanchez-Ayte, Noha Ibrahim, Sihem Amer-Yahia
DeepGit: Promoting Exploration and Discovery of Research Software with Human-Curated Graphs
Yilin Xia, Shin-Rong Tsai, Matthew Turk
Human + AI: Large scale Data Curation For Multilingual Guardrails
Harshit Rajgarhia, Abhishek Mukherji, Fen Yik, Dominika Borek, Nicole Warren, Prithiviraj Pradeep
HierTOD: A Task-Oriented Dialogue System Driven by Hierarchical Goals
Lingbo Mo, Shun Jiang, Akash V Maharaj, J. Bernard Hishamunda, Yunyao Li
Adobe Summit Concierge Evaluation with human in the loop
Yiru Chen, Sally Fang, Sai Sree Harsha, Dan Luo, Vaishnavi Muppala, Fei Wu, Shun Jiang, Kun Qian, Yunyao Li
DATAI
2nd International Workshop on Data-Centric AI
SQL-ML: A SQL-Centric Framework for Building Efficient Feature Store
Ahmad Ghazal, Hanumath Maduri and Pekka Kostamaa
A Low Latency Cache for Cloud RDBMs
Guohai Zhang, Xin Tang, Qingchen Chang, Huanchen Zhang, Kai Hwang, Yuesen Li, Runhuai Huang, Teng Wang, Wusheng Zhang, Ming Zhang, Qingchun Chen, Xiaodong Hou and Qian Wang
The Case for Intent-Based Query Rewriting
Gianna Lisa Nicolai, Patrick Hansert and Sebastian Michel
TabAgent: A Multi-Agent Table Extraction Framework for Unstructured Documents
Jingfei Wu, Chaoyuan Shen, Qiyan Deng, Yuping Wang, Jiajun Li, Yuhao Deng and Minghe Yu
Lightweight Pipelines: Good Enough is Sometimes Better
Camilla Sancricca and Cinzia Cappiello
CleanAgent: Automating Data Standardization with LLM-based Agents
Danrui Qi, Zhengjie Miao and Jiannan Wang
SoAgent: A Real-world Data Empowered Agent Pool to Facilitate LLM-Driven Generative Social Simulation
Na Ta, Kaiyu Li, Yushu Zhou and Yuhan Liu
DeepSearch: LLM-powered Data Acquisition for Machine Learning
Kaiyu Li, Zhongxin Hu, Yuxin Gao and Yuyang Wu
Detecting and Cleaning Errors in Personal Contact Information with Large Language Models
Anna-Christina Glock, Christine Dominka-Kiss, Philipp Korom and Lisa Ehrlinger
GuideAI
2nd Governance, Understanding and Integration of Data for Effective and Responsible AI
Model Slicing for Responsible AI
Parke Godfrey, Lukasz Golab, Divesh Srivastava, Jarek Szlichta
Towards Identifying Intent of Data Errors
Mohamed Ahmed Abdelmaksoud Mohamed, Konrad Rieck, Ziawasch Abedjan
ExperimentLens: Interactive Visual Analytics and Explainability for ML Experiment Management
Stavros Maroulis, Vassilis Stamatopoulos, Panagiotis Gidarakos, Konstantinos Tsopelas, Nikolas Masouras, Konstantinos Kozanis, Nikolas Theologitis, George Papastefanatos, Giorgos Giannopoulos, Erik Nilsson
LightUL: An Efficient Recommendation Unlearning Framework
Wentao Ning, Haorui He, Reynold Cheng, Nur Al Hasan Haldar, Ben Kao, Nan Huo, Bo Tang, Yupeng Li
DBMS-LLM Integration Strategies in Industrial and Business Applications: Current Status and Future Challenges
Zhengtong Yan, Gongsheng Yuan, Qingsong Guo, Jiaheng Lu
LS-NSL
1st Workshop on New Ideas for Large-Scale Neurosymbolic Learning Systems
Modular Neuro-Symbolic Knowledge Graph Completion
Abelardo Carlos Martinez Lorenzo, Alexander Perfilyev, Volker Markl, Martha Clokie, Thomas Sicheritz-Pontén, Zoi Kaoudi
Constraint-aware Learning of Probabilistic Sequential Models for Multi-Label Classification
Mykhailo Buleshnyi, Anna Polova, Zsolt Zombori, Michael Benedikt
Graph Consistency Rule Mining with LLMs: an Exploratory Study
Hoa Le Thi, Angela Bonifati, Andrea Mauri
LSGDA
4th International Workshop on Large-Scale Graph Data Analytics
Report on the 4th International Workshop on Large-Scale Graph Data Analytics (LSGDA 2025)
Zhengyi Yang, Dong Wen, Wentao Li
Top-r Influential Community Search in Bipartite Graphs
Yanxin Zhang, Zhengyu Hua, Long Yuan
Hyracks Unchained: Efficient Recursion for Navigational Queries in Apache AsterixDB
Glenn Galvizo, Michael Carey
GAL: Topology-Aware Serialization for Graph Traversals
Zeynep Korkmaz, Tamer Özsu, Khuzaima Daudjee
Experiment & Benchmark Paper: To What Extent Does Quality Matter? The Impact of Graph Data Quality on GNN Model Performance
Jana Vatter, Maurice L. Rochau, Ruben Mayer, Hans-Arno Jacobsen
EnGraph: Ensemble-Based Augmentation for Graph Anomaly Detection
Andrew Shields, Robert Sheehy, Pat Doody
Semantic Embedding for Enterprise Clustering: A Systematic and Scalable Approach Using SentenceTransformers
Yigong Xiao, Xianzhi Lei, Kecheng Wang, Changan Zhou, Niannian Huang
Shape-Aware, Scale-Agnostic Representation of Dynamic DAGs
Jennifer Neumann, Peter M. Fischer
Vision Paper: Improving the Accessibility of Port Operations in Supply Chain Management using Graph Data Analysis
Mert Ayas, Frank Laarmann, Leif Meier, Katja Zeume
Growing Up HAL: Historic and Property Graph Queries
Muhammad Khan, Ioana Manolescu, Angelos-Christos Anadiotis
Efficient Betweenness Maximization in Temporal Networks
Xijuan Liu, Kejia Xu, Lele Zhang, Haiyang Hu, Ying Zhang
Single-Source Regular Path Querying in Terms of Linear Algebra
Semyon Grigorev, Georgiy Belyanin, Rodion Suvorov
QDB
14th International Workshop on Quality in Databases
14th International Workshop on Quality in Databases: Preface
Lisa Ehrlinger, Lorena Etcheverry, and Hazar Harmouch
Out in the Wild: Investigating the Impact of Imperfect Data on a Tabular Foundation Model
Vasileios Papastergios and Anastasios Gounaris
Exploring Privacy-Preserving Record Linkage: A Holistic Framework for Dataset Generation and Detailed Result Analysis
Florens Rohde, Victor Christen, and Erhard Rahm
Dynamic Knowledge Graph-based Measurement of Data Quality
Johannes Schrott, Rainer Meindl, Christian Lettner, Stefan Hammer, and Magdalena Leitner
Evolving Gracefully: Building Robust and Self-Adaptive Data Cleaning Pipelines for Schema Evolution and Uncertainty
Kevin Kramer, Valerie Restat, and Uta Störl
Label Flipping For Group Fairness
Shashank Thandri and Romila Pradhan
PBE Meets LLM: When Few Examples Aren’t Few-Shot Enough
Shuning Zhang and Yongjoo Park
TaDA
3rd International Workshop on Tabular Data Analysis
Improving Column Type Annotation Using Large Language Models
Amir Babamahmoudi, Davood Rafiei, Mario A. Nascimento
A Vision for SQL-Based Relational Deep Learning
Fahim Shahriar Khan, Ashraf Aboulnaga
Evaluating SQL Selection/Projection over Table Embeddings
Mariam Mellouli, Paolo Papotti
From Features to Structure: Task-Aware Graph Construction for Relational and Tabular Learning with GNNs
Tamara Cucumides, Floris Geerts
Optimizing Source Selection for Tuple-Value Discovery
Ahmad Fares, Georgia Troullinou, Silviu Maniu, Sihem Amer-Yahia
Universal Embeddings of Tabular Data
Astrid Franz, Frederik Hoppe, Marianne Michaelis, Udo Göbel
SemForest: Semantic-Aware Ontology Generation with Foundation Models
Guohui Guan, Sachin Konan, Larry Rudolph, Chang Ge
Query Plan Generation for Table Question Answering
Ivan A. Poddubnyi, Nikita O. Dorodnykh
Relationship Detection on Tabular Data Using Statistical Analysis and Large Language Models
Panagiotis Koletsis, Christos Panagiotopoulos, Georgios Th. Papadopoulos, Vasilis Efthymiou
StructText: A Synthetic Table-to-Text Approach for Benchmark Generation with Multi-Dimensional Evaluation
Satyananda Kashyap, Sola Shirai, Nandana Mihindukulasooriya, Horst Samulowitz
Table Header Recognition Based on Large Language Models
Ilia I. Okhotin, Nikita O. Dorodnykh
TOPJoin: A Context-Aware Multi-Criteria Approach for Joinable Column Search
Harsha Kokel, Aamod Khatiwada, Tejaswini Pedapati, Haritha Ananthakrishnan, Oktie Hassanzadeh, Horst Samulowitz, and Kavitha Srinivas
Towards Fine-Grained Extraction of Scientific Claims from Heterogeneous Tables Using Large Language Models
Daniele Bertillo, Laks V.S. Lakshmanan, Paolo Merialdo, Divesh Srivastava
DEC
3rd Data EConomy Workshop
An Interpretable Market-based Data Price Prediction Tool
Santiago Andrés Azcoitia and Alicia Cabrero Jiménez
Mixture-of-Experts based Model Market
Yizhou Ma, Xikun Jiang, Wenbo Wu, Zhuoqin Yang, and Luis-Daniel Ibáñez
UxV-DPN: Utility-vs-Value Data Pricing and Negotiation Mechanism in Machine Learning Data Marketplace
Hajar Baghcheband, Carlos Soares, and Luis Paulo Reis
MINiDM: Multi-Issue Negotiation in Decentralised Data Marketplaces
Soulmaz Gheisari, Jaime Osvaldo Salas, Semih Yumusak, and George Konstantinidis
LLMDap: LLM-based Data Profiling and Sharing
Shanshan Jiang, Sondre Sørbø, Phil Tinn, Shang Ferheng Karim, and Dumitru Roman
LLM+Graph
2nd International Workshop on Data Management Opportunities in Bringing LLMs with Graph Data
LLM+Graph: 2nd International Workshop on Data Management Opportunities in Bringing LLMs with Graph Data
Yixiang Fang, Arijit Khan, Tianxing Wu, Da Yan
LLM-assisted Construction of the United States Legislative Graph
Francesco Cambria, Andrea Colombo
Scalable Graph-based Retrieval-Augmented Generation via Locality-Sensitive Hashing
Fangyuan Zhang, Zhengjun Huang, Yingli Zhou, Qingtian Guo, Wensheng Luo, Xiaofang Zhou
LLM-Hype: A Targeted Evaluation Framework for Hypernym-Hyponym Identification in Large Language Models
Qiu Ji, Pengfei Zhu, Haolei Zhu, Yang Sheng, Guilin Qi, Lianlong Wu, Kang Xu, Yuan Meng
Graph-Enhanced Large Language Models for Spatial Search [Vision]
Nicole Schneider, Kent O'Sullivan, Hanan Samet
xpSHACL: Explainable SHACL Validation using Retrieval-Augmented Generation and Large Language Models
Gustavo Publio, Jose Emilio Labra Gayo
Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study
Nandana Mihindukulasooriya, Niharika DSouza, Faisal Chowdhury, Horst Samulowitz
Towards the Next Generation of Agent Systems: From RAG to Agentic AI [Vision]
Yingli Zhou, Shu Wang
LLM+Spatial
1st Large Language Models for Spatial-rich Data Management
NALMOBench: Towards Benchmarking Natural Language Interfaces for Moving Objects Databases
Xieyang Wang, Weijia Yi, Mengyi Liu, Chenchen Zong
PhD
VLDB 2025 PhD Workshop
Algorithm Support in a Graph Database, Done Right
Daan de Graaf
SwellDB: GenAI-Native Query Processing via On-the-Fly Table Generation
Victor Giannakouris
Entropy-Based Anomaly Detection in Evolving Graph Streams
Satoshi Kayano
Toward Interpretable Methods for Time Series Analytics
Félix Chavelli
Running Functions on Pooled Data Without Leakage
Christopher Zhu
Fast or Accurate? Rethinking Time Series Anomaly Detection
Emmanouil Sylligardos
Modeling and Operationalizing Data Ecosystems
Soo-Yon Kim
Optimizing Data Systems for LLM Workloads
Kerem Akillioglu