Objectives

Data Analysis is described as the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Performing such tasks over large and heterogeneous collections of tabular data, as found in enterprise data lakes and on the Web, is extremely challenging and an attractive research topic in data management, AI, and related communities. The goal of this workshop is to bring together researchers and practitioners in these diverse communities who work on addressing the fundamental research challenges of tabular data analysis and on building automated solutions in this space.

We aim to provide a forum for: a) exchange of ideas between two communities: 1) an active community of data management researchers working on data integration and schema and data matching problems over tabular data, and 2) a vibrant community of researchers in the AI and Semantic Web communities working on the core challenge of matching tabular data to Knowledge Graphs as part of the ISWC SemTab Challenges; b) presentation of late-breaking results related to several emerging research areas such as table representation learning and its applications, use of large language models (LLMs) for tabular data analysis, and automation of data science pipelines that rely on tabular data; c) discussion of real-world challenges related to implementing industrial-scale tabular data analysis pipelines, and data lake and data lakehouse solutions.

Call For Papers

Audience: Our workshop encourages participation from researchers in the data management, AI, and Semantic Web communities working on a wide range of problems relevant to tabular data analysis. We hope that it will serve as a single reference point for researchers and practitioners working in this area and help form new collaborations. We also aim to provide a venue for industry researchers and practitioners who rely on tabular data analysis tasks to present use cases and discuss their needs in addressing real-world problems and building large-scale solutions.

Topics of Interest include but are not limited to:

Semantic Table Annotation
Automated Tabular Data Understanding
Using Large Language Models (LLMs) for Tabular Data Analysis
Exploratory Data Analysis over Tabular Data
Table Search in Data Lakes
Tabular Data Discovery
Metadata Management for Tabular Data Analysis
Data Augmentation with Tabular Data
Integration and Matching of Tabular Data
Knowledge Graph Construction and Completion with Tabular Data
Automated Discovery of ML Features from Tabular Data
ML Model Development with Tabular Data
Visualization and Interfaces for Tabular Data Analysis
Data Wrangling for Tabular Data Analysis
Deep Learning and Representation Learning for Tabular Data Analysis
Extraction and Analysis of Tabular Data from (HTML/PDF) Documents and Images
Analysis of Tabular Data on the Web (Web Tables)
Practical Applications of Tabular Data Analysis
Benchmarking and Evaluation Frameworks for Tabular Data Analysis

Submissions

The workshop includes a research track and a systems track. Contributions to the research track can take the form of technical papers addressing various aspects of tabular data analysis. The systems track will include presentations from participants of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab). Contributions can also take the form of posters or statements of interest. Full Research Papers should be up to 8 pages (excluding references). Short Research Papers should be up to 4 pages (excluding references). Extended Abstract Papers (for systems track and posters) should be up to 2 pages (excluding references). References do not count towards the page limits mentioned above.

Submission site: https://cmt3.research.microsoft.com/TaDA2026

Submissions should follow the format outlined in the provided zipped LaTeX proceedings directory.

TaDA uses non-anonymous submissions, which means that the authors must list their names and affiliations as part of their submission. Reviewers, on the other hand, are anonymous to the authors. Authors of accepted papers will have the option to include their papers in the VLDB workshop proceedings. At least one co-author is expected to register for the VLDB 2026 conference and present the paper in-person. Please visit the VLDB 2026 website for more information on registration.

Important Dates (Tentative)

Submission deadline: ~~May 18, 2026~~ May 20, 2026
Notification of acceptance: ~~June 15, 2026~~ June 17, 2026
Camera-ready copy due: ~~June 29, 2026~~ July 7, 2026
Workshop Day: September 4, 2026 (TBA)

All times are Anywhere on Earth (AoE).

Program

The program will be announced closer to the workshop date.

Accepted Papers

Vid Kocijan, Jinu Sunil, Jan Eric Lenssen, Viman Deb, Xinwei He, Federico Reyes, Matthias Fey, Jure Leskovec. Predictive Query Language: A Domain-Specific Language for Predictive Modeling on Relational Databases
Yibo Wang, Riteng Zhang, Bharat Bhargava, Chunwei Liu. TabClean: Scalable Tabular Data Cleaning via Reusable LLM-Synthesized Programs (Short Paper)
Soroush Omidvartehrani, Davood Rafiei. LDI: Localized Data Imputation for Text-Rich Tables
Sola Shirai, Debarun Bhattacharjya, Oktie Hassanzadeh, Gaetano Rossiello. Predicting Table Joinability in Data Lakes using a Metadata Knowledge Graph
Nicholas Pulsone, Roee Shraga, Gregory Goren. Understanding Domain-Aware Distribution Alignment in Budgeted Entity Matching (Short Paper)
Jiahan Cao, Rachel Pottinger. Representation, Retrieval, and Decision Space: A Case Study on Diagnosing LLM Failures in Column Type Annotation
Sebastian Bugedo, Stijn Vansummeren. Hyperdimensional Computing for Structured Query on Tabular Data Embeddings
Ayeen Poostforoushan, Liane Vogel, Carsten Binnig. TEmBed-T: A Multi-Dimensional Benchmark for Table-Level Embeddings (Short Paper)
Talha Tahmid, Humera Sabir, Ashraf Aboulnaga. From Transport to Grounding: Designing MCP Connectors for Reliable SPARQL Query by LLMs
Niharika D'Souza, Liane Vogel, Kavitha Srinivas, Sola Shirai, Oktie Hassanzadeh, Horst Samulowitz. Can a Single Tabular Embedding Model Service Different Tasks?
Yuval Lubarsky, Dean Light, Boaz Berger, Shunit Agmon, Benny Kimelfeld. Incorporating Deep Learning Design in Database Queries
Daniele Bertillo, Giorgio Melchiorri, Paolo Merialdo. PaperUnPlot: Benchmarking Chart-to-Table in the Wild
Inwon Kang, Kavitha Srinivas, Sola Shirai, Nandana Mihindukulasooriya, Niharika D'Souza, Horst Samulowitz, Oshani Seneviratne. Towards Budget-Aware Dense Retrieval for Tables: Trade-offs, Alternatives and Future Directions
Michael Zuo, Inwon Kang, Oshani Seneviratne, Stacy Patterson. PFN-Syn: Generating Synthetic Tabular Data with Prior-Data Fitted Networks (Poster)
Zhengyuan Dong, Renee Miller. Diversed Model Discovery via Structured Table Discovery
Rubab Sarfraz, Boris Glavic. Speeding Up Transformation Search with Lightweight Statistics (Short Paper)
Mohammad Sadeq Abolhasani, Viswanath Ganpathy. Curriculum Matters: Data-Efficient Relational PFN Pre-training with Synthetic Data (Short Paper)

Organization

Workshop Chairs:

Vasilis Efthymiou (Harokopio University of Athens) - vefthym@hua.gr (primary contact)
Oktie Hassanzadeh (IBM Research) - hassanzadeh@us.ibm.com
Chuan Lei (Oracle) - chuan.lei@oracle.com
Kavitha Srinivas (IBM Research) - kavitha.srinivas@ibm.com

Program Committee:

Christian Bizer, University of Mannheim
Christos Diou, Harokopio University of Athens
Ernesto Jimenez-Ruiz, City, University of London
George Papadakis, University of Athens
Georgia Troullinou, CNRS
Haridimos Kondylakis, FORTH-ICS & University of Crete
Horst Samulowitz, IBM Research
Ismael Sanz, University Jaume I
Kostas Stefanidis, Tampere University
Mahdi Esmailoghli, University of Waterloo
Marco Mesiti, University of Milan
Michael R. Glass, IBM Research
Nhan Pham, IBM Research
Panagiotis Koletsis, Harokopio University of Athens
Paolo Papotti, EURECOM
Rafael Berlanga Llavori, University Jaume I
Roee Shraga, WPI
Sola Shirai, IBM Research
Tiago Araujo, Federal Institute of Paraiba
Udayan Khurana, IBM Research
Vassilis Christophides, ENSEA
Venkata Vamsikrishna Meduri, IBM Research
Vincenzo Cutrona, SUPSI
Zezhou Huang, Microsoft