Objectives

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Performing such tasks over large and heterogeneous collections of tabular data, as found in enterprise data lakes and on the Web, is extremely challenging and an attractive research topic in data management, AI, and related communities. The goal of this workshop is to bring together researchers and practitioners from these diverse communities who work on the fundamental research challenges of tabular data analysis and on building automated solutions in this space.

We aim to provide a forum for: a) exchange of ideas between two communities: 1) an active community of data management researchers working on data integration and schema and data matching problems over tabular data, and 2) a vibrant community of researchers in the AI and Semantic Web communities working on the core challenge of matching tabular data to Knowledge Graphs as a part of the ISWC SemTab Challenges; b) presentation of late-breaking results related to several emerging research areas such as table representation learning and its applications, use of large language models (LLMs) for tabular data analysis, and automation of data science pipelines that rely on tabular data; and c) discussion of real-world challenges related to implementing industrial-scale tabular data analysis pipelines, data lakes, and data lakehouse solutions.

Call For Papers

Audience: Our workshop encourages participation from researchers in the data management, AI, and Semantic Web communities working on a wide range of problems relevant to tabular data analysis. We hope that the workshop will serve as a single reference point for researchers and practitioners working in this area and help form new collaborations. We also aim to provide a venue for industry researchers and practitioners who rely on tabular data analysis tasks to present use cases and discuss their needs in addressing real-world problems and building large-scale solutions.

Topics of Interest include but are not limited to:
  • Semantic Table Annotation
  • Automated Tabular Data Understanding
  • Using Large Language Models (LLMs) for Tabular Data Analysis
  • Exploratory Data Analysis over Tabular Data
  • Table Search in Data Lakes
  • Tabular Data Discovery
  • Metadata Management for Tabular Data Analysis
  • Data Augmentation with Tabular Data
  • Integration and Matching of Tabular Data
  • Knowledge Graph Construction and Completion with Tabular Data
  • Automated Discovery of ML Features from Tabular Data
  • ML Model Development with Tabular Data
  • Visualization and Interfaces for Tabular Data Analysis
  • Data Wrangling for Tabular Data Analysis
  • Deep Learning and Representation Learning for Tabular Data Analysis
  • Extraction and Analysis of Tabular Data from (HTML/PDF) Documents and Images
  • Analysis of Tabular Data on the Web (Web Tables)
  • Practical Applications of Tabular Data Analysis
  • Benchmarking and Evaluation Frameworks for Tabular Data Analysis

Submissions

The workshop includes a research track and a systems track. Contributions to the research track can take the form of technical papers addressing various aspects of tabular data analysis. The systems track will include presentations from participants of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab). Contributions can also take the form of posters or statements of interest. Full Research Papers should be up to 8 pages (excluding references). Short Research Papers should be up to 4 pages (excluding references). Extended Abstract Papers (for systems track and posters) should be up to 2 pages (excluding references). References do not count towards the page limits mentioned above.

Submission site: https://cmt3.research.microsoft.com/TaDA2026

Submissions should follow the format outlined in the provided zipped LaTeX proceedings directory.

TaDA uses non-anonymous submissions, which means that the authors must list their names and affiliations as part of their submission. Reviewers, on the other hand, are anonymous to the authors. Authors of accepted papers will have the option to include their papers in the VLDB workshop proceedings. At least one co-author is expected to register for the VLDB 2026 conference and present the paper in person. Please visit the VLDB 2026 website for more information on registration.

Important Dates (Tentative)

  • Submission deadline: May 20, 2026 (previously May 18, 2026)
  • Notification of acceptance: June 17, 2026 (previously June 15, 2026)
  • Camera-ready copy due: July 7, 2026 (previously June 29, 2026)
  • Workshop Day: September 4, 2026 (TBA)
All times are Anywhere on Earth (AoE).

Program

The program will be announced closer to the workshop date.

Accepted Papers

  • Vid Kocijan, Jinu Sunil, Jan Eric Lenssen, Viman Deb, Xinwei He, Federico Reyes, Matthias Fey, Jure Leskovec. Predictive Query Language: A Domain-Specific Language for Predictive Modeling on Relational Databases
  • Yibo Wang, Riteng Zhang, Bharat Bhargava, Chunwei Liu. TabClean: Scalable Tabular Data Cleaning via Reusable LLM-Synthesized Programs (Short Paper)
  • Soroush Omidvartehrani, Davood Rafiei. LDI: Localized Data Imputation for Text-Rich Tables
  • Sola Shirai, Debarun Bhattacharjya, Oktie Hassanzadeh, Gaetano Rossiello. Predicting Table Joinability in Data Lakes using a Metadata Knowledge Graph
  • Nicholas Pulsone, Roee Shraga, Gregory Goren. Understanding Domain-Aware Distribution Alignment in Budgeted Entity Matching (Short Paper)
  • Jiahan Cao, Rachel Pottinger. Representation, Retrieval, and Decision Space: A Case Study on Diagnosing LLM Failures in Column Type Annotation
  • Sebastian Bugedo, Stijn Vansummeren. Hyperdimensional Computing for Structured Query on Tabular Data Embeddings
  • Ayeen Poostforoushan, Liane Vogel, Carsten Binnig. TEmBed-T: A Multi-Dimensional Benchmark for Table-Level Embeddings (Short Paper)
  • Talha Tahmid, Humera Sabir, Ashraf Aboulnaga. From Transport to Grounding: Designing MCP Connectors for Reliable SPARQL Query by LLMs
  • Niharika D'Souza, Liane Vogel, Kavitha Srinivas, Sola Shirai, Oktie Hassanzadeh, Horst Samulowitz. Can a Single Tabular Embedding Model Service Different Tasks?
  • Yuval Lubarsky, Dean Light, Boaz Berger, Shunit Agmon, Benny Kimelfeld. Incorporating Deep Learning Design in Database Queries
  • Daniele Bertillo, Giorgio Melchiorri, Paolo Merialdo. PaperUnPlot: Benchmarking Chart-to-Table in the Wild
  • Inwon Kang, Kavitha Srinivas, Sola Shirai, Nandana Mihindukulasooriya, Niharika D'Souza, Horst Samulowitz, Oshani Seneviratne. Towards Budget-Aware Dense Retrieval for Tables: Trade-offs, Alternatives and Future Directions
  • Michael Zuo, Inwon Kang, Oshani Seneviratne, Stacy Patterson. PFN-Syn: Generating Synthetic Tabular Data with Prior-Data Fitted Networks (Poster)
  • Zhengyuan Dong, Renee Miller. Diversed Model Discovery via Structured Table Discovery
  • Rubab Sarfraz, Boris Glavic. Speeding Up Transformation Search with Lightweight Statistics (Short Paper)
  • Mohammad Sadeq Abolhasani, Viswanath Ganpathy. Curriculum Matters: Data-Efficient Relational PFN Pre-training with Synthetic Data (Short Paper)

Organization

Workshop Chairs:

Program Committee:
  • Christian Bizer, University of Mannheim
  • Christos Diou, Harokopio University of Athens
  • Ernesto Jimenez-Ruiz, City, University of London
  • George Papadakis, University of Athens
  • Georgia Troullinou, CNRS
  • Haridimos Kondylakis, FORTH-ICS & University of Crete
  • Horst Samulowitz, IBM Research
  • Ismael Sanz, University Jaume I
  • Kostas Stefanidis, Tampere University
  • Mahdi Esmailoghli, University of Waterloo
  • Marco Mesiti, University of Milan
  • Michael R. Glass, IBM Research
  • Nhan Pham, IBM Research
  • Panagiotis Koletsis, Harokopio University of Athens
  • Paolo Papotti, EURECOM
  • Rafael Berlanga Llavori, University Jaume I
  • Roee Shraga, WPI
  • Sola Shirai, IBM Research
  • Tiago Araujo, Federal Institute of Paraiba
  • Udayan Khurana, IBM Research
  • Vassilis Christophides, ENSEA
  • Venkata Vamsikrishna Meduri, IBM Research
  • Vincenzo Cutrona, SUPSI
  • Zezhou Huang, Microsoft