Objectives

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Performing such tasks over large and heterogeneous collections of tabular data, such as those found in enterprise data lakes and on the Web, is extremely challenging and an attractive research topic in the data management, AI, and related communities. The goal of this workshop is to bring together researchers and practitioners from these diverse communities who work on the fundamental research challenges of tabular data analysis and on building automated solutions in this space.

We aim to provide a forum for: a) the exchange of ideas between two communities: 1) an active community of data management researchers working on data integration and on schema and data matching problems over tabular data, and 2) a vibrant community of researchers in AI and the Semantic Web working on the core challenge of matching tabular data to Knowledge Graphs as part of the ISWC SemTab Challenges; b) the presentation of late-breaking results in several emerging research areas, such as table representation learning and its applications, automation of data science pipelines, and data lake and data lakehouse solutions; and c) discussion of the real-world data management challenges of implementing industrial-scale tabular data analysis solutions.

Call For Papers

Audience: Our workshop encourages participation from researchers in the data management, AI, and Semantic Web communities working on a wide range of problems relevant to tabular data analysis. We hope the workshop will become a single reference point for researchers and practitioners working in this area and will help form new collaborations. We also aim to provide a venue for researchers from industry and for practitioners who rely on tabular data analysis tasks to present use cases and discuss their needs in addressing real-world problems with large-scale solutions.

Topics of Interest include but are not limited to:
  • Semantic Table Annotation
  • Automated Tabular Data Understanding
  • Exploratory Data Analysis over Tabular Data
  • Table Search in Data Lakes
  • Tabular Data Discovery
  • Tabular Data Discovery in Data Lakes
  • Tabular Data Discovery for Causal Inference
  • Metadata Management for Tabular Data Analysis
  • Data Augmentation with Tabular Data
  • Integration and Matching of Tabular Data
  • Knowledge Graph Construction and Completion with Tabular Data
  • Automated Discovery of ML Features from Tabular Data
  • ML Model Development with Tabular Data
  • Visualization and Interfaces for Tabular Data Analysis
  • Data Wrangling for Tabular Data Analysis
  • Deep Learning and Representation Learning for Tabular Data Analysis
  • Foundation Models for Tabular Data Analysis
  • Extraction and Analysis of Tabular Data from (HTML/PDF) Documents and Images
  • Analysis of Tabular Data on the Web (Web Tables)
  • Practical Applications of Tabular Data Analysis
  • Benchmarking and Evaluation Frameworks for Tabular Data Analysis

Submissions

Contributions to the workshop can take the form of technical papers, posters, or statements of interest addressing various aspects of tabular data analysis, as well as reports on SemTab Challenge participation. Long technical papers should be 8-10 pages long. Short technical papers should be no more than 4 pages long. Posters should not exceed 2 pages. References do not count towards the page limits mentioned above.

Submission site: https://cmt3.research.microsoft.com/TaDA2024
Submissions should follow the double-column CEUR-ART template.

Reviews will be anonymous (not dual anonymous). Authors of accepted papers will have the option to include their papers in the CEUR-WS proceedings of the workshop. At least one co-author is expected to register for the VLDB 2024 conference and present the paper in person. Please visit the VLDB 2024 registration instructions for more information.

Important Dates

  • Abstract submission deadline: May 16, 2024 (extended from May 2, 2024)
  • Submission deadline: May 23, 2024 (extended from May 9, 2024)
  • Late-breaking results (poster) submission deadline: TBD
  • Notification of acceptance: June 25, 2024 (extended from June 10, 2024)
  • Camera-ready copy due: July 5, 2024 (extended from June 28, 2024)
All times are Anywhere on Earth (AoE).

Program

Keynote by Shi Han and Haoyu Dong, Microsoft Research
Title: Spreadsheet Intelligence and Data Analytics
Abstract: This keynote will unveil cutting-edge technologies designed to tackle the major challenges in spreadsheet intelligence, encompassing areas such as detecting table ranges, analyzing table structures and sheet layouts, understanding data semantics, and recommending data presentations. Building on spreadsheet intelligence, the presentation will also highlight our research and engineering efforts to boost the automation of data analytics and help Microsoft build technical leadership in the Business Intelligence market. Given the rise of Large Language Models (LLMs), we will also present our latest explorations into integrating LLMs with spreadsheet intelligence and data analytics.

Keynote 2: TBD

Organization

Organizing Committee: Steering Committee:
  • Madelon Hulsebos (UC Berkeley)
  • Ernesto Jiménez-Ruiz (City, University of London)
  • Fatemeh Nargesian (University of Rochester)
  • Natasha Noy (Google)
  • Horst Samulowitz (IBM Research)
Program Committee:
  • Nora Abdelmageed (University of Jena)
  • Omar Benjelloun (Google)
  • Rafael Berlanga Llavori (Universitat Jaume I)
  • Carsten Binnig (TU Darmstadt)
  • Christian Bizer (University of Mannheim)
  • Anastasia Dimou (KU Leuven)
  • Christos Diou (Harokopio University of Athens)
  • Madelon Hulsebos (UC Berkeley)
  • Ernesto Jiménez-Ruiz (City, University of London)
  • Aamod Khatiwada (Northeastern University)
  • Marco Mesiti (University of Milan)
  • Renée Miller (Northeastern University)
  • George Papadakis (University of Athens)
  • Paolo Papotti (EURECOM)
  • Ismael Sanz (Universitat Jaume I)
  • Roee Shraga (WPI)
  • Kostas Stefanidis (Tampere University)
  • Gerhard Weikum (Max Planck Institute for Informatics)
  • You Wu (Google)