10:20-10:30 |
Opening |
10:30-11:20 |
Keynote talk - Renée Miller: From Discovery to Integration of Data Lake Tables
|
11:20 - 12:00 |
Session 1 |
11:20 - 11:40 |
Keti Korini, Christian Bizer. Column Type Annotation using ChatGPT [pdf]
|
11:40 - 11:50 |
Viet-Phi Huynh, Yoan Chabot, Raphael Troncy. Towards Generative Semantic Table Interpretation [pdf]
|
11:50 - 12:00 |
Aneta Koleva, Martin Ringsquandl, Volker Tresp. Adversarial Attacks on Tables with Entity Swap [pdf]
|
|
12:00 - 13:30 |
Lunch |
13:30 - 14:20 |
Keynote talk - Alon Halevy: Personal Digital Data: Where LLMs Meet Structured Data |
14:20 - 15:00 |
Session 2 |
14:20 - 14:30 |
Hamed Mirzaei, Davood Rafiei. Table Union Search with Preferences [pdf]
|
14:30 - 14:40 |
Vijay S Kumar, Varish Mulwad, Jenny Williams, Tim Finin, Sharad Dixit, Anupam Joshi.
Knowledge Graph-driven Tabular Data Discovery from Scientific Documents [pdf]
|
14:40 - 14:50 |
Arif Usta, Semih Salihoglu. To Join or Not to Join: An Analysis on the Usefulness of
Joining Tables in Open Government Data Portals [pdf]
|
14:50 - 15:00 |
Liane Vogel, Carsten Binnig. WikiDBs: A Corpus Of Relational Databases From Wikidata [pdf]
|
|
15:00 - 15:50 |
Poster session |
15:00 - 15:05 |
Davood Rafiei, Arash Dargahi Nobari, Soroush Omidvartehrani. Discovering and Integrating
Tabular Data [pdf]
|
15:05 - 15:10 |
Eva Chrysostomaki, Maria Stratigi, Vasilis Efthymiou, Kostas Stefanidis, Dimitris Plexousakis.
Fair Sequential Group Recommendations in SQUIRREL Movies [pdf]
|
|
15:50 - 16:00 |
Closing and Awards |
Keynote by Renée Miller, Northeastern University
Title: From Discovery to Integration of Data Lake Tables
Abstract: We have made tremendous strides in providing tools for data scientists to discover new
tables that are useful for their analyses. But despite these advances, the proper integration of
discovered tables has been under-explored. An interesting semantics for integration, called full
disjunction, was proposed in the 1990s, but there has been little progress in using it for data
science to integrate tables culled from data lakes. In this talk, I will overview both ALITE, a
method to integrate (possibly incomplete) tables using a new scalable implementation of full
disjunction, and DIALITE, an open discovery system that lets users discover, integrate, then
analyze a set of tables using discovery methods such as Starmie, a new table union search
method. To evaluate our systems, we developed and shared three new benchmarks for integration
that use real data lake tables. I will present open problems and challenges in developing and
evaluating scalable table search and integration methods on real data.
The ALITE [1] work was led by Aamod Khatiwada in collaboration with Professors Roee Shraga of
the Worcester Polytechnic Institute, Renée Miller, and Wolfgang Gatterbauer of Northeastern
University in Boston. DIALITE [2] was also led by Aamod Khatiwada in collaboration with
Professors Roee Shraga and Renée Miller.
Starmie [3] was led by Grace Fan in collaboration with Megagon Labs researchers Jin Wang,
Yuliang Li and Dan Zhang.
[1] Aamod Khatiwada, Roee Shraga, Wolfgang Gatterbauer, Renée J. Miller: Integrating Data Lake
Tables. PVLDB. 16(4): 932-945 (2022).
[2] Aamod Khatiwada, Roee Shraga, Renée J. Miller: DIALITE: Discover, Align and Integrate Open
Data Tables. ACM SIGMOD, 187-190 (2023).
[3] Grace Fan, Jin Wang, Yuliang Li, Dan Zhang, Renée J. Miller: Semantics-aware Dataset
Discovery from Data Lakes with Contextualized Column-based Representations. PVLDB. 16(8):
1726-1739 (2023).
Keynote by Alon Halevy, Meta AI
Title: Personal Digital Data: Where LLMs Meet Structured Data
Abstract: The important question of how companies and organizations use our data has
received a lot of attention in the technology and policy communities. An equally important
question that deserves more focus going forward is how we, as individuals, can take advantage of
the data we generate to improve our health, vitality, productivity, and overall well-being.
We create a variety of data throughout our days, including our photos, workout stats, locations
we’ve been to, the stuff we buy online, and the content we consume. Fusing all this data together
enables us to build a fascinating timeline of our lives. To leverage these timelines to help us
produce satisfying new experiences, we need to be able to query our timelines in natural language
and to share short summaries of them with external services.
This talk will start by motivating the work on fusing personal digital data, including its
potential pitfalls. I will then discuss multiple approaches to the problem of querying timelines,
an application area that forces us to consider deeply how language models can be used to query
data that is partially structured and partially not.