Towards a Large Corpus of Richly Annotated Web Tables
for Knowledge Base Population

Authors: Basil Ell, Sherzod Hakimov, Philipp Braukmann, Lorenzo Cazzoli, Fabian Kaupmann, Amerigo Mancino, Junaid Altaf Memon, Kai Rother, Abhishek Saini, and Philipp Cimiano

About

LD4IE workshop

ISWC 2017

Cluster of Excellence Cognitive Interaction Technology 'CITEC'

Bielefeld University

Information Extraction from Web Tables

Basil Ell

Sherzod Hakimov

Abstract of the Paper

Web Table Understanding in the context of Knowledge Base Population and the Semantic Web is the task of i) linking the content of tables retrieved from the Web to an RDF knowledge base, ii) building hypotheses about the tables' structures and contents, iii) extracting novel information from these tables, and iv) adding this new information to a knowledge base. Knowledge Base Population has attracted growing interest in recent years due to the increased demand for large knowledge graphs, which have become relevant for Artificial Intelligence applications such as Question Answering and Semantic Search. In this paper we describe a set of basic tasks that are relevant for Web Table Understanding in this context. These tasks incrementally enrich a table with hypotheses about the table's content. In doing so, where multiple interpretations exist, we avoid as far as possible selecting one interpretation and thereby deciding against the others. By postponing these decisions, we enable learning approaches that gain an understanding of the tables' contents to decide by themselves, thus increasing the usability of the annotated table data. We present statistics from analyzing and annotating 1,000,000 tables from the Web Table Corpus 2015 and make this dataset available online.
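
To make the idea of postponing decisions concrete, here is a minimal sketch of a table cell that accumulates competing hypotheses instead of committing to one. It is not the format used in the paper's dataset: the class names `Hypothesis` and `Cell`, the helper `ranked`, the example.org URIs, and the scores are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    """One candidate interpretation of a cell (hypothetical structure)."""
    entity_uri: str  # candidate knowledge base entity (illustrative URI)
    score: float     # confidence from some annotator (illustrative value)


@dataclass
class Cell:
    """A table cell that keeps every interpretation proposed for it."""
    text: str
    hypotheses: List[Hypothesis] = field(default_factory=list)

    def add_hypothesis(self, entity_uri: str, score: float) -> None:
        # Append rather than overwrite, so competing readings coexist.
        self.hypotheses.append(Hypothesis(entity_uri, score))


def ranked(cell: Cell) -> List[Hypothesis]:
    """Return all hypotheses ordered by score, discarding none."""
    return sorted(cell.hypotheses, key=lambda h: h.score, reverse=True)


# Two readings of the same cell text are stored side by side.
cell = Cell("Paris")
cell.add_hypothesis("http://example.org/resource/Paris_Texas", 0.2)
cell.add_hypothesis("http://example.org/resource/Paris_France", 0.8)

for h in ranked(cell):
    print(h.entity_uri, h.score)
```

A downstream learner could consume the full ranked list and make its own choice, which is the kind of usage the abstract appears to have in mind.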
