Groups

Data Analysis

The Data Analysis group works in two main fields.

First, we gather, prepare, and analyse Linked Data. The first stage of this pipeline is handled by our open-source crawler Squirrel, which has been used in several projects, including the two research projects OPAL and LIMBO.

Once the data has been gathered, we provide fact-checking services such as COPAAL and FactCheck, which can be used to check the veracity of the data against a reference knowledge base or a reference corpus.

The group's second main field is benchmarking. We maintain several benchmarking platforms:

HOBBIT is a holistic benchmarking platform for Big Linked Data solutions.
GERBIL is a lightweight platform for benchmarking web services. It currently supports benchmarking in three areas.
IGUANA is a benchmarking platform for evaluating the performance of triple stores.

In addition, we are interested in benchmarking in general and provide several benchmarks.

Data Integration

The Linked Data paradigm builds upon the backbone of distributed knowledge bases connected by typed links. The sheer volume of current knowledge bases, as well as their number, poses two major challenges for integrating data across and within them. First, data integration tools have to be time-efficient, especially when dealing with large datasets. Second, these tools have to carry out data integration tasks with high quality so that they serve the applications built upon Linked Data well. Our solutions to the second problem build upon the efficient computational approaches developed to solve the first and combine them with dedicated machine learning techniques. All our data integration frameworks, such as LIMES and DEER, are open-source and available under a GNU license at https://github.com/dice-group, together with user and developer manuals.

Lead

Mohamed Ahmed Sherif

Members

Abdullah Fathi Ahmed, Hamada Zahera, Kevin Dreßler, Tommaso Soru, Idress Tahir Mugdal, Kleanthi Georgala, Svetlana Pestryakova

Projects

HOBBIT, OPAL, BDE, GeoKnow, SLIPO, SAGE, LIMBO, KnowGraphs, GEISER

Demos

LIMES

Data Storage and Querying

The constant growth of Linked Data on the Web gives rise to new challenges for querying and integrating massive amounts of data. Such datasets are available through various interfaces, such as data dumps, Linked Data documents and webpages, SPARQL endpoints, Triple Pattern Fragments, or the Linked Data Platform. In addition, various sources produce streaming data. Efficiently querying these sources is of central importance for the scalability of Linked Data and Semantic Web technologies. To exploit this data to its full potential, users should be able to store, query, and combine it easily and effectively. The Data Storage and Querying (DSQ) group develops scalable, high-performance RDF systems for storing and querying big RDF data. We also work on extracting knowledge from the Web and modelling it as RDF graphs. Finally, we are keen to design highly representative benchmarks for storing, querying, and extracting RDF data.

Lead

Muhammad Saleem

Members

Hashim Khan, Alexander Bigerl, Manzoor Ali, André Valdestilhas, Adnan Akhter

Projects

HOBBIT, OPAL, BDE, GeoKnow

Demos

AGDISTIS

Machine Learning

The Machine Learning (ML) group focuses on machine learning technologies for knowledge graphs. Its research goal is to develop novel, scalable machine learning algorithms, e.g., for entity embeddings, clustering of high-dimensional data, and explainable machine learning. Its research activities cover structured machine learning (e.g., learning concepts in description logics) and entity embeddings (e.g., based on physical models and convolutional complex models). The group provides open-source tools for machine learning on knowledge graphs.

Lead

Stefan Heindorf

Members

Adrian Wilke, Caglar Demir, Geraldo de Souza, Hamada Zahera, N'Dah Jean Kouagou, Leonie N. Sieger, Lukas Blübaum

Projects

COLIDE, EML4U, DAIKIRI, KIAM, RAKI, TRR 318

NLP and Data Access

Our group works at the intersection of Data Science and Natural Language Processing (NLP). We focus on creating algorithms that allow computers to automatically extract large-scale knowledge from unstructured data and process it while preserving its key semantic information. We aim to make the acquired knowledge accessible and understandable for both humans and computers. Our team started by addressing two of the most important tasks in NLP, Named Entity Recognition and Entity Linking, by relying on Knowledge Graphs. This research resulted in two state-of-the-art frameworks with respect to multilingualism and knowledge-graph-based algorithms. Recently, we expanded our focus to other NLP tasks, ranging from basic research in computational linguistics to Question Answering, Machine Translation, and Natural Language Generation and Understanding.

Our group includes members from both Paderborn and Leipzig universities.