A recent IBM survey estimates that more than 3.7 billion people use the internet and produce nearly 2.5 quintillion bytes of data on the Web every day. The availability of such large amounts of data is commonly regarded as one of the drivers of the current leaps in the development of AI-powered solutions. One of the most important types of data sources found on the Web is the Knowledge Graph (KG). Knowledge graphs implement a broad spectrum of formalisms to represent and query entities and their interrelations via interconnected semantic graph networks. While the term "knowledge graph" was popularized by Google in 2012, it is now used by a plethora of companies to describe a multitude of datasets that use formalisms of varying expressiveness and semantics and that are used to tackle real-world problems.
Recently, researchers have used knowledge graphs to address a variety of open research problems. One trending topic is the use of RDF Knowledge Graphs in the Natural Language Generation (NLG) area for creating natural language text. An RDF KG commonly stores knowledge in triples. Each triple consists of (i) a subject, often an entity, (ii) a relation, often called a property, and (iii) an object, which is an entity or a literal (a string or a value with a unit). Thus, by relying on NLG algorithms, it is possible to verbalize a triple whose subject is Edmund Hillary, whose property is the birth place and whose object is Auckland as "Edmund Hillary was born in Auckland." However, several preliminary steps, such as Graph Summarization, are required before text can be generated from RDF triples.
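To make the triple structure concrete, the following minimal Python sketch represents a triple as a subject, a property and an object, and verbalizes it with a simple sentence template. It is only an illustration under stated assumptions: the names `Triple`, `TEMPLATES`, `label` and `verbalize`, the identifiers `Edmund_Hillary` and `birthPlace`, and the template-based approach itself are hypothetical and do not describe how LD2NL or any other framework works.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A simplified RDF triple; real RDF terms are IRIs or literals."""
    subject: str
    predicate: str
    obj: str


# Hypothetical sentence templates, keyed by property name (an assumption
# made for this sketch, not part of any real vocabulary mapping).
TEMPLATES = {
    "birthPlace": "{s} was born in {o}.",
}


def label(term: str) -> str:
    """Turn an identifier such as 'Edmund_Hillary' into a readable label."""
    return term.replace("_", " ")


def verbalize(triple: Triple) -> str:
    """Render a triple as a sentence, falling back to 'subject property object.'"""
    template = TEMPLATES.get(triple.predicate, "{s} {p} {o}.")
    return template.format(
        s=label(triple.subject),
        p=label(triple.predicate),
        o=label(triple.obj),
    )


triple = Triple("Edmund_Hillary", "birthPlace", "Auckland")
print(verbalize(triple))  # prints: Edmund Hillary was born in Auckland.
```

A fixed template per property is the simplest possible design and does not scale to large vocabularies, which is one reason why dedicated RDF-to-Text approaches and preliminary steps such as Graph Summarization are studied.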
In this project group, students are expected to deepen their knowledge of all steps of the RDF-to-Text generation task. To this end, the LD2NL project developed by the DICE group will be used as a scientific framework. A beneficial side effect will be the teamwork experience gained on a scientific project.
The presentation slides can be found here
The course consists of weekly meetings in which the students' progress on their assigned tasks is monitored.