
Extracting Synonyms from Knowledge Graphs

a month ago by Adrian Wilke

Simple text-based search systems do not capture the meaning of the individual words in a search query. For example, a query for the word “house” would not return records that contain “building” or “real estate”. How can such relationships be represented in a technical system? One approach is to include synonyms. Search engines like Elasticsearch provide ways to integrate synonym lists. However, configuring them requires a synonym list in the first place.


Synonyms in knowledge graphs

There are several knowledge graphs that contain synonyms. The Linguistic Linked Open Data Cloud (LingHub) describes corresponding datasets. The following list shows a selection of linguistic knowledge graphs and their limitations:


Figure: Linguistic Linked Open Data Cloud (LingHub), source (data license: CC BY-NC-SA 4.0)


Extracting synonyms from DBnary

To extract synonyms from a knowledge graph, the underlying ontology first has to be understood. The following figure shows the part of the ontology required for extracting synonyms in the German language:


Figure: Graphical representation of the SPARQL request

With knowledge of the required DBnary data structures, a SPARQL query can now be formulated. Executing the following example returns 100 pairs of German nouns and their synonyms. The query can be tested using the SPARQL endpoint at http://kaiko.getalp.org/sparql.

PREFIX lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#>
PREFIX lime:    <http://www.w3.org/ns/lemon/lime#>
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dbnary:  <http://kaiko.getalp.org/dbnary#>

SELECT DISTINCT ?germannoun ?synonym WHERE {
  # German noun entry and its written form
  ?n a lexinfo:Noun ;
     lime:language "de" ;
     ontolex:canonicalForm ?c .
  ?c ontolex:writtenRep ?germannoun .
  # Synonym link: ?n points to a resource ?p that describes the entry ?n2
  ?n dbnary:synonym ?p .
  ?p dbnary:describes ?n2 .
  # Synonym noun entry and its written form
  ?n2 a lexinfo:Noun ;
      ontolex:canonicalForm ?c2 .
  ?c2 ontolex:writtenRep ?synonym .
}
LIMIT 100
OFFSET 0

Listing: SPARQL request
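To retrieve more than the first 100 rows, the LIMIT and OFFSET values of the request can be advanced page by page. The following Python sketch illustrates one way to do this using only the standard library. It assumes that the endpoint accepts the query as a `query` GET parameter and returns the standard SPARQL JSON results format when asked for it; the function names (`build_query`, `parse_bindings`, `fetch_page`, `fetch_all`) are hypothetical and not part of DBnary or any library.

```python
import json
import urllib.parse
import urllib.request

# Endpoint named in the article.
ENDPOINT = "http://kaiko.getalp.org/sparql"

# Same query as in the listing, with LIMIT and OFFSET left as placeholders.
QUERY_TEMPLATE = """
PREFIX lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#>
PREFIX lime:    <http://www.w3.org/ns/lemon/lime#>
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dbnary:  <http://kaiko.getalp.org/dbnary#>

SELECT DISTINCT ?germannoun ?synonym WHERE {
  ?n a lexinfo:Noun ;
     lime:language "de" ;
     ontolex:canonicalForm ?c .
  ?c ontolex:writtenRep ?germannoun .
  ?n dbnary:synonym ?p .
  ?p dbnary:describes ?n2 .
  ?n2 a lexinfo:Noun ;
      ontolex:canonicalForm ?c2 .
  ?c2 ontolex:writtenRep ?synonym .
}
LIMIT %d
OFFSET %d
"""


def build_query(limit, offset):
    """Fill the paging values into the query template."""
    return QUERY_TEMPLATE % (limit, offset)


def parse_bindings(payload):
    """Turn a SPARQL JSON results document into (noun, synonym) tuples."""
    return [
        (row["germannoun"]["value"], row["synonym"]["value"])
        for row in payload["results"]["bindings"]
    ]


def fetch_page(limit, offset, endpoint=ENDPOINT):
    """Request one page of results (assumption: GET with a 'query' parameter)."""
    params = urllib.parse.urlencode({"query": build_query(limit, offset)})
    request = urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_bindings(json.load(response))


def fetch_all(page_size=100, fetch=fetch_page):
    """Advance OFFSET until a page comes back shorter than page_size."""
    pairs = []
    offset = 0
    while True:
        page = fetch(page_size, offset)
        pairs.extend(page)
        if len(page) < page_size:
            return pairs
        offset += page_size
```

Calling `fetch_all()` would then collect all pairs in one list. Note that the query has no ORDER BY clause, so SPARQL does not guarantee a stable row order across pages; a production extraction would likely need to account for that, as well as for any result or rate limits the endpoint applies.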

In a test run, we extracted 6,668 nouns in German and 21,634 corresponding synonyms. As a final step, the synonym list can be used to configure search engines and thus improve the findability of data records.
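As a rough illustration of this final step, the extracted pairs can be grouped into one line per noun in the comma-separated Solr synonym format, which Elasticsearch's synonym token filter can read. This is a minimal sketch under assumptions: the function name `to_synonym_lines` is hypothetical, the sample pairs are made up for illustration and are not actual DBnary output, and terms containing format characters are simply skipped.

```python
from collections import defaultdict


def to_synonym_lines(pairs):
    """Group (noun, synonym) pairs into lines such as 'noun, syn1, syn2'."""
    grouped = defaultdict(list)
    for noun, synonym in pairs:
        noun, synonym = noun.strip(), synonym.strip()
        # Skip empty terms and self-references.
        if not noun or not synonym or noun == synonym:
            continue
        # Skip terms that would collide with the line syntax (simplification).
        if any(mark in term for term in (noun, synonym) for mark in (",", "=>")):
            continue
        # Keep each synonym once, in order of first appearance.
        if synonym not in grouped[noun]:
            grouped[noun].append(synonym)
    return [", ".join([noun] + synonyms) for noun, synonyms in sorted(grouped.items())]


# Illustrative input only, not taken from the DBnary test run.
example_pairs = [
    ("Haus", "Gebäude"),
    ("Haus", "Immobilie"),
    ("Haus", "Gebäude"),
    ("Auto", "Wagen"),
]
for line in to_synonym_lines(example_pairs):
    print(line)
```

The resulting lines could be written to a text file and referenced from the search engine's synonym configuration. Whether terms should be lowercased beforehand depends on the analyzer chain in use, so the sketch leaves the case unchanged.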


Further reading