Extracting Synonyms from Knowledge Graphs
5 years ago by Adrian Wilke
Simple text-based search systems do not reflect the semantics of individual input words of search queries. For example, a query for the word “house” would not return records for the words “building” or “real estate”. How can such relationships be represented in a technical system? One approach is to include synonyms. Search engines like Elasticsearch provide methods to integrate synonym lists. However, a list of synonyms itself is required for configuration.
Synonyms in knowledge graphs
There are several knowledge graphs that contain synonyms. The Linguistic Linked Open Data Cloud (LingHub) describes corresponding datasets. The following list shows a selection of linguistic knowledge graphs and their limitations:
- BabelNet (Limits in API requests)
- DBnary (Multilingual. Creative Commons Attribution-ShareAlike 3.0)
- RDF/OWL Representation of WordNet (Limited to English)
- LingHub lists further data sources
Figure: Linguistic Linked Open Data Cloud (LingHub), source (data license: CC BY-NC-SA 4.0)
Extracting synonyms from DBnary
To extract synonyms out of a knowledge graph, the underlying ontology has to be first understood. The following figure shows the part of the ontology required for the extraction of synonyms in the German language:
Figure: Graphical representation of the SPARQL request
With knowledge of the required DBnary data structures, a SPARQL query can now be formulated. An execution of the following example will return 100 entries for German synonyms. The query can be tested using the SPARQL endpoint at http://kaiko.getalp.org/sparql.
SELECT DISTINCT ?germannoun ?synonym WHERE {
?n a <http://www.lexinfo.net/ontology/2.0/lexinfo#Noun> .
?n <http://www.w3.org/ns/lemon/lime#language> "de" .
?n <http://kaiko.getalp.org/dbnary#synonym> ?p .
?p <http://kaiko.getalp.org/dbnary#describes> ?n2 .
?n2 a <http://www.lexinfo.net/ontology/2.0/lexinfo#Noun> .
?n <http://www.w3.org/ns/lemon/ontolex#canonicalForm> ?c .
?c <http://www.w3.org/ns/lemon/ontolex#writtenRep> ?germannoun .
?n2 <http://www.w3.org/ns/lemon/ontolex#canonicalForm> ?c2 .
?c2 <http://www.w3.org/ns/lemon/ontolex#writtenRep> ?synonym .
}
LIMIT 100
OFFSET 0
Listing: SPARQL request
In a test run, we extracted 6,668 nouns in German and 21,634 corresponding synonyms. As a final step, the synonym list can be used to configure search engines and thus improve the findability of data records.
Further reading
- German version of this text: OPAL deliverable D7.1: Suchkomponente (Tweet)
- Learn more about the Open Data Portal Germany (OPAL) project