
An Automatic Benchmark Generator for Entity Recognition and Linking

by René Speck

The creation of gold standards is of central importance for the objective assessment and development of approaches across computer science. Manually creating gold standards for named entity recognition and entity linking is time- and resource-intensive. Moreover, recent works show that such gold standards contain a large proportion of mistakes, in addition to being difficult to maintain.

At the 11th International Conference on Natural Language Generation (INLG 2018), we presented BENGAL, a novel approach for the automatic generation of such gold standards as a complement to manually created benchmarks. The main advantage of our benchmarks is that they can be generated readily at any time. They are also cost-effective while being guaranteed to be free of annotation errors. We compare the performance of 11 tools on English benchmarks generated by BENGAL and on 16 manually created benchmarks. We also show that our approach can be ported easily to other languages by presenting results achieved by 4 tools on Brazilian Portuguese and Spanish benchmarks.
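The guarantee of error-free annotations follows from the way the benchmarks are built: the text is generated from knowledge-base facts, so the entity spans and their URIs are known at generation time rather than added by human annotators afterwards. The following minimal Java sketch illustrates this principle on a single triple; the class and record names, the template, and the example triple are illustrative assumptions, not BENGAL's actual code or API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (not BENGAL's actual API): verbalising a knowledge-base
 * triple into a sentence while recording gold entity annotations. Because the
 * entity spans are produced together with the text, the resulting gold
 * standard is free of annotation errors by construction.
 */
public class TripleVerbalizer {

    /** A gold annotation: character span plus the entity's KB URI. */
    record Annotation(int begin, int end, String uri) {}

    /** A hypothetical labelled KB resource, e.g. from DBpedia. */
    record Resource(String label, String uri) {}

    /** Verbalise (subject, predicate, object) with a simple template,
     *  appending the gold annotations as the text is built. */
    static String verbalize(Resource subj, String predicateText, Resource obj,
                            List<Annotation> gold) {
        StringBuilder sb = new StringBuilder();
        // Record the subject span before appending its label.
        gold.add(new Annotation(sb.length(), sb.length() + subj.label().length(), subj.uri()));
        sb.append(subj.label()).append(' ').append(predicateText).append(' ');
        // Record the object span the same way.
        gold.add(new Annotation(sb.length(), sb.length() + obj.label().length(), obj.uri()));
        sb.append(obj.label()).append('.');
        return sb.toString();
    }

    public static void main(String[] args) {
        Resource subj = new Resource("Albert Einstein", "http://dbpedia.org/resource/Albert_Einstein");
        Resource obj = new Resource("Ulm", "http://dbpedia.org/resource/Ulm");
        List<Annotation> gold = new ArrayList<>();
        String sentence = verbalize(subj, "was born in", obj, gold);
        System.out.println(sentence);      // Albert Einstein was born in Ulm.
        gold.forEach(System.out::println); // spans align with the text by construction
    }
}
```

In BENGAL itself, seed entities and their facts are drawn from a reference knowledge base and verbalised with considerably richer strategies, but the error-free-by-construction property illustrated here is the same.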

Our results suggest that our approach can generate diverse benchmarks with characteristics similar to those of a large proportion of existing benchmarks in several languages. BENGAL benchmarks can thus ease the development of named entity recognition and entity linking tools, especially for resource-poor languages, by providing developers with insights into their performance at virtually no cost. Hence, BENGAL can strengthen the push towards better named entity recognition and entity linking frameworks.

Our approach is open-source: https://github.com/dice-group/BENGAL. Our experimental results are available at http://faturl.com/bengalexpinlg and the paper at https://www.aclweb.org/anthology/W18-6541/.