Automatic detection of semantic primitives using optimization based on genetic algorithm

PeerJ Comput Sci. 2023 Apr 5:9:e1282. doi: 10.7717/peerj-cs.1282. eCollection 2023.


In this article, we propose a method for automatically retrieving a set of semantic primitive words from an explanatory dictionary, together with a novel procedure for evaluating the obtained set of primitives. The approach represents the dictionary as a directed graph and formulates the search as a single-objective constrained optimization problem, which is solved with a genetic algorithm using the PageRank scoring model. The problem is defined as subset selection: the algorithm searches for sets of words that fulfil several requirements, namely that the cardinality of the set stays within empirically selected limits and that the PageRank word importance score is minimized, with thresholding to prevent cycles. In the experiments, we used the WordNet dictionary for English. The proposed method improves on previous state-of-the-art solutions.
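As a rough illustration of the pipeline described in the abstract, the following Python sketch builds a toy dictionary graph, scores words with a plain power-iteration PageRank, and runs a simple bit-string genetic algorithm that minimizes the summed PageRank of the selected words subject to a cardinality range and a cycle-prevention constraint. Everything here is hypothetical and is not the authors' implementation: the toy `DICTIONARY`, the edge direction (headword to defining words), the damping factor, the penalty weight, the genetic operators, and the reading of "cycle prevention" as requiring the graph to become acyclic once the selected words are removed are all assumptions of this sketch.

```python
import random

# Hypothetical toy dictionary: headword -> words used in its definition.
# Assumption: edges point from a headword to the words that define it.
DICTIONARY = {
    "big": ["large"],
    "large": ["big", "size"],
    "size": ["thing", "large"],
    "thing": ["object"],
    "object": ["thing"],
    "small": ["not", "big"],
    "not": [],
}

def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict-of-lists directed graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

def is_acyclic_without(graph, removed):
    """Kahn's algorithm on the subgraph that excludes the removed words."""
    kept = [v for v in graph if v not in removed]
    indeg = {v: 0 for v in kept}
    for v in kept:
        for t in graph[v]:
            if t not in removed:
                indeg[t] += 1
    queue = [v for v in kept if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for t in graph[v]:
            if t not in removed:
                indeg[t] -= 1
                if indeg[t] == 0:
                    queue.append(t)
    return seen == len(kept)  # every kept node sorted => no cycle remains

def fitness(mask, words, rank, graph, min_size, max_size, penalty=10.0):
    """Cost to minimize: summed PageRank plus penalties for violated constraints."""
    chosen = {w for w, bit in zip(words, mask) if bit}
    score = sum(rank[w] for w in chosen)
    if not (min_size <= len(chosen) <= max_size):
        score += penalty
    if not is_acyclic_without(graph, chosen):
        score += penalty
    return score

def genetic_search(graph, min_size, max_size, pop_size=40, generations=80,
                   mutation_rate=0.1, seed=0):
    """Generic bit-string GA: tournament selection, uniform crossover, bit flips."""
    rng = random.Random(seed)
    words = list(graph)
    rank = pagerank(graph)
    cost = lambda m: fitness(m, words, rank, graph, min_size, max_size)
    population = [[rng.random() < 0.5 for _ in words] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(population, 2)
        return a if cost(a) <= cost(b) else b

    best = min(population, key=cost)
    for _ in range(generations):
        children = [best[:]]  # elitism: keep the best mask found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            child = [not bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
        best = min(population, key=cost)
    return sorted(w for w, bit in zip(words, best) if bit), cost(best)

primitives, score = genetic_search(DICTIONARY, min_size=1, max_size=3)
print(primitives, round(score, 3))
```

Handling both constraints through additive penalties keeps the search single-objective, which matches the formulation stated in the abstract; a real dictionary such as WordNet would need a much larger graph, a more efficient cycle check, and tuned operators.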

Keywords: Computational lexicography; Differential evolution; Explanatory dictionary; Lexicography; Natural language processing; PageRank; Semantic primes; Semantic primitives.

Grants and funding

The work was done with support from the Mexican Government through the grant A1-S-47854 of CONACYT, Mexico, and grants 20220852 and 20220859 of the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. CONACYT provided computing resources through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE, Mexico, and Microsoft provided support through the Microsoft Latin America PhD Award. There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.