The amount of incorrectly annotated or unannotated data in most public databases is increasing rapidly, undoubtedly driven by the growth of sequence data and, more recently, by the boom in genomic projects. Molecular biologists and bioinformaticians should join forces to tackle this issue. The practical challenges we encountered when studying the alternative oxidase (AOX) gene family motivated the present work. We screened commonly used databases for their ability to distinguish AOX from the plastid terminal oxidase (also called plastoquinol terminal oxidase; PTOX), and we propose a simple approach, based on amino acid signatures, that unequivocally distinguishes these two gene families. Furthermore, we carefully revised the available sequence data on the plant AOX family to (1) confirm their classification as AOX and (2) identify the AOX family member to which each sequence belongs. By highlighting different types of misclassification and the large underestimation of available data, we stress the urgent need for awareness of misannotation and for the re-annotation of public AOX sequences.
Keywords: alternative oxidase; databases; gene annotation; gene family; phylogeny; plastoquinol terminal oxidase; signature-based classification.