FlyIT: Drosophila Embryogenesis Image Annotation based on Image Tiling and Convolutional Neural Networks

IEEE/ACM Trans Comput Biol Bioinform. 2021 Jan-Feb;18(1):194-204. doi: 10.1109/TCBB.2019.2935723. Epub 2021 Feb 3.

Abstract

With the rise of image-based transcriptomics, spatial gene expression data have become increasingly important for understanding gene regulation from the tissue level down to the cell level. In particular, gene expression images of Drosophila embryos provide a new data source for studying Drosophila embryogenesis. Because manual annotation is labor-intensive and requires expert knowledge, automatic annotation tools are urgently needed. Although many image annotation methods have been proposed in the computer vision field, they may not work well for gene expression images because the two annotation tasks differ substantially. Beyond the obvious differences in the images themselves, annotation is performed at the gene level rather than the image level, and the expression patterns of a single gene are recorded in multiple images. Moreover, annotation terms often correspond to local expression patterns within images, yet they are assigned collectively to groups of images, so the relation between each term and any single image is unknown. To learn the spatial expression patterns of each gene comprehensively, we propose a new method called FlyIT (image annotation based on Image Tiling and convolutional neural networks for fruit Fly). We implement two versions of FlyIT, which learn at the image level and at the gene level, respectively. The gene-level version uses an image tiling strategy to obtain a combined image feature representation for each gene. FlyIT uses a pre-trained ResNet model to extract feature representations and a new loss function to address the class imbalance problem. Because annotating Drosophila images is a multi-label classification problem, the new loss function takes into account how difficult each label of a sample is to recognize and adjusts the sample weights accordingly.
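The two ideas above, tiling a gene's images into one input and reweighting samples by label difficulty, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not FlyIT's implementation: the abstract does not give the grid layout or the loss formula. `tile_images` assumes all images were already resized to the same shape, fills the grid in row-major order, and zero-pads empty slots. `difficulty_weighted_bce` is a hypothetical stand-in that borrows the focal-loss modulating factor `(1 - p_t) ** gamma`, averages it over a sample's labels to get a per-sample weight, and scales that sample's mean binary cross-entropy by it.

```python
import numpy as np


def tile_images(images, rows, cols):
    """Arrange up to rows*cols same-sized images into one grid image.

    Assumption: images are pre-resized to a common shape. Empty slots are
    zero-padded; images beyond rows*cols are dropped.
    """
    h, w = images[0].shape[:2]
    extra = images[0].shape[2:]  # trailing channel dims, if any
    canvas = np.zeros((rows * h, cols * w) + extra, dtype=images[0].dtype)
    for idx, img in enumerate(images[: rows * cols]):
        r, c = divmod(idx, cols)  # row-major placement
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = img
    return canvas


def difficulty_weighted_bce(logits, targets, gamma=2.0):
    """Hypothetical difficulty-weighted multi-label loss (not the paper's).

    logits, targets: arrays of shape (n_samples, n_labels); targets in {0, 1}.
    """
    x = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    p = 0.5 * (1.0 + np.tanh(0.5 * x))   # numerically stable sigmoid
    p_t = y * p + (1.0 - y) * (1.0 - p)  # probability given to the true value
    # Stable per-label binary cross-entropy computed from logits.
    bce = np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x)))
    # Harder labels (small p_t) push the whole sample's weight up.
    sample_weight = np.mean((1.0 - p_t) ** gamma, axis=1)
    return float(np.mean(sample_weight * bce.mean(axis=1)))
```

With `gamma=0` every sample weight is 1 and the loss reduces to plain mean binary cross-entropy; larger `gamma` shifts emphasis toward samples whose labels the model still gets wrong. In a full pipeline, the tiled image would be fed to a pre-trained CNN such as ResNet, whose output logits this loss would score.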
The experimental results on the FlyExpress database show that both the image tiling strategy and the deep architecture substantially improve annotation performance. FlyIT outperforms existing annotators by a large margin (more than 9 percent in AUC and 12 percent in macro F1 when predicting the top 10 terms). It also shows advantages over other deep learning models, including both single-instance and multi-instance learning frameworks.
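Macro F1, one of the reported metrics, computes F1 separately for each annotation term and then averages the per-term scores with equal weight, so a rare term counts as much as a frequent one. The sketch below assumes predicted scores have already been thresholded into binary labels (the abstract does not state the thresholding rule) and, as a convention choice, assigns F1 = 0 to a term with no true or predicted positives.

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 for binary arrays (n_samples, n_labels)."""
    t = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    tp = (t & p).sum(axis=0)   # per-label true positives
    fp = (~t & p).sum(axis=0)  # per-label false positives
    fn = (t & ~p).sum(axis=0)  # per-label false negatives
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); labels with denom == 0 get 0 by convention.
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())
```

Restricting `y_true` and `y_pred` to the columns of the 10 most frequent terms would give a score comparable in form to the "top 10 terms" setting mentioned above.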

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Computational Biology / methods
  • Data Curation
  • Drosophila / embryology*
  • Embryo, Nonmammalian / diagnostic imaging
  • Embryonic Development / physiology*
  • Image Processing, Computer-Assisted / methods*
  • Neural Networks, Computer*