A reference library for assigning protein subcellular localizations by image-based machine learning

J Cell Biol. 2020 Mar 2;219(3):e201904090. doi: 10.1083/jcb.201904090.

Abstract

Confocal micrographs of EGFP fusion proteins localized at key cell organelles in murine and human cells were acquired for use as subcellular localization landmarks. For each of the respective 789,011 and 523,319 optically validated cell images, morphology and statistical features were measured. Machine learning algorithms using these features permit automated assignment of the localization of other proteins and dyes in both cell types with very high accuracy. Automated assignment of subcellular localizations for model tail-anchored proteins with randomly mutated C-terminal targeting sequences allowed the discovery of motifs responsible for targeting to mitochondria, endoplasmic reticulum, and the late secretory pathway. Analysis of directed mutants enabled refinement of these motifs and characterization of protein distributions within cellular subcompartments.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cell Line
  • Epithelial Cells / metabolism*
  • Green Fluorescent Proteins / metabolism*
  • Humans
  • Image Processing, Computer-Assisted / standards*
  • Machine Learning / standards*
  • Mice
  • Microscopy, Confocal / standards*
  • Mutation
  • Organelles / metabolism*
  • Pattern Recognition, Automated / standards
  • Protein Transport
  • Recombinant Fusion Proteins / genetics
  • Recombinant Fusion Proteins / metabolism*
  • Reference Standards
  • Secretory Pathway

Substances

  • Recombinant Fusion Proteins
  • enhanced green fluorescent protein
  • Green Fluorescent Proteins

Grants and funding