Database (Oxford). 2018 Jan 1:2018:bay098.
doi: 10.1093/database/bay098.

Large-scale automated machine reading discovers new cancer-driving mechanisms

Marco A Valenzuela-Escárcega et al. Database (Oxford).

Abstract

PubMed, a repository and search engine for biomedical literature, now indexes >1 million articles each year. This exceeds the processing capacity of human domain experts, limiting our ability to truly understand many diseases. We present Reach, a system for automated, large-scale machine reading of biomedical papers that can extract mechanistic descriptions of biological processes with relatively high precision at high throughput. We demonstrate that combining the extracted pathway fragments with existing biological data analysis algorithms that rely on curated models helps identify and explain a large number of previously unidentified mutually exclusive altered signaling pathways in seven different cancer types. This work shows that combining human-curated 'big mechanisms' with extracted 'big data' can lead to a causal, predictive understanding of cellular processes and unlock important downstream applications.

Figures

Figure 1
The annual rate of publications in the biomedical domain, as indexed by PubMed. The darker blue highlights that publications have exceeded 1 million per year starting in 2011.
Figure 2
Architecture of the Reach system together with a walk-through example.
Figure 3
Taxonomy of the entities and events recognized by Reach. Though abbreviated, the Removal events mirror those listed under Addition.
Figure 4
The Reach output is [formula] times larger than PCs. We conjecture that the small overlap arises because the Reach interactions are extracted from open-access publications, whereas PCs pathways come mostly from other, paywalled publications. The high-confidence subset consists of relations found in more than one paper.
Figure 5
Reach allows Mutex to detect seven new candidate ‘driver’ genes for breast cancer that are not detected when using PCs alone or when using no network at all. We observed similar results for six other cancers in the TCGA data set.
Figure 6
Mutex groups for TCGA breast cancer. This graph shows the interactions of the genes in each Mutex group and their targets. The highlighted relations exist in Reach data but not in PCs. Highlighted genes are not detectable without using Reach data.
