Automated cognome construction and semi-automated hypothesis generation

J Neurosci Methods. 2012 Jun 30;208(1):92-100. doi: 10.1016/j.jneumeth.2012.04.019. Epub 2012 May 11.


Modern neuroscientific research stands on the shoulders of countless giants. PubMed alone contains more than 21 million peer-reviewed articles, with 40,000 to 50,000 more published every month. Understanding the human brain, cognition, and disease will require integrating facts from dozens of scientific fields spread across millions of studies locked away in static documents, which makes any such integration daunting at best. Future scientific progress will be aided by bridging the gap between the millions of published research articles and modern databases such as the Allen Brain Atlas (ABA). To that end, we have analyzed the text of over 3.5 million scientific abstracts to find associations between neuroscientific concepts. From the literature alone, we show that we can blindly and algorithmically extract a "cognome": the relationships between brain structure, function, and disease. We demonstrate the potential of data mining and cross-platform data integration with the ABA by introducing two methods for semi-automated hypothesis generation. By analyzing statistical "holes" and discrepancies in the literature, we can find understudied or overlooked research paths. That is, we have added a layer of semi-automation to a part of the scientific process itself. This is an important step toward fundamentally incorporating data-mining algorithms into the scientific method in a manner that is generalizable to any scientific or medical field.
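The abstract does not give the scoring formulas, so the following is only a minimal Python sketch under stated assumptions. Concept association is measured here as the Jaccard index of term co-occurrence across abstracts (an assumed choice), and a statistical "hole" is taken to be a term pair that rarely co-occurs directly but is strongly linked through shared intermediate terms, in the style of Swanson's ABC literature-based discovery. The function names `term_counts`, `jaccard`, and `find_holes`, the threshold `max_direct`, and the toy abstracts are all hypothetical; this is not the authors' implementation.

```python
from itertools import combinations


def term_counts(abstracts, terms):
    """Count abstracts mentioning each term and each pair of terms.

    Matching is naive case-insensitive substring search (an assumption;
    a real pipeline would need tokenization, synonyms, and stemming).
    """
    single = {t: 0 for t in terms}
    pair = {}
    for text in abstracts:
        lowered = text.lower()
        present = [t for t in terms if t.lower() in lowered]
        for t in present:
            single[t] += 1
        # Pair keys are stored in sorted order so lookups are symmetric.
        for a, b in combinations(sorted(present), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair


def jaccard(a, b, single, pair):
    """Association score: abstracts with both terms / abstracts with either."""
    both = pair.get(tuple(sorted((a, b))), 0)
    either = single[a] + single[b] - both
    return both / either if either else 0.0


def find_holes(terms, single, pair, max_direct=0.05):
    """Rank pairs that are weakly linked directly but strongly linked
    through intermediate terms (hypothetical ABC-style heuristic)."""
    holes = []
    for a, c in combinations(sorted(terms), 2):
        direct = jaccard(a, c, single, pair)
        if direct > max_direct:
            continue  # already well studied together; not a "hole"
        indirect = sum(
            jaccard(a, b, single, pair) * jaccard(b, c, single, pair)
            for b in terms
            if b not in (a, c)
        )
        if indirect > 0:
            holes.append((indirect, a, c, direct))
    return sorted(holes, reverse=True)


if __name__ == "__main__":
    # Toy corpus for illustration only; not data from the paper.
    abstracts = [
        "Hippocampus lesions impair memory.",
        "Memory deficits are an early sign of Alzheimer's disease.",
        "The hippocampus supports spatial memory.",
        "Alzheimer's disease involves memory loss.",
    ]
    terms = ["hippocampus", "memory", "alzheimer"]
    single, pair = term_counts(abstracts, terms)
    for score, a, c, direct in find_holes(terms, single, pair):
        print(f"{a} <-> {c}: indirect={score:.2f}, direct={direct:.2f}")
```

Running the file prints `alzheimer <-> hippocampus: indirect=0.25, direct=0.00`: the two terms never co-occur in the toy corpus, but each co-occurs with "memory". Multiplying the two direct scores favors pairs joined through strongly associated intermediates; other aggregations (maximum or mean over intermediates) would be equally reasonable assumptions.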

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Abstracting and Indexing / methods*
  • Brain / physiology*
  • Cognition / physiology*
  • Data Mining / methods*
  • Humans
  • Natural Language Processing*
  • Pattern Recognition, Automated / methods
  • Periodicals as Topic*
  • Proteome / metabolism*

Substances

  • Proteome