The Cannabis sativa genetics and therapeutics relationship network: automatically associating cannabis-related genes to therapeutic properties through chemicals from cannabis literature

J Cannabis Res. 2023 May 30;5(1):16. doi: 10.1186/s42238-023-00182-z.

Abstract

Background: Understanding the genome of Cannabis sativa holds significant scientific value due to the multi-faceted therapeutic nature of the plant. Links from cannabis gene to therapeutic property are important to establish gene targets for the optimization of specific therapeutic properties through selective breeding of cannabis strains. Our work establishes a resource for quickly obtaining a complete set of therapeutic properties and genes associated with any known cannabis chemical constituent, as well as relevant literature.

Methods: State-of-the-art natural language processing (NLP) was used to automatically extract information from many cannabis-related publications, thus producing an undirected multipartite weighted-edge paragraph co-occurrence relationship network composed of two relationship types, gene-chemical and chemical property. We also developed an interactive application to visualize sub-graphs of manageable size.

Results: Two hundred thirty-four cannabis constituent chemicals, 352 therapeutic properties, and 124 genes from the Cannabis sativa genome form a multipartite network graph which transforms 29,817 cannabis-related research documents from PubMed Central into an easy to visualize and explore network format.

Conclusion: Use of our network replaces time-consuming and labor intensive manual extraction of information from the large amount of available cannabis literature. This streamlined information retrieval process will enhance the activities of cannabis breeders, cannabis researchers, organic biochemists, pharmaceutical researchers and scientists in many other disciplines.

Keywords: Cannabis sativa; Natural language processing; Relationship extraction; Knowledge graph; Cannabinoids; Terpenoids; Natural medicine; Plant genetics.