Anthropogenic biases in chemical reaction data hinder exploratory inorganic synthesis
- PMID: 31511682
- DOI: 10.1038/s41586-019-1540-5
Abstract
Most chemical experiments are planned by human scientists and therefore are subject to a variety of human cognitive biases [1], heuristics [2] and social influences [3]. These anthropogenic chemical reaction data are widely used to train machine-learning models [4] that are used to predict organic [5] and inorganic [6,7] syntheses. However, it is known that societal biases are encoded in datasets and are perpetuated in machine-learning models [8]. Here we identify as-yet-unacknowledged anthropogenic biases in both the reagent choices and reaction conditions of chemical reaction datasets using a combination of data mining and experiments. We find that the amine choices in the reported crystal structures of hydrothermal synthesis of amine-templated metal oxides [9] follow a power-law distribution in which 17% of amine reactants occur in 79% of reported compounds, consistent with distributions in social influence models [10-12]. An analysis of unpublished historical laboratory notebook records shows similarly biased distributions of reaction condition choices. By performing 548 randomly generated experiments, we demonstrate that the popularity of reactants and the choices of reaction conditions are uncorrelated with the success of the reaction. We show that randomly generated experiments better illustrate the range of parameter choices that are compatible with crystal formation. Machine-learning models that we train on a smaller randomized reaction dataset outperform models trained on larger human-selected reaction datasets, demonstrating the importance of identifying and addressing anthropogenic biases in scientific data.
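The concentration statistic quoted in the abstract (a small fraction of amines accounting for most reported compounds) can be illustrated with a minimal sketch. This is not the authors' code or data: the function name `top_share`, the assumption of exactly one amine per compound, and the toy labels below are all hypothetical and chosen only to show how such a share might be computed.

```python
from collections import Counter


def top_share(reagent_per_compound, top_fraction):
    """Share of compounds that use one of the most popular reagents.

    reagent_per_compound: one reagent label per compound (an assumption;
    real compounds may involve several reagents).
    top_fraction: fraction of distinct reagents counted as "popular".
    """
    counts = Counter(reagent_per_compound)
    # Usage counts of each distinct reagent, most popular first.
    ranked = sorted(counts.values(), reverse=True)
    # Size of the popular group; always include at least one reagent.
    k = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:k]) / len(reagent_per_compound)


# Synthetic toy data (hypothetical labels, not the paper's dataset).
toy = ["A"] * 60 + ["B"] * 20 + ["C"] * 8 + ["D"] * 5 + ["E"] * 4 + ["F"] * 3

print(top_share(toy, 0.17))
```

With a real dataset, the same calculation would take the list of amines extracted from the reported crystal structures. A heavily skewed result, such as the 17% and 79% figures reported above, would be the kind of signal that motivates comparison against randomly generated experiments.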
Comment in
- Look out for potential bias in chemical data sets. Nature. 2019 Sep;573(7773):164. doi: 10.1038/d41586-019-02670-w. PMID: 31511688. No abstract available.
Similar articles
- Machine Learning Strategies for Reaction Development: Toward the Low-Data Limit. J Chem Inf Model. 2023 Jun 26;63(12):3659-3668. doi: 10.1021/acs.jcim.3c00577. Epub 2023 Jun 14. PMID: 37312524. Free PMC article. Review.
- Machine-learning-assisted materials discovery using failed experiments. Nature. 2016 May 5;533(7601):73-6. doi: 10.1038/nature17439. PMID: 27147027.
- Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature. 2018 Jul;559(7714):377-381. doi: 10.1038/s41586-018-0307-8. Epub 2018 Jul 18. PMID: 30022133. Free PMC article.
- Systematic auditing is essential to debiasing machine learning in biology. Commun Biol. 2021 Feb 10;4(1):183. doi: 10.1038/s42003-021-01674-5. PMID: 33568741. Free PMC article.
- Perturbation Theory Machine Learning Models: Theory, Regulatory Issues, and Applications to Organic Synthesis, Medicinal Chemistry, Protein Research, and Technology. Curr Top Med Chem. 2018;18(14):1203-1213. doi: 10.2174/1568026618666180810124031. PMID: 30095052. Review.
Cited by
- Artificial Intelligence-Powered Electronic Skin. Nat Mach Intell. 2023 Dec;5(12):1344-1355. doi: 10.1038/s42256-023-00760-z. Epub 2023 Dec 18. PMID: 38370145. Free PMC article.
- Progress and prospects for accelerating materials science with automated and autonomous workflows. Chem Sci. 2019 Sep 20;10(42):9640-9649. doi: 10.1039/c9sc03766g. eCollection 2019 Nov 14. PMID: 32153744. Free PMC article. Review.
- Chemical property prediction under experimental biases. Sci Rep. 2022 May 17;12(1):8206. doi: 10.1038/s41598-022-12116-5. PMID: 35581358. Free PMC article.
- Combatting over-specialization bias in growing chemical databases. J Cheminform. 2023 May 19;15(1):53. doi: 10.1186/s13321-023-00716-w. PMID: 37208694. Free PMC article.
- Machine Learning Strategies for Reaction Development: Toward the Low-Data Limit. J Chem Inf Model. 2023 Jun 26;63(12):3659-3668. doi: 10.1021/acs.jcim.3c00577. Epub 2023 Jun 14. PMID: 37312524. Free PMC article. Review.
References
- Tversky, A. & Kahneman, D. Judgment under uncertainty: heuristics and biases. Science 185, 1124–1131 (1974).
- Gigerenzer, G. & Gaissmaier, W. Heuristic decision making. Annu. Rev. Psychol. 62, 451–482 (2011).
- Salganik, M. J., Dodds, P. S. & Watts, D. J. Experimental study of inequality and unpredictability in an artificial cultural market. Science 311, 854–856 (2006).
- Henson, A. B., Gromski, P. S. & Cronin, L. Designing algorithms to aid discovery by chemical robots. ACS Cent. Sci. 4, 793–804 (2018).
- Coley, C. W., Green, W. H. & Jensen, K. F. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281–1289 (2018).
