A data-driven methodology to discover similarities between cocaine samples

Sci Rep. 2020 Sep 29;10(1):15976. doi: 10.1038/s41598-020-72652-w.

Authors

Fidelia Cascini¹, Nadia De Giovanni², Ilaria Inserra³, Federico Santaroni⁴, Luigi Laura⁵

Affiliations

¹ Department of Life Sciences and Public Health, Università Cattolica del Sacro Cuore, 00168, Rome, Italy. fidelia.cascini1@unicatt.it.
² Fondazione Policlinico Agostino Gemelli IRCCS, Largo Agostino Gemelli 8, 00168, Rome, Italy.
³ Department of Life Sciences and Public Health, Università Cattolica del Sacro Cuore, 00168, Rome, Italy.
⁴ Department of Computer, Control, and Management Engineering Antonio Ruberti (DIAG), Sapienza University of Rome, 00186, Rome, Italy.
⁵ International Telematic University Uninettuno of Rome, Rome, Italy.

Abstract

Machine learning has been used for many purposes in science, but it had not previously been applied to illegal drugs. This study proposes a new web-based system for cocaine classification, profile relations, and comparison that can produce meaningful output from large amounts of chemical profiling data. In particular, the Profiling Relations In Drug trafficking in Europe (PRIDE) system offers several advantages to intelligence actions across Europe. It provides a standardized, broad methodology that uses machine learning algorithms to classify and compare drug profiles, to highlight how similar drug samples are, and to estimate how probable it is that they share a common origin, batch, or preparation process. We evaluated the proposed algorithms using precision and recall metrics and analyzed the quality of the algorithms' predictions with respect to our gold standard. In our experiments, we reached an F0.5-measure of 88%, a precision of 91%, and a recall of 78%, confirming our main hypothesis that machine learning can be applied to classify cocaine profiles automatically.
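The three reported figures are mutually consistent: the F0.5-measure is the standard F-beta score with beta = 0.5, a weighted harmonic mean of precision and recall that weights precision more heavily than recall. The short sketch below recomputes it from the reported precision and recall. The function name `f_beta` is illustrative only; this is the textbook metric, not code taken from the PRIDE system.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta < 1 gives precision more weight than recall (beta = 0.5 -> F0.5).
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
score = f_beta(0.91, 0.78, beta=0.5)
print(f"F0.5 = {score:.2f}")  # F0.5 = 0.88
```

Because the system favours precision over recall, a high precision (91%) pulls the F0.5 value well above the recall (78%), which matches the reported 88%.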

Publication types

Research Support, Non-U.S. Gov't