The methodology of Data Mining. An application to alcohol consumption in teenagers

Adicciones. 2009;21(1):65-80.

[Article in English, Spanish]

Authors

Elena Gervilla García¹, Rafael Jiménez López, Juan José Montaño Moreno, Albert Sesé Abad, Berta Cajal Blasco, Alfonso Palmer Pol

Affiliation

¹ Area de Metodología de las Ciencias del Comportamiento. Departamento de Psicología. Universitat de les Illes Balears. Spain.

PMID: 19333526

Abstract

This paper mainly seeks to introduce researchers in the field of drug addiction to a data analysis methodology for knowledge discovery in databases (KDD). KDD is a multi-phase process whose most characteristic phase is data mining (DM), in which different modelling techniques are applied to detect patterns and relationships in the data. The common and distinguishing features of the most widely used DM techniques are analysed, mainly from a methodological viewpoint, and their use is illustrated with data on alcohol consumption in teenagers and its possible relationship with personality variables (N = 7030). Although overall accuracy (% correct predictions) is very similar across the three models analysed, the Artificial Neural Network (ANN) technique yields the most accurate model (64.1%), followed by Decision Trees (DT) (62.3%) and Naïve Bayes (NB) (59.9%).
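
The comparison criterion named in the abstract, overall accuracy as the percentage of correct predictions, can be sketched in a few lines. This is only an illustration: the function name `overall_accuracy`, the toy labels, and the three prediction lists are hypothetical and are not taken from the study, whose data and software are not described here.

```python
def overall_accuracy(observed, predicted):
    """Return the percentage of predictions that match the observed labels."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one case is required")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Hypothetical toy labels (NOT the study data): 1 = consumes alcohol, 0 = does not.
observed = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

# Hypothetical predictions standing in for three fitted classifiers.
predictions = {
    "ANN": [1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
    "DT":  [1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
    "NB":  [1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
}

# Score every model with the same criterion, then rank from most to least accurate.
accuracies = {name: overall_accuracy(observed, pred)
              for name, pred in predictions.items()}
ranking = sorted(accuracies, key=accuracies.get, reverse=True)

for name in ranking:
    print(f"{name}: {accuracies[name]:.1f}% correct predictions")
```

Scoring all models on the same cases with the same criterion is what makes the percentages comparable. In practice the predictions would come from models evaluated on cases held out from training.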

MeSH terms

Adolescent
Alcohol Drinking / psychology*
Algorithms
Decision Trees
Humans
Neural Networks, Computer*