The methodology of Data Mining. An application to alcohol consumption in teenagers

Adicciones. 2009;21(1):65-80.
[Article in English, Spanish]

Abstract

This paper aims mainly to make researchers in the field of drug addiction aware of a data-analysis methodology geared towards knowledge discovery in databases (KDD). KDD is a process consisting of a series of phases, the most characteristic of which is data mining (DM), in which different modelling techniques are applied to detect patterns and relationships in the data. The common and differentiating features of the most widely used DM techniques are analysed, mainly from a methodological viewpoint, and their use is illustrated with data on alcohol consumption in teenagers and its possible relationship with personality variables (N = 7030). Although the overall accuracy (percentage of correct predictions) is very similar across the three models analysed, the Artificial Neural Network (ANN) technique yields the most accurate model (64.1%), followed by Decision Trees (DT) (62.3%) and Naïve Bayes (NB) (59.9%).
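To make the comparison metric concrete, the following is a minimal sketch of one of the three techniques named above (a categorical Naïve Bayes classifier with Laplace smoothing) together with the overall-accuracy measure (percentage of correct predictions). It is not the authors' implementation, and the abstract does not say which software was used. The feature names (sensation seeking, impulsivity), their "high"/"low" coding, the "yes"/"no" drinking label and all the rows are hypothetical toy values invented for illustration; they are not the study's data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace (add-alpha) smoothing."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v] = number of times feature j took value v in class c
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = [set() for _ in range(len(rows[0]))]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values, alpha


def predict(model, row):
    """Return the class maximising log P(c) + sum_j log P(x_j | c)."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for j, v in enumerate(row):
            count = feature_counts[c][j][v]  # Counter yields 0 for unseen values
            score += math.log((count + alpha) / (n_c + alpha * len(values[j])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def accuracy(y_true, y_pred):
    """Overall accuracy: percentage of correct predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Hypothetical toy data: (sensation_seeking, impulsivity) -> drinks alcohol?
train_rows = [("high", "high"), ("high", "low"), ("high", "high"), ("low", "low"),
              ("low", "low"), ("low", "high"), ("high", "high"), ("low", "low")]
train_labels = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]
test_rows = [("high", "high"), ("low", "low"), ("high", "low"), ("low", "high")]
test_labels = ["yes", "no", "no", "yes"]

model = train_naive_bayes(train_rows, train_labels)
predictions = [predict(model, r) for r in test_rows]
print(predictions)                            # ['yes', 'no', 'yes', 'no']
print(accuracy(test_labels, predictions))     # 50.0
```

The same `accuracy` function can score any classifier (ANN, DT or NB) on a held-out set, which is what makes the three percentages in the abstract directly comparable. Working in log space avoids numerical underflow when many personality variables are multiplied together.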

MeSH terms

  • Adolescent
  • Alcohol Drinking / psychology*
  • Algorithms
  • Decision Trees
  • Humans
  • Neural Networks, Computer*