Reliable or not? An automated classification of webpages about early childhood vaccination using supervised machine learning

Corine S Meppelink; Hanneke Hendriks; Damian Trilling; Julia C M van Weert; Anqi Shao; Eline S Smit

doi:10.1016/j.pec.2020.11.013

Patient Educ Couns. 2021 Jun;104(6):1460-1466. doi: 10.1016/j.pec.2020.11.013. Epub 2020 Nov 12.

Authors

Corine S Meppelink¹, Hanneke Hendriks², Damian Trilling², Julia C M van Weert², Anqi Shao³, Eline S Smit²

Affiliations

¹ Amsterdam School of Communication Research, University of Amsterdam, Amsterdam, the Netherlands. Electronic address: c.s.meppelink@uva.nl.
² Amsterdam School of Communication Research, University of Amsterdam, Amsterdam, the Netherlands.
³ Amsterdam School of Communication Research, University of Amsterdam, Amsterdam, the Netherlands; Life Sciences Communication, University of Wisconsin-Madison, United States.

PMID: 33243581
DOI: 10.1016/j.pec.2020.11.013

Abstract

Objective: To investigate the applicability of supervised machine learning (SML) to classify health-related webpages as 'reliable' or 'unreliable' in an automated way.

Methods: We collected the textual content of 468 different Dutch webpages about early childhood vaccination. Webpages were manually coded as 'reliable' or 'unreliable' based on their alignment with evidence-based vaccination guidelines. Four SML models were trained on part of the data, whereas the remaining data was used for model testing.
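The train-then-test workflow described in the Methods can be sketched as follows. This is a minimal illustration only: the abstract does not name the four models, so the multinomial Naive Bayes classifier here is an assumption, as are the helper names (`tokenize`, `NaiveBayes`, `train_test_split`), the 25% test fraction, and the toy English sentences, which are invented for the example and are not the study's Dutch webpages.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Bag-of-words multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of the items and return (train, test) lists."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical, hand-written examples standing in for manually coded pages.
PAGES = [
    ("vaccine damage harms the immune system", "unreliable"),
    ("dr warns about vaccine damage", "unreliable"),
    ("the immune system is weakened by vaccine damage", "unreliable"),
    ("vaccine damage and the immune system", "unreliable"),
    ("measles vaccination protects your child", "reliable"),
    ("immunization rate for measles is high", "reliable"),
    ("a high immunization rate protects every child", "reliable"),
    ("measles is prevented when the immunization rate is high", "reliable"),
]

if __name__ == "__main__":
    train, test = train_test_split(PAGES)
    model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
    for text, label in test:
        print(f"true={label} predicted={model.predict(text)} text={text!r}")
```

Holding out part of the coded pages matters because a model scored on the pages it was trained on would look better than it is; the held-out pages approximate content the classifier has not seen.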

Results: All models appeared to be successful in the automated identification of unreliable (F1 scores: 0.54-0.86) and reliable information (F1 scores: 0.82-0.91). Typical words for unreliable information were 'dr', 'immune system', and 'vaccine damage', whereas 'measles', 'child', and 'immunization rate' were frequent in reliable information. Our best performing model was also successful in terms of out-of-sample prediction, tested on a dataset about HPV vaccination.
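The F1 scores above are reported per class, which is why the 'unreliable' and 'reliable' ranges differ. As a reminder of how such a score is obtained, the sketch below computes the F1 score (the harmonic mean of precision and recall) for one target class. The function name `f1_for_class` and the label lists are hypothetical and chosen for illustration; they are not the study's predictions.

```python
def f1_for_class(true_labels, predicted_labels, target):
    """Return the F1 score for one class, treating it as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == target and p == target)
    false_pos = sum(1 for t, p in pairs if t != target and p == target)
    false_neg = sum(1 for t, p in pairs if t == target and p != target)
    if true_pos == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical manual codes and model predictions for eight pages.
TRUE = ["unreliable"] * 3 + ["reliable"] * 5
PRED = ["unreliable", "unreliable", "reliable",
        "unreliable", "reliable", "reliable", "reliable", "reliable"]

if __name__ == "__main__":
    for target in ("unreliable", "reliable"):
        print(target, round(f1_for_class(TRUE, PRED, target), 2))
```

With the same number of errors in each direction, the smaller class ends up with the lower score, which is one reason a minority class such as 'unreliable' can be harder to score well on.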

Conclusion: Automated classification of online content in terms of reliability, using basic classifiers, performs well and is particularly useful to identify reliable information.

Practice implications: The classifiers can serve as a starting point for developing more complex classifiers, as well as warning tools that help people evaluate the content they encounter online.

Keywords: Consumer health information; Misinformation; Reliability; Supervised machine learning; Vaccination.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Child
Child, Preschool
Humans
Reproducibility of Results
Supervised Machine Learning*
Vaccination*