Medical Mobile App Classification Using the National Institute for Health and Care Excellence Evidence Standards Framework for Digital Health Technologies: Interrater Reliability Study

Khine Nwe; Mark Erik Larsen; Natalie Nelissen; David Chi-Wai Wong

doi:10.2196/17457

J Med Internet Res. 2020 Jun 5;22(6):e17457. doi: 10.2196/17457.

Authors

Khine Nwe¹, Mark Erik Larsen², Natalie Nelissen³, David Chi-Wai Wong⁴,⁵

Affiliations

¹ Leeds Institute of Health Sciences, University of Leeds, Leeds, United Kingdom.
² Black Dog Institute, University of New South Wales, Sydney, Australia.
³ Leeds Institute of Data Analytics, University of Leeds, Leeds, United Kingdom.
⁴ Centre for Health Informatics, University of Manchester, Manchester, United Kingdom.
⁵ Department of Computer Science, University of Manchester, Manchester, United Kingdom.

PMID: 32501271
PMCID: PMC7305556
DOI: 10.2196/17457

Abstract

Background: Clinical governance of medical mobile apps is challenging, and there is currently no standard method for assessing the quality of such apps. In 2018, the National Institute for Health and Care Excellence (NICE) developed a framework for assessing the required level of evidence for digital health technologies (DHTs), as determined by their clinical function. The framework can potentially be used to assess mobile apps, which are a subset of DHTs. To be used reliably in this context, the framework must allow unambiguous classification of an app's clinical function.

Objective: The objective of this study was to determine whether mobile health apps could be reliably classified using the NICE evidence standards framework for DHTs.

Methods: We manually extracted app titles, screenshots, and content descriptions for all apps listed on the National Health Service (NHS) Apps Library website on July 12, 2019; none of the apps were downloaded. Using this information, 2 mobile health (mHealth) researchers independently classified each app into one of the 4 functional tiers (ie, 1, 2, 3a, and 3b) described in the NICE digital technologies evaluation framework. Coders also answered contextual questions from the framework to identify whether apps were deemed to be higher risk. Agreement between coders was assessed using the Cohen κ statistic.
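As an illustration of the agreement measure named in the Methods, the following is a minimal sketch of the unweighted Cohen κ for 2 coders assigning one tier per app. The function name `cohen_kappa` and the toy ratings are hypothetical and are not the study data; the abstract does not state which software the authors used or how the confidence interval was obtained.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies,
    # summed over every label either rater used.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n**2

    if p_e == 1:
        # Both raters used a single identical label; kappa is undefined (0/0).
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy ratings for 8 apps (NOT the study data).
coder_a = ["2", "2", "3a", "3a", "3b", "2", "1", "3a"]
coder_b = ["2", "3a", "3a", "2", "3b", "2", "2", "3a"]

print(round(cohen_kappa(coder_a, coder_b), 3))
```

In this toy example the coders agree on 5 of 8 apps, but κ is lower than that raw proportion because part of the agreement is expected by chance from the coders' marginal tier frequencies.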

Results: In total, we assessed 76 apps from the NHS Apps Library. There was classification agreement for 42 apps. Of these, 0 apps were unanimously classified into Tier 1; 24, into Tier 2; 15, into Tier 3a; and 3, into Tier 3b. There was disagreement between coders in 34/76 cases (45%); interrater agreement was poor (Cohen κ=0.32, 95% CI 0.16-0.47). Further investigation of disagreements highlighted 5 main explanatory themes: apps that did not correspond to any tier, apps that corresponded to multiple tiers, ambiguous tier descriptions, ambiguous app descriptions, and coder error.
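The counts reported in the Results can be checked with simple arithmetic. The sketch below uses only figures from the abstract; the chance agreement it derives is an assumption-laden back-calculation from the rounded κ (obtained by rearranging κ = (p_o − p_e)/(1 − p_e)) and is not a value reported by the authors.

```python
# Figures taken from the Results paragraph of the abstract.
agreed_by_tier = {"1": 0, "2": 24, "3a": 15, "3b": 3}
total_apps = 76
kappa_reported = 0.32  # rounded value as reported

agreed = sum(agreed_by_tier.values())   # apps where both coders chose the same tier
disagreed = total_apps - agreed         # apps where the coders chose different tiers
p_o = agreed / total_apps               # observed proportion of agreement

# Back-calculated chance agreement (approximate; NOT reported in the source):
# kappa = (p_o - p_e) / (1 - p_e)  =>  p_e = (p_o - kappa) / (1 - kappa)
implied_p_e = (p_o - kappa_reported) / (1 - kappa_reported)

print(f"agreed={agreed}, disagreed={disagreed} ({disagreed / total_apps:.0%})")
print(f"observed agreement p_o={p_o:.3f}, implied chance agreement p_e={implied_p_e:.3f}")
```

The tier counts sum to the 42 agreements, and the remaining 34 of 76 apps correspond to the reported 45% disagreement.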

Conclusions: The current iteration of the NICE evidence standards framework for DHTs did not allow mHealth researchers to consistently and unambiguously classify digital health mobile apps listed on the NHS Apps Library according to their functional tier.

Keywords: NHS Apps Library; NICE; evaluation; evidence; interrater; mHealth; telehealth.

©Khine Nwe, Mark Erik Larsen, Natalie Nelissen, David Chi-Wai Wong. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 05.06.2020.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biomedical Technology / methods*
Humans
Mobile Applications / classification*
National Institutes of Health (U.S.) / standards*
Reproducibility of Results
Telemedicine / classification*
United States