Background: Schistosomiasis and soil-transmitted helminth infections are among the world's most prevalent neglected tropical diseases. Infection by more than one parasite (co-infection) is common and can contribute to clinical morbidity in children. Geostatistical analyses of parasite infection data are key for developing mass drug administration strategies, yet most methods ignore co-infections when estimating risk. Infection status for multiple parasites can act as a useful proxy for data-poor individual-level or environmental risk factors while avoiding regression dilution bias. Conditional random fields (CRFs), a multivariate graphical network method, offer new opportunities in parasite risk mapping by (i) predicting co-infections with high accuracy; (ii) isolating associations among parasites; and (iii) quantifying how these associations change across landscapes.
Methods: We built a spatial CRF to estimate infection risks for Ascaris lumbricoides, Trichuris trichiura, hookworms (Ancylostoma duodenale and Necator americanus) and Schistosoma mansoni using data from a national survey of Rwandan schoolchildren. We used an ensemble learning approach to generate spatial predictions by simulating from the CRF's posterior distribution with a multivariate boosted regression tree that captured non-linear relationships between predictors and covariance in infection risks. This CRF ensemble was compared against single-parasite gradient boosted machines to assess each model's performance and prediction uncertainty.
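The conditional structure of a CRF can be illustrated with a minimal sketch. It assumes a logistic form in which one parasite's infection probability depends on covariates, on the presence of the other parasites, and on interactions between the two, so that associations can change across landscapes. The function name, parameter layout, covariate name and all coefficient values are hypothetical illustrations, not estimates or code from the study.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def conditional_risk(target, others, covariates, params):
    """P(target parasite present | other parasites' status, covariates).

    others:     dict of parasite name -> 0/1 infection status
    covariates: dict of covariate name -> numeric value
    params:     hypothetical coefficients for each target parasite
    """
    p = params[target]
    z = p["intercept"]
    # Direct covariate effects on the target parasite
    for name, value in covariates.items():
        z += p["covariate"].get(name, 0.0) * value
    # Pairwise associations, each allowed to vary with the covariates
    for other, present in others.items():
        strength = p["pairwise"].get(other, 0.0)
        for name, value in covariates.items():
            strength += p["pairwise_by_covariate"].get((other, name), 0.0) * value
        z += strength * present
    return sigmoid(z)


# Hypothetical coefficients, chosen only to make the sketch runnable
example_params = {
    "hookworm": {
        "intercept": -1.0,
        "covariate": {"elevation": -0.5},
        "pairwise": {"ascaris": 0.8},
        "pairwise_by_covariate": {("ascaris", "elevation"): 0.3},
    }
}

site = {"elevation": 1.0}  # assumed to be a standardised covariate
risk_alone = conditional_risk("hookworm", {"ascaris": 0}, site, example_params)
risk_coinfected = conditional_risk("hookworm", {"ascaris": 1}, site, example_params)
print(risk_alone, risk_coinfected)
```

With a positive association, the conditional risk is higher when the other parasite is present. A single-parasite model corresponds to dropping the pairwise terms.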
Results: Parasite co-infections were common, with 19.57% of children infected with at least two parasites. The CRF ensemble achieved higher predictive power than single-parasite models by improving estimates of co-infection prevalence at the individual level and classifying schools into World Health Organization treatment categories with greater accuracy. The CRF uncovered important environmental and demographic predictors of parasite infection probabilities. Yet even after accounting for demographic and environmental risk factors, the presence or absence of other parasites was a strong predictor of individual-level infection risk. Spatial predictions delineated high-risk regions in need of anthelminthic treatment interventions, including areas with higher-than-expected co-infection prevalence.
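The two summary quantities reported above, co-infection prevalence and school-level treatment categories, can be sketched as follows. The abstract does not state which prevalence cut-offs were used, so the default thresholds below (20% and 50%, values commonly cited for soil-transmitted helminths) are an assumption and are left configurable. The function names and the toy records are hypothetical.

```python
from collections import defaultdict


def coinfection_prevalence(children):
    """Share of children infected with at least two parasites.

    children: list of dicts mapping parasite name -> 0/1 status.
    """
    coinfected = sum(1 for child in children if sum(child.values()) >= 2)
    return coinfected / len(children)


def school_prevalence(records):
    """Per-school prevalence from (school_id, infected) pairs, infected in {0, 1}."""
    counts = defaultdict(lambda: [0, 0])  # school -> [positives, total]
    for school, infected in records:
        counts[school][0] += infected
        counts[school][1] += 1
    return {school: pos / total for school, (pos, total) in counts.items()}


def treatment_category(prevalence, cutoffs=(0.20, 0.50)):
    """Assign a category from prevalence; the cut-offs are assumed defaults."""
    moderate_from, high_from = cutoffs
    if prevalence >= high_from:
        return "high"
    if prevalence >= moderate_from:
        return "moderate"
    return "low"


# Toy data, for illustration only
toy_children = [
    {"ascaris": 1, "trichuris": 1, "hookworm": 0, "s_mansoni": 0},
    {"ascaris": 1, "trichuris": 0, "hookworm": 0, "s_mansoni": 0},
    {"ascaris": 0, "trichuris": 0, "hookworm": 0, "s_mansoni": 0},
    {"ascaris": 0, "trichuris": 1, "hookworm": 1, "s_mansoni": 1},
]
toy_records = [("A", 1), ("A", 0), ("A", 0), ("A", 0), ("B", 1), ("B", 1)]

categories = {
    school: treatment_category(prev)
    for school, prev in school_prevalence(toy_records).items()
}
print(coinfection_prevalence(toy_children), categories)
```

Comparing categories derived from predicted prevalence against those derived from observed prevalence gives a simple classification accuracy of the kind used to contrast the CRF ensemble with single-parasite models.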
Conclusions: Monitoring studies routinely screen for multiple parasites, yet statistical models generally ignore these multivariate data when assessing risk factors and designing treatment guidelines. Multivariate approaches can be instrumental in the global effort to reduce and eventually eliminate neglected helminth infections in developing countries.
Keywords: Conditional random fields; Neglected tropical disease; Parasite co-infection; Schistosoma mansoni; Soil-transmitted helminths; Spatial epidemiology.