Evaluating heterogeneity in indoor and outdoor air pollution using land-use regression and constrained factor analysis

Res Rep Health Eff Inst. 2010 Dec;(152):5-80; discussion 81-91.


Previous studies have identified associations between traffic exposures and a variety of adverse health effects, but many of these studies relied on proximity measures rather than measured or modeled concentrations of specific air pollutants, which complicates interpretation of the findings. An increasing number of studies have used land-use regression (LUR) or other techniques to model small-scale variability in concentrations of specific air pollutants. However, these studies have generally considered a limited number of pollutants. They have focused on outdoor concentrations (or indoor concentrations of ambient origin), even though indoor concentrations are better proxies for personal exposures. They have also not taken full advantage of statistical methods for source apportionment, which could provide insight into the structure of the LUR models and the interpretability of model results. Given these issues, the primary objective of our study was to determine predictors of indoor and outdoor residential concentrations of multiple traffic-related air pollutants within an urban area. We based this on a combination of central site monitoring data; geographic information system (GIS) covariates reflecting traffic and other outdoor sources; questionnaire data reflecting indoor sources and activities that affect ventilation rates; and factor-analytic methods to better infer source contributions. As part of a prospective birth cohort study assessing asthma etiology in urban Boston, we collected indoor and/or outdoor 3- to 4-day samples of nitrogen dioxide (NO2) and fine particulate matter with an aerodynamic diameter ≤ 2.5 µm (PM2.5) at 44 residences during multiple seasons of the year from 2003 through 2005. We performed reflectance analysis, x-ray fluorescence spectroscopy (XRF), and high-resolution inductively coupled plasma-mass spectrometry (ICP-MS) on particle filters to estimate the concentrations of elemental carbon (EC), trace elements, and water-soluble metals, respectively.
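Reflectance-based EC estimates are usually derived from the light absorption coefficient of the exposed filter. The sketch below assumes the commonly used reflectometry relation a = (A / 2V) ln(R0 / R), followed by a linear calibration to EC. The function names, the calibration slope, and the example inputs (filter area, flow rate, sampling duration) are hypothetical placeholders, not values from this study.

```python
import math


def absorption_coefficient(filter_area_m2: float, volume_m3: float,
                           r_blank: float, r_sample: float) -> float:
    """Light absorption coefficient (m^-1) of a sampled filter.

    Assumes the common reflectometry relation a = A / (2V) * ln(R0 / R):
    A = exposed filter area (m^2), V = sampled air volume (m^3),
    R0 = blank-filter reflectance, R = sampled-filter reflectance.
    """
    if filter_area_m2 <= 0 or volume_m3 <= 0 or r_blank <= 0 or r_sample <= 0:
        raise ValueError("area, volume and reflectances must be positive")
    return filter_area_m2 / (2.0 * volume_m3) * math.log(r_blank / r_sample)


def ec_from_absorption(abs_coef_m1: float, slope: float, intercept: float = 0.0) -> float:
    """Convert absorption (m^-1) to EC (ug/m^3) with a HYPOTHETICAL linear calibration.

    slope and intercept would have to come from co-located thermal-optical EC
    measurements; nothing here reproduces the study's own calibration.
    """
    return intercept + slope * abs_coef_m1


# Hypothetical 3.5-day sample: 4 L/min for 84 h, converted to m^3.
volume = 4.0 * 84 * 60 / 1000.0
a = absorption_coefficient(8.0e-4, volume, r_blank=100.0, r_sample=80.0)
print(f"absorption coefficient: {a * 1e5:.2f} x 1e-5 m^-1")
```

The conversion from absorption to EC is empirical, so it is kept as a separate calibration step rather than folded into the reflectance calculation.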
We derived multiple indicators of traffic using Massachusetts Highway Department (MHD) data and traffic counts collected outside the residences where the air monitoring was conducted. We used a standardized questionnaire to collect data on home characteristics and occupant behaviors. Additional housing information was collected through property tax records. Ambient concentrations of pollutants as well as meteorological data were collected from centrally located ambient monitors. We used GIS-based LUR models to explain spatial and temporal variability in residential outdoor concentrations of PM2.5, EC, and NO2. We subsequently derived latent-source factors for residential outdoor concentrations using confirmatory factor analysis constrained to nonnegative loadings. We developed LUR models to determine whether GIS covariates and other predictors explain factor variability and thereby support initial factor interpretations. To evaluate indoor concentrations, we developed physically interpretable regression models that explored the relationship between measured indoor and outdoor concentrations, relying on questionnaire data to characterize indoor sources and activities. Because outdoor pollutant concentrations measured directly outside of homes are unlikely to be available for most large epidemiologic studies, we developed regression models to explain indoor concentrations of PM2.5, EC, and NO2 as a function of other, more readily available data: GIS covariates, questionnaire data reflecting both sources and ventilation, and central site monitoring data. As we did for outdoor concentrations, we then derived latent-source factors for residential indoor concentrations and developed regression models explaining variability in these indoor latent-source factors. 
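At its core, the LUR step is a multiple linear regression of residential outdoor concentrations on GIS and central-site covariates. The following minimal sketch in plain Python assumes an ordinary least squares fit of outdoor EC on a central-site concentration and the length of roadway within 200 m of the home. The helper names (`ols`, `r_squared`), the covariate values, and the coefficients are synthetic and hypothetical. The study's actual variable selection and temporal terms are not reproduced.

```python
from typing import List, Sequence


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares coefficients from the normal equations (X'X) b = X'y.

    X must already contain a leading column of ones for the intercept.
    The system is solved by Gaussian elimination with partial pivoting.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear covariates?)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back-substitution on the upper-triangular system.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def r_squared(X, y, b) -> float:
    """Coefficient of determination of the fitted model."""
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic residences: central-site EC (ug/m^3) and roadway length within 200 m (m).
central = [0.5, 0.7, 0.4, 0.9, 0.6, 0.8]
road_200m = [120.0, 340.0, 60.0, 410.0, 250.0, 90.0]
noise = [0.02, -0.01, 0.0, 0.01, -0.02, 0.0]
X = [[1.0, c, r] for c, r in zip(central, road_200m)]
y = [0.1 + 0.6 * c + 0.001 * r + e for c, r, e in zip(central, road_200m, noise)]

b = ols(X, y)
print("intercept, central-site, road-length coefficients:", [round(v, 4) for v in b])
print("R^2:", round(r_squared(X, y, b), 3))
```

In practice a numerical library would be used. The normal-equation solver is spelled out here only to keep the sketch dependency-free.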
Finally, to provide insight into how improved exposure characterization affects the results of subsequent epidemiologic investigations, we developed a simulation framework. It quantitatively compares the implications of using exposure models derived from validation studies with those of using other surrogate models with varying amounts of measurement error. Outdoor PM2.5 concentrations were strongly associated with the central site monitor data. EC concentrations, by contrast, showed greater spatial variability, especially during colder months, and were predicted by the length of roadway within 200 m of the home. Outdoor NO2 also showed significant spatial variability, predicted in part by population density and roadway length within 50 m of the home. Our constrained factor analysis of outdoor concentrations produced loadings indicating long-range transport, brake wear and traffic exhaust, diesel exhaust, fuel oil combustion, and resuspended road dust as sources; corresponding LUR models largely corroborated these factor interpretations through covariate significance. For example, long-range transport was predicted by central site PM2.5 and season; brake wear and traffic exhaust and resuspended road dust by traffic and residential density; diesel exhaust by the percentage of diesel traffic on the nearest major road; and fuel oil combustion by population density. Our modeling of indoor pollutant concentrations demonstrated substantial variability in indoor-outdoor relationships across constituents. This helped to separate constituents dominated by outdoor sources (e.g., S, Se, and V) from those dominated by indoor sources (e.g., Ca and Si).
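The study's confirmatory factor analysis constrained to nonnegative loadings is not reproduced here. Non-negative matrix factorization (NMF) is a related but distinct technique. It also decomposes a samples-by-constituents concentration matrix into non-negative source contributions and source profiles (loadings). The sketch below uses the Lee-Seung multiplicative updates in plain Python. The two source profiles, the constituent list, the sample contributions, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Dense matrix product A @ B, with matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def frobenius_error(X: Matrix, W: Matrix, H: Matrix) -> float:
    """Frobenius norm ||X - WH||, the quantity the updates drive down."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5


def nmf(X: Matrix, k: int, iters: int = 300, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix, List[float]]:
    """Approximate X (samples x constituents) by W (contributions) @ H (profiles), all >= 0."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    history = [frobenius_error(X, W, H)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H): ratios of non-negative terms keep H >= 0.
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(rh, ra, rd)]
             for rh, ra, rd in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(rw, ra, rd)]
             for rw, ra, rd in zip(W, num, den)]
        history.append(frobenius_error(X, W, H))
    return W, H, history


# Hypothetical profiles over constituents [S, V, Fe, Cu, Si]:
# a sulfur-rich "regional" source and a road-related (Fe, Cu, Si) source.
profiles = [[0.80, 0.05, 0.05, 0.02, 0.08],
            [0.05, 0.01, 0.35, 0.14, 0.45]]
contributions = [[1.0, 0.2], [0.6, 0.9], [1.4, 0.3], [0.3, 1.2], [0.9, 0.6], [1.1, 1.0]]
conc = matmul(contributions, profiles)

W, H, history = nmf(conc, k=2)
print("initial error:", round(history[0], 4), "final error:", round(history[-1], 4))
for row in H:
    print("profile:", [round(v / sum(row), 2) for v in row])  # normalized to sum to 1
```

NMF solutions are not unique. A profile can be scaled up while its contributions are scaled down, so profiles are normalized before they are compared or interpreted.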
Regression models indicated that indoor PM2.5 was not influenced substantially by local traffic but had significant indoor sources (cooking activity and occupant density). EC was associated with distance to the nearest designated truck route, and NO2 was associated with both traffic density within 50 m of the home and gas stove usage. Our constrained factor analysis of indoor concentrations helped to separate outdoor-dominated factors from indoor-dominated factors, though some factors appeared to be influenced by both indoor and outdoor sources. Subsequent factor analyses of the indoor-attributable fractions from indoor-outdoor regression models provided generally consistent interpretations of indoor-dominated factors. Regression models of the indoor factors demonstrated the limited predictive power of questionnaire data related to indoor sources, but they reinforced the viability of modeling indoor concentrations of pollutants of ambient origin. Some of the indoor-concentration regression models had relatively weak predictive power. Even so, our epidemiologic simulations illustrated that exposure models with fairly modest R² values (0.3 to 0.4, corresponding to the regression models for PM2.5 and NO2) yielded substantial improvements in epidemiologic study performance. This improvement was relative to exposure proxies that could be applied in the absence of validation studies. In spite of limitations related to sample size and available covariate data, our study demonstrated significant outdoor spatial variability within an urban area in NO2 and in several constituents of airborne particles. LUR techniques combined with constrained factor analysis helped to disentangle the contributions of local sources, long-range transport, and other sources to temporal variability. Ultimately, this allows exposures from defined source categories to be investigated in epidemiologic studies.
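The simulation framework itself is not described in detail here. The sketch below is only a generic illustration of why the structure of exposure error matters, assuming a linear exposure-response model. An exposure model's predictions carry Berkson-type error; here they have R² = 0.35 against true exposure. Such predictions leave the estimated slope approximately unbiased, whereas a surrogate with classical additive error attenuates it. All parameter values and function names are hypothetical, and this is not presented as the authors' framework.

```python
import random
from typing import Dict, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n: int = 20000, beta: float = 1.0, r2_model: float = 0.35,
             proxy_error_var: float = 1.0, seed: int = 1) -> Dict[str, float]:
    """Compare slopes estimated with true exposure, a model prediction, and a noisy proxy."""
    rng = random.Random(seed)
    # Model prediction z explains r2_model of the variance of true exposure x (Var(x) = 1).
    z = [rng.gauss(0.0, r2_model ** 0.5) for _ in range(n)]
    x = [zi + rng.gauss(0.0, (1.0 - r2_model) ** 0.5) for zi in z]  # Berkson-type: x = z + error
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]  # health outcome
    w = [xi + rng.gauss(0.0, proxy_error_var ** 0.5) for xi in x]  # classical: w = x + error
    return {
        "true exposure": ols_slope(x, y),
        "model prediction (Berkson-type)": ols_slope(z, y),
        "proxy (classical)": ols_slope(w, y),
    }


for name, est in simulate().items():
    print(f"{name}: estimated slope {est:.3f} (true slope 1.0)")
```

With `proxy_error_var = 1` and Var(x) = 1, the classical-error slope is expected near 1 / (1 + 1) = 0.5 of the true value. The cost of Berkson-type error is precision rather than bias: residual variance grows, so the model-prediction slope is estimated less precisely for the same sample size.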
For the indoor residential environment, we demonstrated substantial variability in indoor-outdoor relationships among particle constituents. Then, using information from public databases and focused questionnaire data, we were able to predict indoor concentrations for a subset of key pollutants. Constrained factor analysis methods applied to the indoor environment helped to separate indoor sources from outdoor sources. The corresponding indoor regression models had limited predictive power, reinforcing the complexity of characterizing the indoor environment when only limited information about key predictors is available. This finding also suggests that these regression models are likely to characterize indoor concentrations of pollutants of ambient origin better than indoor concentrations from all sources. Our findings provide direction for future studies characterizing indoor exposure sources and patterns. Our epidemiologic simulation also reinforced the importance of reducing measurement error in a context where many traffic-related air pollutants are influenced by both indoor and outdoor sources. The combination of analytical techniques used in our study could ultimately allow for more refined exposure characterization. It could also allow evaluation of the relative contributions of various sources to health outcomes in epidemiologic studies.
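Indoor-outdoor relationships of this kind are often summarized with a single-compartment mass-balance form, C_in = F_inf × C_out + C_ig. The slope F_inf reflects infiltration of outdoor pollution, and the intercept C_ig reflects indoor-generated contributions. The sketch below assumes that form and fits it per constituent by simple linear regression. It labels a constituent indoor-dominated when the intercept accounts for more than half of the mean indoor concentration. The data, the 0.5 threshold, and the function names are hypothetical. The study's own indoor-outdoor models also drew on questionnaire data on indoor sources and activities, which this sketch omits.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Simple least-squares line y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def indoor_outdoor(outdoor: Sequence[float], indoor: Sequence[float],
                   threshold: float = 0.5) -> Tuple[float, float, float, str]:
    """Return (F_inf, C_ig, indoor-generated fraction, label) for one constituent."""
    c_ig, f_inf = fit_line(outdoor, indoor)
    fraction = c_ig / (sum(indoor) / len(indoor))  # share of mean indoor level not tied to outdoor
    label = "indoor-dominated" if fraction > threshold else "outdoor-dominated"
    return f_inf, c_ig, fraction, label


# Synthetic paired (outdoor, indoor) samples in ng/m^3 for an S-like and a Ca-like constituent.
data = {
    "S": ([900, 1200, 1500, 800, 1100, 1400], [655, 855, 1070, 583, 787, 1000]),
    "Ca": ([40, 60, 50, 30, 70, 55], [102, 90, 99, 83, 103, 94.5]),
}

for name, (out, ind) in data.items():
    f_inf, c_ig, frac, label = indoor_outdoor(out, ind)
    print(f"{name}: F_inf={f_inf:.2f}, C_ig={c_ig:.1f}, indoor-generated fraction={frac:.2f} -> {label}")
```

A high slope with a small intercept points to an outdoor-dominated constituent, while a low slope with a large intercept points to indoor sources. The cut-off itself is an arbitrary choice for illustration.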

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Air Pollutants / adverse effects
  • Air Pollutants / analysis
  • Air Pollution / adverse effects
  • Air Pollution / analysis*
  • Air Pollution / statistics & numerical data
  • Air Pollution, Indoor / adverse effects
  • Air Pollution, Indoor / analysis*
  • Air Pollution, Indoor / statistics & numerical data
  • Boston
  • Environmental Monitoring / methods
  • Factor Analysis, Statistical
  • Geographic Information Systems
  • Humans
  • Models, Statistical
  • Prospective Studies
  • Regression Analysis
  • Urban Health
  • Vehicle Emissions / analysis

Substances

  • Air Pollutants
  • Vehicle Emissions