Background: Recent advances in high-throughput molecular epidemiology are transforming the analysis of viral infections.
Methods: Human immunodeficiency virus (HIV)-1 pol sequences from a Northern Californian cohort (NCC) of 4553 antiretroviral-naive individuals sampled between 1998 and 2016 were analyzed together with 140 000 previously published global pol sequences. The HIV-TRAnsmission Cluster Engine (HIV-TRACE) was used to infer a transmission network comprising links between NCC sequences and previously published sequences separated by a genetic distance of ≤1.5%.
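The network step above links any two sequences whose genetic distance is at or below a threshold (here 1.5%) and treats each connected group of linked sequences as a cluster. The Python sketch below illustrates that general idea on aligned toy sequences; it is not HIV-TRACE. It substitutes a simple proportion of differing sites (p-distance, skipping gaps and N) for the TN93 distance that HIV-TRACE computes, and the function names, the `NCC_`/`REF_` ID prefixes and the toy data are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Simplifying assumption: sites where either sequence has a gap ('-') or
    an 'N' are skipped. HIV-TRACE itself uses the TN93 distance instead.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else float("inf")


def transmission_clusters(seqs: dict[str, str], threshold: float = 0.015):
    """Link pairs with distance <= threshold; return (clusters, edges).

    Clusters are the connected components (size > 1) of the link graph,
    found with a union-find structure.
    """
    parent = {sid: sid for sid in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for i, j in combinations(seqs, 2):
        if p_distance(seqs[i], seqs[j]) <= threshold:
            edges.append((i, j))
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # merge the two components

    groups: dict[str, set[str]] = {}
    for sid in seqs:
        groups.setdefault(find(sid), set()).add(sid)
    clusters = [g for g in groups.values() if len(g) > 1]
    return clusters, edges


# Hypothetical toy data: 100-site alignments, so 1 difference = 1% distance.
toy = {
    "NCC_1": "A" * 100,
    "REF_1": "A" * 99 + "G",  # 1% from NCC_1 -> linked
    "NCC_2": "C" * 100,
    "REF_2": "C" * 99 + "T",  # 1% from NCC_2 -> linked
    "NCC_3": "G" * 100,       # far from everything -> unclustered
}
clusters, edges = transmission_clusters(toy)

# Cohort sequences that sit in a cluster containing a published ("REF_") one.
linked = {
    sid
    for c in clusters
    if any(s.startswith("REF_") for s in c)
    for sid in c
    if sid.startswith("NCC_")
}
cohort = [sid for sid in toy if sid.startswith("NCC_")]
print(f"{len(linked)}/{len(cohort)} cohort sequences linked to a published sequence")
```

Union-find keeps cluster assembly close to linear in the number of links, but the all-pairs distance loop is quadratic, which is why tools built for 100 000+ sequences rely on fast dedicated distance programs rather than a pure-Python loop like this one.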
Results: Twenty-five percent of NCC sequences were included in 264 clusters linked to a published sequence, and approximately one third of these sequences (8.0% of the total) were linked to 1 or more non-US sequences. The largest cluster, containing 512 NCC sequences (11.2% of the total), belonged to the subtype B lineage that traces its origin to the earliest North American sequences. Approximately 5% of NCC sequences belonged to a non-B subtype, and these were more likely to cluster with a non-US sequence. Twenty-two NCC sequences belonged to 1 of 4 large clusters containing sequences from rapidly growing regional epidemics: CRF07_BC (East Asia), subtype A6 (former Soviet Union), a Japanese subtype B lineage, and an East/Southeast Asian CRF01_AE lineage. Bayesian phylogenetic analysis suggested that most non-B sequences resulted from separate introductions but that local spread had occurred on 2 occasions within the largest CRF01_AE cluster.
Conclusions: The NCC contains national and international links to previously published sequences, including many to the subtype B strain that originated in North America and several to rapidly growing Asian epidemics. Despite their rapid regional growth, the Asian epidemic strains showed limited spread within the NCC.
Keywords: HIV-1; network analysis; pol sequence; transmission.