Recent advances in unbiased metagenomic next-generation sequencing (mNGS) enable simultaneous examination of microbial and host genetic material. We developed a multimodal machine learning-based diagnostic approach to differentiate lung cancer from pulmonary infections by analyzing 402 bronchoalveolar lavage fluid (BALF) mNGS datasets, comprising lung cancer (n = 123), bacterial infections (n = 114), fungal infections (n = 79), and pulmonary tuberculosis (n = 86). In the training cohort, the disease groups differed in microbial profiles, bacteriophage abundance, host gene and transposable element expression, immune cell composition, and tumor fraction derived from copy number variation (CNV). The integrated model (Model VI) achieved an AUC of 0.937 (95% CI, 0.910-0.964) in the training cohort and 0.847 (95% CI, 0.776-0.918) in the test cohort. A rule-in/rule-out strategy further improved accuracy in differentiating lung cancer from tuberculosis (accuracy = 0.896), fungal infections (accuracy = 0.915), and bacterial infections (accuracy = 0.907). These findings highlight the potential of mNGS-based multimodal analysis as a cost-effective tool for early and accurate differential diagnosis.
© 2025. The Author(s).
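
The rule-in/rule-out strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thresholds or say how accuracy was computed, so everything here is an assumption for illustration: the function names, the cutoffs of 0.2 and 0.8, and the choice to score accuracy only on samples that receive a confident call are all hypothetical. The idea shown is that a model probability above an upper cutoff "rules in" lung cancer, a probability below a lower cutoff "rules it out", and anything in between is left indeterminate.

```python
from typing import List, Optional, Tuple


def rule_in_rule_out(
    scores: List[float],
    rule_out_threshold: float = 0.2,  # hypothetical cutoff, not from the paper
    rule_in_threshold: float = 0.8,   # hypothetical cutoff, not from the paper
) -> List[Optional[int]]:
    """Map predicted lung-cancer probabilities to calls.

    Returns 1 (cancer ruled in), 0 (cancer ruled out), or None (indeterminate).
    """
    calls: List[Optional[int]] = []
    for score in scores:
        if score >= rule_in_threshold:
            calls.append(1)
        elif score <= rule_out_threshold:
            calls.append(0)
        else:
            calls.append(None)
    return calls


def accuracy_on_called(
    calls: List[Optional[int]], labels: List[int]
) -> Tuple[float, float]:
    """Return (accuracy among confidently called samples, fraction called)."""
    called = [(c, y) for c, y in zip(calls, labels) if c is not None]
    if not called:
        return float("nan"), 0.0
    correct = sum(1 for c, y in called if c == y)
    return correct / len(called), len(called) / len(labels)


# Toy data only: 1 = lung cancer, 0 = infection.
toy_scores = [0.95, 0.10, 0.50, 0.85, 0.15]
toy_labels = [1, 0, 1, 0, 0]
toy_calls = rule_in_rule_out(toy_scores)
toy_accuracy, toy_coverage = accuracy_on_called(toy_calls, toy_labels)
```

Widening the gap between the two cutoffs typically raises accuracy on the called samples while leaving more samples indeterminate, so the two numbers are best reported together.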