This paper describes a framework for establishing a reference airway tree segmentation, which was used to quantitatively evaluate fifteen different airway tree extraction algorithms in a standardized manner. Because of the sheer difficulty involved in manually constructing a complete reference standard from scratch, we propose to construct the reference using results from all algorithms that are to be evaluated. We start by subdividing each segmented airway tree into its individual branch segments. Each branch segment is then visually scored by trained observers to determine whether or not it is a correctly segmented part of the airway tree. Finally, the reference airway trees are constructed by taking the union of all correctly extracted branch segments. Fifteen airway tree extraction algorithms from different research groups are evaluated on a diverse set of twenty chest computed tomography (CT) scans of subjects ranging from healthy volunteers to patients with severe pathologies, scanned at different sites, with different CT scanner brands, models, and scanning protocols. Three performance measures covering different aspects of segmentation quality were computed for all participating algorithms. Results from the evaluation showed that no single algorithm could extract more than an average of 74% of the total length of all branches in the reference standard, indicating substantial differences between the algorithms. A fusion scheme that obtained superior results is presented, demonstrating that there is complementary information provided by the different algorithms and there is still room for further improvements in airway segmentation algorithms.