The Autism Diagnostic Observation Schedule (ADOS) is a first-choice diagnostic tool for autism spectrum disorder (ASD). Excellent interpersonal objectivity (interrater reliability) has been demonstrated for the ADOS under optimal conditions, i.e., within groups of highly trained, "research reliable" examiners in research settings. We investigated the spontaneous interrater reliability among clinically trained ADOS users across multiple sites in clinical routine. Forty videotaped administrations of ADOS modules 1-4 were each rated by five different raters from a pool of 15 raters affiliated with 13 different clinical sites. G(q,k) coefficients (analogous to intraclass correlations), kappas (κ), and percent agreement (PA) were calculated. The median interrater reliability for items across the four modules was G(q,k) = .74-.83, with single ADOS items ranging from .23 to .94. G(q,k) for total scores was .85-.92. For diagnostic classification (ASD/non-spectrum), PA was 64-82% and Fleiss' κ was .19-.55. Objectivity was lower for pervasive developmental disorder not otherwise specified and for non-spectrum diagnoses than for autism. Interrater reliabilities of the ADOS items and domain totals among clinical users across multiple sites were in the same range as previously reported for research reliable users, whereas reliability for diagnostic classification was lower. Differences in sample characteristics, rater skills, and statistics compared with previous studies are discussed. The findings endorse the objectivity of the ADOS in naturalistic clinical settings, but also pinpoint its limitations and the need for, and value of, adequate and continuous rater training.
Keywords: Autism spectrum disorder; Diagnostic instrument; Interrater reliability.
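For readers unfamiliar with the agreement statistics reported above, Fleiss' κ and mean pairwise percent agreement can be sketched as follows. This is a minimal illustration of the standard formulas only; the count matrix below is hypothetical and is not drawn from the study's data.

```python
# Illustrative sketch: Fleiss' kappa and mean pairwise percent agreement
# for a subjects-by-categories count matrix, where counts[i][j] is the
# number of raters assigning subject i to category j (each row sums to
# the number of raters per subject).

def fleiss_kappa(counts):
    """Fleiss' kappa for a fully crossed rating design."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])          # assumed constant across subjects
    # Per-subject observed pairwise agreement P_i
    p_i = [(sum(c * c for c in row) - n_raters) /
           (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    # Marginal category proportions p_j and chance agreement P_e
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_j = [t / (n_subjects * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

def percent_agreement(counts):
    """Mean proportion of agreeing rater pairs per subject."""
    n_raters = sum(counts[0])
    return sum((sum(c * c for c in row) - n_raters) /
               (n_raters * (n_raters - 1)) for row in counts) / len(counts)

# Hypothetical example: 2 subjects, 5 raters, 2 categories (ASD / non-spectrum)
counts = [[3, 2], [2, 3]]
print(round(fleiss_kappa(counts), 3))       # -0.2 (below-chance agreement)
print(round(percent_agreement(counts), 3))  # 0.4
```

Note how κ corrects raw agreement for chance: here 40% of rater pairs agree, yet κ is negative because the marginal base rates alone would predict 50% agreement. This is why PA and Fleiss' κ can diverge, as in the 64-82% versus .19-.55 ranges reported above.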