Past studies describe numerous endophenotypes associated with schizophrenia (SZ), but many of these endophenotypes may overlap in the information they provide, and few studies have investigated whether a multivariate index can improve discrimination between SZ and healthy community comparison subjects (CCS). We investigated 16 endophenotypes from the first phase of the Consortium on the Genetics of Schizophrenia, a large, multi-site family study, to determine whether a subset could distinguish SZ probands from CCS as well as the full set of 16. Participants included 345 SZ probands and 517 CCS, each with a valid measure for at least one endophenotype. We used both logistic regression and random forest models to choose a subset of endophenotypes, adjusting for age, gender, smoking status, site, parental education, and the reading subtest of the Wide Range Achievement Test. As a sensitivity analysis, we re-fit the models using multiple imputation to assess the effect of missing values. We identified four important endophenotypes: antisaccade, the Continuous Performance Test-Identical Pairs 3-digit version, the California Verbal Learning Test, and emotion identification. The logistic regression model using only these four endophenotypes performed essentially as well as the model using all 16 (84% vs. 85% accuracy). Although a subset of endophenotypes can neither replace clinical diagnosis nor encompass the complexity of the disease, it can aid the design of future endophenotypic and genetic studies by reducing study cost and subject burden, simplifying sample enrichment, and improving statistical power to locate the genetic regions associated with schizophrenia that may be easiest to identify initially.
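The core comparison, a logistic regression on all 16 measures versus one restricted to a 4-measure subset, scored by classification accuracy, can be illustrated with a minimal sketch. This is not the consortium's analysis pipeline. It assumes synthetic, z-scored measures in which only four hypothetical columns (indices 0 to 3) differ between groups; it reuses the abstract's group sizes only for scale; it omits the covariates (age, gender, smoking status, site, parental education, reading score), the random forest models, and multiple imputation; and it estimates accuracy on a separately simulated test set. The printed numbers reflect the simulation only and are not the study's results.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_score(w, xi):
    # w[0] is the intercept; w[1:] pair with the feature values in xi.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Unpenalized logistic regression fitted by batch gradient descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi  # d(log-loss)/d(score)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def accuracy(w, X, y):
    # Classify as a case (1) when the predicted probability is >= 0.5.
    hits = sum(int((sigmoid(linear_score(w, xi)) >= 0.5) == bool(yi))
               for xi, yi in zip(X, y))
    return hits / len(y)


def keep_columns(X, cols):
    # Restrict every row to the chosen subset of measures.
    return [[row[c] for c in cols] for row in X]


def simulate(n_cases, n_controls, n_features, informative, seed):
    """Synthetic z-scored measures; only `informative` columns differ by group."""
    rng = random.Random(seed)
    X, y = [], []
    for label, count in ((1, n_cases), (0, n_controls)):
        for _ in range(count):
            X.append([rng.gauss(-1.0 if label == 1 and j in informative else 0.0, 1.0)
                      for j in range(n_features)])
            y.append(label)
    return X, y


INFORMATIVE = [0, 1, 2, 3]  # hypothetical stand-ins for the four retained measures
X_train, y_train = simulate(345, 517, 16, INFORMATIVE, seed=0)
X_test, y_test = simulate(345, 517, 16, INFORMATIVE, seed=1)

w_full = fit_logistic(X_train, y_train)
w_subset = fit_logistic(keep_columns(X_train, INFORMATIVE), y_train)

acc_full = accuracy(w_full, X_test, y_test)
acc_subset = accuracy(w_subset, keep_columns(X_test, INFORMATIVE), y_test)
print(f"all 16 measures: {acc_full:.3f}  four measures: {acc_subset:.3f}")
```

Because the 12 extra columns carry no group signal in this simulation, the reduced model should score close to the full one. In real data, covariates would be appended as extra columns in both models so that the comparison isolates the contribution of the endophenotypes.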
Keywords: Accuracy; Endophenotype; Logistic regression; Multiple imputation; ROC curve; Random forest; Schizophrenia; Sensitivity; Specificity.
Published by Elsevier B.V.