Pregnane X receptor (PXR) is a ligand-activated nuclear receptor that upregulates the expression of proteins involved in the detoxification and clearance of xenobiotics, most notably cytochrome P450 3A4 (CYP3A4). Structure-activity relationship (SAR) analysis of PXR agonists is useful for avoiding unwanted pharmacokinetic changes caused by drug-drug interactions. To perform large-scale ligand-based SAR modeling, we systematically collected information on chemical-PXR interactions from the PubMed database using a text mining system we developed, and merged it with screening data registered in the PubChem BioAssay database and other published data. Curation of the data yielded 270 human PXR agonists and 248 non-agonists. The entire data set was divided into training and testing data sets, and the training data set, comprising 415 entries (217 positive and 198 negative instances), was analyzed by a recursive partitioning method. The classification tree optimized by a cross-validation pruning algorithm gave an accuracy of 79.0% and correctly classified PXR agonists and non-agonists in the external testing data set with an accuracy of 70.9%. The descriptors chosen as splitting rules in the classification tree were generally associated with electronic properties of molecules, suggesting that these properties play an important role in the modes of interaction.
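
For readers unfamiliar with recursive partitioning, the following is a minimal illustrative sketch of how a classification tree is grown by repeatedly choosing the descriptor threshold that best separates the two classes. It is not the software or settings used in this study. The two descriptor columns, their values, and all function names (`gini`, `best_split`, `grow`, `predict`, `accuracy`) are hypothetical, and a simple depth limit stands in for the cross-validation pruning step, which is not implemented here.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (descriptor index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def grow(rows, labels, max_depth):
    """Recursively partition the data; a leaf stores the majority class."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            grow([r for r, _ in left], [y for _, y in left], max_depth - 1),
            grow([r for r, _ in right], [y for _, y in right], max_depth - 1))

def predict(tree, row):
    """Follow the splitting rules from the root down to a leaf label."""
    while isinstance(tree, tuple):
        j, t, low, high = tree
        tree = low if row[j] <= t else high
    return tree

def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted class matches the known class."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

# Made-up toy data: two hypothetical descriptors per compound,
# label 1 = agonist, label 0 = non-agonist.
train_rows = [(0.1, 1.0), (0.2, 3.0), (0.3, 2.0), (0.7, 1.5), (0.8, 2.5), (0.9, 0.5)]
train_labels = [0, 0, 0, 1, 1, 1]

tree = grow(train_rows, train_labels, max_depth=3)
train_accuracy = accuracy(tree, train_rows, train_labels)
```

In this toy case the tree needs only one splitting rule on the first descriptor. On real data an unpruned tree tends to overfit the training set, which is why a pruning step and a held-out testing set, as described above, are needed to estimate predictive accuracy.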