Identifying genetic variation that predicts important phenotypes, including disease susceptibility, drug efficacy, and adverse events, is a challenging task, and both theoretical and computational work is under way to tackle it. For many important diseases, such as diabetes, schizophrenia, and depression, the etiology is complex: the disease may arise from several distinct mechanisms, from interactions among multiple genes or between genes and the environment, or from both. Statistical methods are needed to handle the large, complex data sets that will be used to disentangle these diseases. Each putative genetic polymorphism can be tested for association one at a time; the more difficult problem is identifying combinations of polymorphisms or genetic markers with greater predictive power. Data from clinical trials, in which patients with a particular disease are treated with certain drugs, can be assembled retrospectively using a case-control design. Such data typically include treatment assignment, demographics, medical history, and genotypes for a large number of genetic markers, so the number of variables is expected to be much larger than the number of subjects. This report reviews some of the methods used to analyze these complex data and describes one data-mining method, recursive partitioning, in some detail. Because few public data sets are available, the methods are demonstrated on a complex simulated data set. This explanation of recursive partitioning should give researchers a clearer picture of currently available analysis techniques and help them plan their experiments more effectively.
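To make recursive partitioning concrete, here is a minimal, standard-library-only Python sketch of a CART-style classification tree for case-control genotype data. It assumes genotypes coded as 0/1/2 copies of a minor allele, uses Gini impurity as the splitting criterion, and stops when no split lowers impurity. These choices, and the names `grow`, `best_split`, `predict`, and `simulate`, are illustrative assumptions rather than the software or settings used in this report. The hypothetical `simulate` function generates data with more markers than subjects, in which case status depends on an interaction between two markers.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Genotypes = List[List[int]]  # rows = subjects, columns = markers coded 0/1/2 (assumed coding)


@dataclass
class Node:
    prediction: int                  # majority class in this node (1 = case, 0 = control)
    snp: Optional[int] = None        # index of the splitting marker; None marks a leaf
    threshold: int = 0               # subjects with genotype <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary case/control labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(X: Genotypes, y: List[int]) -> Optional[Tuple[float, int, int]]:
    """Search every marker and threshold; return (weighted impurity, marker, threshold)."""
    n, best = len(y), None
    for j in range(len(X[0])):
        for t in (0, 1):  # the two binary cuts of a 0/1/2 genotype
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # this cut does not separate any subjects
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow(X: Genotypes, y: List[int], depth: int = 0,
         max_depth: int = 3, min_size: int = 5) -> Node:
    """Recursively partition subjects until nodes are pure, too small, or too deep."""
    node = Node(prediction=int(2 * sum(y) >= len(y)))  # ties are labeled as cases
    if depth >= max_depth or len(y) < min_size or gini(y) == 0.0:
        return node
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):
        return node  # no cut lowers impurity, so stop (greedy stopping rule)
    _, j, t = split
    node.snp, node.threshold = j, t
    left_idx = [i for i in range(len(y)) if X[i][j] <= t]
    right_idx = [i for i in range(len(y)) if X[i][j] > t]
    node.left = grow([X[i] for i in left_idx], [y[i] for i in left_idx],
                     depth + 1, max_depth, min_size)
    node.right = grow([X[i] for i in right_idx], [y[i] for i in right_idx],
                      depth + 1, max_depth, min_size)
    return node


def predict(node: Node, genotypes: Sequence[int]) -> int:
    """Follow the splits from the root down to a leaf and return its class."""
    while node.snp is not None:
        node = node.left if genotypes[node.snp] <= node.threshold else node.right
    return node.prediction


def simulate(n_subjects: int = 200, n_markers: int = 300, seed: int = 1):
    """Hypothetical data: more markers than subjects; case status is driven by an
    interaction between markers 0 and 1, with about 10% of labels flipped as noise."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n_subjects)]
    y = []
    for row in X:
        case = row[0] >= 1 and row[1] >= 1
        if rng.random() < 0.1:
            case = not case
        y.append(int(case))
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    tree = grow(X, y)
    print("root split: marker", tree.snp, "genotype <=", tree.threshold)
```

Because each split is chosen greedily, a tree like this recovers the two-marker interaction in the simulated data partly because marker 0 also has a marginal effect on its own; an interaction with no marginal effects can be cut off by the "no improvement" stopping rule, which is one reason the choice of splitting and stopping criteria matters in practice.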