Observer variability in interpreting medical tests and in making diagnoses influences both clinical practice and research. Uniform classification of epileptic seizures is especially difficult. Although the ILAE classification scheme for seizures has been available for many years, the reliability of this system has not previously been assessed. Verbatim descriptions of seizure manifestations were transcribed from medical records as part of a large, population-based prevalence study of childhood epilepsy conducted in two counties in central Oklahoma. One senior neurologist (the study neurologist) and three neurology residents independently reviewed these descriptions and classified them by seizure type according to the ILAE system. Unweighted and weighted kappa statistics were used to assess the level of agreement between the study neurologist and each resident. Overall agreement between observer pairs in classifying seizure types based on all available descriptions was relatively poor (kappa = 0.24-0.38). Agreement improved somewhat when unclassified seizures were excluded and comparisons were restricted to descriptions with some degree of detail (kappa = 0.34-0.51). When specific types of seizures were classified, agreement was fair to excellent for most types (kappa = 0.45-0.90), with the exceptions of atypical absence (kappa = 0.11-0.28), partial seizures with secondary generalization (kappa = 0.26-0.40), and generalized motor seizures (kappa = 0.29-0.32). Sources of observer variability in addition to the classification scheme are considered. Use of specific criteria for the categorization of symptoms might improve the reliability of seizure classification.
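
For readers unfamiliar with the agreement measure, the following is a minimal sketch of how unweighted and weighted kappa can be computed for one observer pair. It is illustrative only and is not the study's analysis code: the function name `cohen_kappa`, the category labels, the ratings, and the disagreement weights are all hypothetical, and the weighting scheme actually used in the study is not stated here. The sketch uses the standard formulation kappa = 1 - (weighted observed disagreement) / (weighted disagreement expected by chance), which reduces to the unweighted statistic when every disagreement has weight 1.

```python
def cohen_kappa(rater_a, rater_b, categories, weights=None):
    """Cohen's kappa for two raters over the same items.

    weights[i][j] is the disagreement weight for category i vs. category j
    (0 on the diagonal). If weights is None, every disagreement counts as 1,
    which gives the unweighted kappa.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")

    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions: observed[i][j] is the share of items that
    # rater A put in category i and rater B put in category j.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    if weights is None:
        weights = [[0.0 if i == j else 1.0 for j in range(k)] for i in range(k)]

    obs_dis = sum(weights[i][j] * observed[i][j]
                  for i in range(k) for j in range(k))
    exp_dis = sum(weights[i][j] * row[i] * col[j]
                  for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - obs_dis / exp_dis


# Hypothetical labels and ratings, for illustration only.
CATEGORIES = ["absence", "atypical absence", "generalized motor"]
neurologist = ["absence", "absence", "atypical absence",
               "generalized motor", "generalized motor", "absence"]
resident = ["absence", "atypical absence", "atypical absence",
            "generalized motor", "absence", "absence"]

# Hypothetical weights: confusing the two absence types counts as half
# a disagreement; every other mismatch counts as a full disagreement.
WEIGHTS = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

unweighted = cohen_kappa(neurologist, resident, CATEGORIES)
weighted = cohen_kappa(neurologist, resident, CATEGORIES, WEIGHTS)
print(f"unweighted kappa = {unweighted:.2f}, weighted kappa = {weighted:.2f}")
```

Because seizure types are nominal rather than ordered categories, any weights have to be chosen by clinical judgment about which confusions are less serious, so a weighted kappa should be read together with the weights that produced it.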