Postoperative cognitive dysfunction (POCD) has been the subject of extensive research. In the literature, large differences are apparent in methodology, such as the test batteries, the interval between sessions, the endpoints analysed, the statistical methods, and the definition of neuropsychological deficits. Traditionally, intelligence tests or tests developed for clinical neuropsychology have been used. Tests for detecting POCD should be chosen on the basis of well-described sensitivity and suitability for surgical patients. In tests that yield scores, floor and ceiling effects may compromise the evaluation if the tests are either too easy or too difficult. Uncontrolled testing facilities and changes in test personnel may affect test performance. Practice effects are pronounced in neuropsychological tests but have generally been ignored. The use of a suitable normative population is essential to allow correction for practice effects and for variability between sessions. Missing follow-up may severely compromise the validity of conclusions, since subjects who are unable or unwilling to be examined are particularly prone to suffer from POCD. In the statistical analysis of the test results, the evaluation should be based on differences between pre- and postoperative performance. Parametric statistical tests are not appropriate unless the data follow a Gaussian distribution, possibly after transformation. The definition of cognitive dysfunction should be restrictive, and the criteria should be fulfilled in only a small proportion of volunteers. In the literature, these requirements have often not been fulfilled. This precludes a reasonable estimation of the incidence of POCD, and the conclusions of comparative studies should be interpreted with great caution. In this review article, we present a number of recommendations for the design and execution of studies within this area. In addition, the critical reader may use these recommendations in the evaluation of the literature.
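
The recommendation to analyse pre- to postoperative differences, corrected for practice effects by means of a normative population, can be illustrated with a minimal sketch. It is not the procedure of any particular study. It assumes that higher test scores mean better performance, that a control group was tested at the same interval as the patients, and that a patient's change is expressed as a Z score relative to the control group's changes. The function names, the cut-off of -1.96, and the example scores are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def change_z_scores(patient_pre, patient_post, control_pre, control_post):
    """Express each patient's pre-to-post change as a Z score.

    The mean change in the control group estimates the practice effect,
    and the sample standard deviation of the control changes estimates
    the variability between sessions (both are assumptions of this sketch).
    """
    control_changes = [post - pre for pre, post in zip(control_pre, control_post)]
    practice_effect = mean(control_changes)
    change_sd = stdev(control_changes)  # sample SD (n - 1 denominator)
    return [
        ((post - pre) - practice_effect) / change_sd
        for pre, post in zip(patient_pre, patient_post)
    ]


def flag_deterioration(z_scores, cutoff=-1.96):
    """Flag patients whose corrected change falls below a restrictive cut-off.

    The cut-off is a hypothetical value; a study would have to justify its own.
    """
    return [z < cutoff for z in z_scores]


# Made-up scores for one test, for illustration only.
control_pre = [10, 12, 14, 16]
control_post = [12, 13, 17, 18]  # controls improve by 2 points on average
patient_pre = [15, 13]
patient_post = [17, 11]

z = change_z_scores(patient_pre, patient_post, control_pre, control_post)
flags = flag_deterioration(z)
```

In this example the first patient improves by exactly the estimated practice effect and receives a Z score of zero, whereas the second patient declines by 2 points and is flagged. Without the correction, the first patient's 2-point gain could be mistaken for a genuine improvement.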