The purpose of this systematic review was to determine the quality of the research and to assess the reliability of different types of physical examination procedures used in the assessment of patients with non-specific low back pain. A search of electronic databases (MEDLINE, PEDro, AMED, EMBASE, Cochrane, and CINAHL) up to August 2005 identified 48 relevant studies, which were analysed for quality and reliability. Pre-established criteria were used to judge the quality of the studies and to define satisfactory reliability, and conclusions emphasised high-quality studies (methods score ≥ 60%). The mean quality score of the studies was 52% (range 0 to 88%), indicating weak to moderate methodology. With the upper threshold (kappa/ICC > 0.85), most procedures demonstrated either conflicting evidence or moderate to strong evidence of low reliability. With the lower threshold (kappa/ICC > 0.70), the evidence about pain response to repeated movements changed from conflicting to moderate evidence of high reliability. Most procedures commonly used by clinicians in the examination of patients with back pain demonstrate low reliability.