Purpose: The variances and biases that instrumentation factors introduce into quantitative measures of PET tracer uptake must be known to ascertain the significance of any measured differences, such as when quantifying response to therapy. The authors studied the repeatability and reproducibility of serial PET measures of activity as a function of object size, acquisition, reconstruction, and analysis method on one scanner and at three PET centers using a single protocol with long half-life phantoms.
Methods: The authors assessed standard deviations (SDs) and mean biases of consecutive measures of PET activity concentrations in a uniform phantom and a NEMA NU-2 image quality (IQ) phantom filled with 68Ge (9-month half-life) in an epoxy matrix. Activity measurements were normalized by dividing by a common decay-corrected true value and reported as recovery coefficients (RCs). Each experimental set consisted of 20 consecutive PET scans of either a stationary phantom to evaluate repeatability or a repositioned phantom to assess reproducibility. One site conducted a comprehensive series of repeatability and reproducibility experiments, while two other sites repeated the reproducibility experiments using the same IQ phantom. An equation was derived to estimate the SD of a new PET measure from a known SD based on the ratio of available coincident counts between the two PET measures.
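The normalization described in the Methods can be illustrated with a minimal sketch. It assumes simple exponential decay of the phantom activity and a 68Ge half-life of about 270.95 days; the function names, the reference activity `a0`, and the elapsed time are hypothetical and are not taken from the study.

```python
import math
import statistics

# Assumption: 68Ge half-life of about 270.95 days (roughly 9 months).
GE68_HALF_LIFE_DAYS = 270.95


def decay_corrected_truth(a0, elapsed_days, half_life_days=GE68_HALF_LIFE_DAYS):
    """True activity concentration after elapsed_days of radioactive decay."""
    return a0 * math.exp(-math.log(2) * elapsed_days / half_life_days)


def recovery_coefficient(measured, a0, elapsed_days,
                         half_life_days=GE68_HALF_LIFE_DAYS):
    """Measured activity divided by the decay-corrected true value."""
    return measured / decay_corrected_truth(a0, elapsed_days, half_life_days)


def summarize(rcs):
    """Mean bias (mean RC minus 1) and sample SD over a set of scans."""
    return statistics.mean(rcs) - 1.0, statistics.stdev(rcs)
```

Applied to the RCs from a set of 20 consecutive scans, `summarize` would return the mean bias and SD for that set, which are the two quantities the study reports.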
Results: For stationary uniform phantom scans, the SDs of maximum RCs were three to five times less than predicted for uncorrelated pixels within circular regions of interest (ROIs) with diameters ranging from 1 to 15 cm. For stationary IQ phantom scans from 1 cm diameter ROIs, the average SDs of mean and maximum RCs ranged from 1.4% to 8.0%, depending on the methods of acquisition and reconstruction (coefficients of variation range 2.5% to 9.8%). Similar SDs were observed for both analytic and iterative reconstruction methods (p ≥ 0.08). SDs of RCs for 2D acquisitions were significantly higher than for 3D acquisitions (p ≤ 0.008) for the same acquisition and processing parameters. SDs of maximum RCs were larger than corresponding mean values for stationary IQ phantom scans (p ≤ 0.02), although the magnitude of the difference is reduced by noise correlations in the image. Increased smoothing decreased SDs (p ≤ 0.045) and decreased maximum and mean RCs (p ≤ 0.02). Reproducibility of GE DSTE, Philips Gemini TF, and Siemens Biograph Hi-REZ PET/CT scans of the same IQ phantom, with similar acquisition, reconstruction, and repositioning among 20 scans, was, in general, similar (mean and maximum RC SD range 2.5% to 4.8%).
Conclusions: Short-term scanner variability is low compared to other sources of error. There are tradeoffs in noise and bias depending on acquisition, processing, and analysis methods. The SD of a new PET measure can be estimated from a known SD if the ratios of available coincident counts between the two PET scanner acquisitions are known and both employ the same ROI definition. Results suggest it is feasible to use PET/CTs from different vendors and sites in clinical trials if they are properly cross-calibrated.
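The abstract does not state the derived equation. As a hedged illustration only, the sketch below assumes Poisson-limited noise, so that the SD scales with the inverse square root of the available coincident counts, and assumes both measures use the same ROI definition. The function name and arguments are hypothetical, and the authors' actual equation may differ.

```python
import math


def estimate_sd(known_sd, known_counts, new_counts):
    """Scale a known SD to a new acquisition using the ratio of counts.

    Assumption (not from the source): noise is Poisson-limited, so
    SD_new = SD_known * sqrt(counts_known / counts_new).
    """
    if known_counts <= 0 or new_counts <= 0:
        raise ValueError("counts must be positive")
    return known_sd * math.sqrt(known_counts / new_counts)
```

Under this assumption, an acquisition with four times the available counts would be expected to have half the SD, for example `estimate_sd(0.04, 1.0, 4.0)` gives 0.02.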