The Critical Assessment of techniques for protein Structure Prediction (or CASP) is a community-wide blind test experiment to reveal the best accomplishments of structure modeling. Assessors have been using the Global Distance Test (GDT_TS) measure to quantify prediction performance since CASP3 in 1998. However, identifying significant score differences between close models is difficult because of the lack of uncertainty estimations for this measure. Here, we utilized the atomic fluctuations caused by structure flexibility to estimate the uncertainty of GDT_TS scores. Structures determined by nuclear magnetic resonance are deposited as ensembles of alternative conformers that reflect the structural flexibility, whereas standard X-ray refinement produces the static structure averaged over time and space for the dynamic ensembles. To recapitulate the structural heterogeneous ensemble in the crystal lattice, we performed time-averaged refinement for X-ray datasets to generate structural ensembles for our GDT_TS uncertainty analysis. Using those generated ensembles, our study demonstrates that the time-averaged refinements produced structure ensembles with better agreement with the experimental datasets than the averaged X-ray structures with B-factors. The uncertainty of the GDT_TS scores, quantified by their standard deviations (SDs), increases for scores lower than 50 and 70, with maximum SDs of 0.3 and 1.23 for X-ray and NMR structures, respectively. We also applied our procedure to the high accuracy version of GDT-based score and produced similar results with slightly higher SDs. To facilitate score comparisons by the community, we developed a user-friendly web server that produces structure ensembles for NMR and X-ray structures and is accessible at http://prodata.swmed.edu/SEnCS. Our work helps to identify the significance of GDT_TS score differences, as well as to provide structure ensembles for estimating SDs of any scores.