This paper presents a new approach for evaluating predictions of blood oxygen saturation (SpO2). We propose a threshold-based performance metric that evaluates SpO2 predictions according to whether they capture critical desaturations in the SpO2 time series of patients. We use linear auto-regressive models built from historical SpO2 data to predict critical desaturation events, and we evaluate these predictions with the proposed metric. For a 20 s prediction horizon, 88%-94% of the critical events were captured, with positive predictive values (PPVs) between 90% and 99%. When the prediction horizon was increased to 60 s, 46%-71% of the critical events were detected, with PPVs between 81% and 97%. For both prediction horizons, more than 97% of the non-critical events were correctly classified. We also investigated the overall classification capabilities of the developed predictive models: the areas under the ROC curves for 60 s predictions are between 0.86 and 0.98. Furthermore, we investigate the effect of including pulse rate (PR) dynamics in the models and predictions. We find no improvement in the percentage of predicted critical desaturations when PR dynamics are incorporated into the SpO2 predictive models (p-value = 0.814). We also show that including the PR dynamics does not improve the earliest time at which critical SpO2 levels are predicted (p-value = 0.986). Our results indicate that blood oxygen is an effective input to the PR rather than vice versa. We demonstrate that combining predictive models with frequent pulse oximetry measurements can provide a warning of critical oxygen desaturations that may have adverse effects on the health of patients.
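To make the threshold-based evaluation concrete, the following is a minimal Python sketch of how sensitivity (the fraction of critical events captured), PPV, and specificity could be computed from predicted and measured SpO2 values. It is an illustration under stated assumptions, not the implementation used in this work: the function name `threshold_metrics`, the per-sample definition of a critical event (SpO2 below the threshold), the threshold value of 90%, and the sample values are all hypothetical.

```python
import math


def threshold_metrics(actual, predicted, threshold):
    """Compare predicted and measured SpO2 after thresholding both.

    A sample counts as critical when its value is below `threshold`.
    This per-sample definition is an assumption made for illustration.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        actual_critical = a < threshold
        predicted_critical = p < threshold
        if actual_critical and predicted_critical:
            tp += 1  # critical event captured by the prediction
        elif actual_critical:
            fn += 1  # critical event missed
        elif predicted_critical:
            fp += 1  # false alarm
        else:
            tn += 1  # non-critical sample correctly classified

    def ratio(num, den):
        # Return NaN when a rate is undefined (no samples in the denominator).
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical measured and predicted SpO2 values (percent), one per sample.
measured = [95, 88, 87, 96, 89, 97, 98, 96]
forecast = [94, 89, 91, 95, 88, 89, 97, 95]

metrics = threshold_metrics(measured, forecast, threshold=90.0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this sketch the forecast values would come from a predictive model at a fixed horizon (for example 20 s or 60 s ahead), so the same function can be applied per horizon to compare how detection degrades as the horizon grows.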