Bayesian online learning of the hazard rate in change-point problems

Robert C Wilson et al. Neural Comput. 2010 Sep 1;22(9):2452–76. doi: 10.1162/NECO_a_00007.

Abstract

Change-point models are generative models of time-varying data in which the underlying generative parameters undergo discontinuous changes at different points in time known as change-points. Change-points often represent important events in the underlying processes, like a change in brain state reflected in EEG data or a change in the value of a company reflected in its stock price. However, change-points can be difficult to identify in noisy data streams. Previous attempts to identify change-points online using Bayesian inference relied on specifying in advance the rate at which they occur, called the hazard rate (h). This approach leads to predictions that can depend strongly on the choice of h and is unable to deal optimally with systems in which h is not constant in time. In this letter, we overcome these limitations by developing a hierarchical extension to earlier models. This approach allows h itself to be inferred from the data, which in turn helps to identify when change-points occur. We show that our approach can effectively identify change-points in both toy and real data sets with complex hazard rates, and we show how it can be used as an ideal-observer model for human and animal behavior when faced with rapidly changing inputs.
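The fixed-hazard recursion that the letter extends can be sketched as follows. This is a minimal, illustrative implementation of the Adams and MacKay (2007) run-length filter for Gaussian data with known observation noise, not the hierarchical model introduced here; the function name, default parameters, and prior settings are assumptions made for the sketch.

```python
import numpy as np

def bocpd_fixed_h(data, h, mu0=0.0, var0=10.0, var_x=1.0):
    """Run-length filtering with a fixed hazard rate h (Adams & MacKay, 2007).

    Gaussian observations with known variance var_x and a conjugate
    Gaussian prior N(mu0, var0) on the mean. Returns R, where
    R[t, r] = p(r_t = r | x_{1:t}).
    """
    T = len(data)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    # posterior mean/variance of the generative mean, one entry per run-length
    mu, var = np.array([mu0]), np.array([var0])
    for t, x in enumerate(data, start=1):
        # predictive density of x under each run-length hypothesis
        pv = var + var_x
        pred = np.exp(-0.5 * (x - mu) ** 2 / pv) / np.sqrt(2 * np.pi * pv)
        growth = R[t - 1, :t] * pred * (1 - h)   # no change-point: r -> r + 1
        cp = (R[t - 1, :t] * pred * h).sum()     # change-point: r -> 0
        R[t, 0] = cp
        R[t, 1:t + 1] = growth
        R[t] /= R[t].sum()
        # conjugate update of the sufficient statistics for each run-length
        new_var = 1.0 / (1.0 / var + 1.0 / var_x)
        new_mu = new_var * (mu / var + x / var_x)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
    return R
```

The hierarchical extension described in the abstract augments this run-length recursion with inference over h itself, so the reset probability is no longer a fixed constant.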


Figures

Figure 1
Cartoon illustrating a simple change-point problem. A. Data points (circles) are generated by adding random noise to an underlying mean (dashed line) that undergoes two change-points, at times 5 and 10. B. The number of points since the last change-point, or run-length, is plotted as a function of time for the same example. The run-length increases by one when there is no change-point (black line) and drops to zero when there is one (dashed black line). C. Schematic of the message-passing update rule from Adams and MacKay (2007). Starting at time 1, all of the weight is on the node at run-length 0. At time 2 this node sends messages to the nodes at run-lengths 1 and 0. From there each node sends two messages, one increasing the run-length by one and the other returning to the node at rt = 0. Using these input messages, each node updates its weight, p(rt|x1:t).
Figure 2
Effect of changing the hazard rate on the mean of the predictive distribution generated by the algorithm of Adams and MacKay (2007) for a single data set. Each panel shows a different setting of the hazard rate h, as indicated. The top half of each panel shows the data (gray dots) along with the model's predicted mean in black. The bottom half of each panel shows the logarithm of the run-length distribution, log p(rt|x1:t), with darker shades corresponding to higher probabilities. Clearly the predictions are heavily influenced by the choice of h and, without knowing the actual change-point locations, it is not obvious which setting is better matched to the data.
Figure 3
Inference of a constant hazard rate for a toy problem. A. The raw data (circles) and the predicted mean (solid line) plotted versus time. The actual change-point locations from the generative process are shown by the gray vertical lines. B. The marginal run-length distribution, p(rt|x1:t) = Σat p(rt, at|x1:t), versus time. C. The marginal distribution over the number of change-points, p(at|x1:t) = Σrt p(rt, at|x1:t), versus time. D. The maximum likelihood online estimate of the hazard rate (solid black line) quickly converges to the actual hazard rate (dashed black line).
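The convergence shown in panel D can be illustrated with a much simpler estimator: if the change-point times were observed directly, a Beta–Bernoulli posterior mean over the hazard rate would converge to the true h. This sketch only illustrates that convergence; the model in the letter instead marginalizes over the latent change-point count at, and the function name and prior parameters here are assumptions.

```python
import numpy as np

def hazard_posterior_mean(cp_indicator, a=1.0, b=1.0):
    """Online Beta-Bernoulli estimate of a constant hazard rate.

    cp_indicator[t] = 1 if a change-point occurred at step t, else 0
    (assumed observed, purely for illustration). With a Beta(a, b)
    prior, the posterior mean after t steps is
    (a + #change-points) / (a + b + t), which converges to the
    empirical change-point frequency.
    """
    counts = np.cumsum(cp_indicator)
    t = np.arange(1, len(cp_indicator) + 1)
    return (a + counts) / (a + b + t)
```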
Figure 4
Illustration of the generative model corresponding to the change-point hierarchy. See text for details.
Figure 5
Effect of pruning on the two-level case for Bernoulli, Gaussian, and Laplacian data (columns). A–C. Generative parameters (dashed line in A), data (gray circles in B and C), and inferences from the pruned (solid black lines) and unpruned (thick gray lines) models versus time. D–F. Number of nodes versus time for the pruned (black) and unpruned (gray) cases. G–I. Unpruned run-length distributions versus time. J–L. Pruned run-length distributions versus time. Horizontal lines indicate bin boundaries used for pruning.
Figure 6
Effect of pruning on a three-level hierarchy example. A. The data (circles) are sampled from a Gaussian distribution whose mean changes every 2 time steps for t ≤ 10 and then remains constant afterwards. The predictive mean is shown for the unpruned (dashed black) and pruned (gray) models. B. The model's estimates of h (gray line unpruned, black line pruned) compared with the generative h (dashed black line). C. The number of nodes for the unpruned (gray) and pruned (black) cases over time on a logarithmic scale. D. Marginal low-level run-length distribution, p(rt(2)|x1:t), in the unpruned case. This is to be compared with the pruned version of the same distribution in panel G. E and H. Marginal high-level run-length distribution, p(rt(1)|x1:t), in the unpruned and pruned cases, respectively. F and I. Marginal change-point count distribution, p(at|x1:t), in the unpruned and pruned cases.
Figure 7
Bernoulli data example, similar to that in Behrens et al. (2007). A. Binary data representing the presence (vertical black line) or absence (vertical white line) of reward at a given time. B. The probability of reward delivery, ρt (gray dashed line), changes over time according to a change-point process and is well tracked by the algorithm (black line). C. The actual hazard rate over time (dashed gray line) compared with the hazard rate inferred by the model (black line). D. The low-level run-length distribution, p(rt(2)|x1:t), computed by the model as a function of time. E. The change-point count distribution, p(at|x1:t), as a function of time. F. The high-level run-length distribution, p(rt(1)|x1:t), as a function of time shows that the model has recognized the high-level change-point at t = 200.
Figure 8
Gaussian example. A. The data over time (thin gray line) are sampled from a Gaussian distribution whose mean and variance undergo change-points. The black line indicates the model's estimate of the predictive mean. B. The model's estimate of the hazard rate (black line) compared with the true hazard rate of the generative process (dashed gray line). C. The low-level run-length distribution, p(rt(2)|x1:t), over time. D. The change-point count distribution, p(at|x1:t), over time. E. The high-level run-length distribution, p(rt(1)|x1:t).
Figure 9
Change-point analysis of daily GM stock returns, 1972–2009. A. Daily log return, Rt (gray line), and the estimated scale factor, λt (black line), from equation 44, plotted as a function of time. B. The model's estimate of the hazard rate, ht(1), in units of change-points per year, versus time. C. Low-level run-length distribution, p(rt(2)|x1:t), versus time (note that run-length is measured in years). Several change-points are evident, corresponding to abrupt changes in the variance of the return. D. Distribution over the number of change-points, p(at|x1:t), versus time. E. High-level run-length distribution, p(rt(1)|x1:t), versus time. The model identified one high-level change-point around 2001, roughly at the time of Richard Wagoner's appointment as CEO. This event was followed by a higher frequency of inferred change-points and therefore an increase in the estimated hazard rate.


References

    1. Adams RP, MacKay DJ. Bayesian online changepoint detection. Technical report, University of Cambridge, Cambridge, UK; 2007.
    2. Aroian LA, Levene H. The effectiveness of quality control charts. Journal of the American Statistical Association. 1950;45(252):520–529.
    3. Averbeck BB, Lee D. Prefrontal neural correlates of memory for sequences. Journal of Neuroscience. 2007;27(9):2204–2211.
    4. Barlow J, Creutzfeldt O, Michael D, Houchin J, Epelbaum H. Automatic adaptive segmentation of clinical EEGs. Electroencephalography and Clinical Neurophysiology. 1981;51:512–525.
    5. Barry D, Hartigan JA. A Bayesian analysis for change point problems. Journal of the American Statistical Association. 1993;88(421):309–319.