**Title**: Predicting the Number of Future Events

**Authors & Year**: Qinglong Tian, Fanqi Meng, Daniel J. Nordman, and William Q. Meeker (2021)

**Journal**: Journal of the American Statistical Association [DOI: 10.1080/01621459.2020.1850461]

**Review Prepared by**: David Han

*Statistical Prediction for Quality Engineering*

For quality assessments in reliability and industrial engineering, it is often necessary to predict the number of future events (*e.g.*, system or component failures). Examples include the prediction of warranty returns and the prediction of future product failures that could lead to serious property damage and/or human casualties. Business decisions such as a product recall are based on such predictions. These applications also require a prediction interval to quantify the prediction uncertainty arising from the combination of process variability and parameter uncertainty. In this research, the authors studied the within-sample prediction of the number of future events, given the observed data. The term **within-sample** prediction is used to distinguish it from the more widely known **new-sample** prediction. For new-sample prediction, past data are used to produce a prediction interval for the lifetime of a single unit from a new, independent sample. For within-sample prediction, however, the sample is unchanged: the future quantity that researchers wish to predict (*e.g.*, the number of future failures) relates to the same sample that provided the original data.
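For a concrete, hypothetical illustration of the within-sample setup: suppose *n* units enter service together, *r* of them fail by the censoring time *tc*, and the goal is to predict how many of the remaining *n − r* units fail in the future interval (*tc*, *tw*]. Conditional on the data, that count follows a binomial distribution whose success probability is the conditional failure probability of a unit that survived past *tc*. The sketch below assumes, purely for simplicity, an exponential lifetime model with a known rate; all function names and numbers are illustrative, not from the paper.

```python
import math

def cond_fail_prob_exponential(lam, tc, tw):
    """Conditional probability that a unit surviving past tc fails by tw,
    under an exponential lifetime model with rate lam:
    rho = (F(tw) - F(tc)) / (1 - F(tc)) = 1 - exp(-lam * (tw - tc))."""
    F = lambda t: 1.0 - math.exp(-lam * t)
    return (F(tw) - F(tc)) / (1.0 - F(tc))

def binom_pmf(k, m, p):
    """Binomial(m, p) probability mass at k."""
    return math.comb(m, k) * p**k * (1 - p)**(m - k)

def within_sample_pmf(n, r, lam, tc, tw):
    """PMF of the number of future failures among the n - r survivors
    in (tc, tw], conditional on r failures observed by tc."""
    rho = cond_fail_prob_exponential(lam, tc, tw)
    return [binom_pmf(k, n - r, rho) for k in range(n - r + 1)]
```

In practice the rate `lam` is unknown, which is exactly where the estimation methods discussed next come in.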

*How to Construct a Prediction Interval?*

To compute a prediction interval, one needs the conditional probability distribution of the random quantity of interest, given the observed data. This conditional distribution is indexed by parameters that are usually unknown in practice and must therefore be estimated from the observed data. The popular **plug-in** (**PL**) method (*a.k.a.* the naive or estimative method) replaces the unknown parameter with a consistent estimator, one that converges to the true parameter in probability as the sample size grows. Although the PL method has been criticized for ignoring estimation uncertainty, many researchers have argued that its coverage probability is accurate under certain conditions. The authors of the current research, however, demonstrated that the PL method fails to provide an asymptotically correct interval for within-sample prediction: even for large amounts of data, the coverage probability fails to converge to the nominal confidence level. As a solution, they suggested the **calibration-bootstrap** (**CB**) method and established its asymptotic correctness. The basic idea of a bootstrap method is that inference based on sample data can be mimicked by resampling the sample data and performing inference on the resampled data. The authors also presented two alternative methods based on a predictive distribution. The first is a general method using **direct parametric bootstrap** (**DB**) samples: each bootstrap sample yields a distribution for the number of future failures, and the predictive distribution is obtained by averaging over these distributions. The second method, inspired by **generalized pivotal quantities** (**GPQ**), applies to the log-location-scale family of distributions, which is popular in reliability engineering.
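As a rough sketch of the DB idea (not the authors' implementation), the following assumes an exponential lifetime model, for which the Type-I-censored maximum likelihood estimate has the closed form `r / (total time on test)`; the paper itself works with the more general log-location-scale family, including the Weibull. Each parametric bootstrap sample gives a binomial distribution for the future failure count, and these are averaged into a predictive distribution.

```python
import math
import random

def exp_mle(failure_times, n, tc):
    """MLE of the exponential rate under Type-I censoring at tc:
    r observed failures, n - r units censored at tc."""
    r = len(failure_times)
    return r / (sum(failure_times) + (n - r) * tc)

def db_predictive_pmf(failure_times, n, tc, tw, B=1000, rng=None):
    """Direct-bootstrap predictive PMF for the number of future failures
    among the n - r survivors in (tc, tw]."""
    rng = rng or random.Random(0)
    m = n - len(failure_times)
    lam_hat = exp_mle(failure_times, n, tc)
    pmf = [0.0] * (m + 1)
    used = 0
    for _ in range(B):
        # Parametric bootstrap: simulate a fresh censored data set from the fit.
        boot = [rng.expovariate(lam_hat) for _ in range(n)]
        fails = [t for t in boot if t <= tc]
        if not fails:
            continue  # skip degenerate resamples with no observed failures
        lam_star = exp_mle(fails, n, tc)
        rho = 1.0 - math.exp(-lam_star * (tw - tc))
        # Accumulate the Binomial(m, rho*) PMF for this bootstrap draw.
        for k in range(m + 1):
            pmf[k] += math.comb(m, k) * rho**k * (1 - rho)**(m - k)
        used += 1
    return [p / used for p in pmf]  # average over bootstrap draws

def upper_prediction_bound(pmf, level=0.95):
    """Smallest count whose cumulative predictive probability reaches `level`."""
    cum = 0.0
    for k, p in enumerate(pmf):
        cum += p
        if cum >= level:
            return k
    return len(pmf) - 1
```

The quantile of the averaged predictive distribution then serves as the one-sided prediction bound.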

*Which One to Use in Practice?*

Through simulations, the performance of these four methods was compared. Figure 1 compares the coverage probabilities of the prediction bounds based on single-cohort data generated from a Weibull distribution with a 20% probability of failure in a future time interval. Here p_f1 denotes the probability of failure before the experiment terminates, representing the amount of information available for constructing prediction intervals. Each column corresponds to a different prediction bound (*i.e.*, a one-sided interval, either lower or upper), and the horizontal dashed line represents the nominal level, 90% or 95%. As illustrated, the PL method fails to attain asymptotically correct coverage probability. The DB and GPQ methods, on the other hand, behave similarly and tend to outperform the CB method, with coverage probabilities much closer to the nominal level, even though all three methods are asymptotically valid. Citing additional benefits such as ease of implementation and computational stability, the authors recommend the predictive-distribution methods, especially the GPQ method, for general applications involving within-sample prediction.
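The kind of study behind Figure 1 can be mimicked in miniature: repeatedly generate a censored data set, compute a prediction bound, and check whether the realized number of future failures falls within it. The sketch below estimates the coverage probability of a PL upper prediction bound under an exponential model; this is an illustrative simplification, since the paper's study uses the Weibull distribution and compares all four methods.

```python
import math
import random

def pl_upper_bound(failure_times, n, tc, tw, level=0.95):
    """Plug-in upper prediction bound: plug the censored-data MLE into the
    conditional Binomial(n - r, rho) distribution and take its quantile."""
    r = len(failure_times)
    lam_hat = r / (sum(failure_times) + (n - r) * tc)
    rho = 1.0 - math.exp(-lam_hat * (tw - tc))
    m = n - r
    cum = 0.0
    for k in range(m + 1):
        cum += math.comb(m, k) * rho**k * (1 - rho)**(m - k)
        if cum >= level:
            return k
    return m

def estimate_coverage(lam, n, tc, tw, level=0.95, reps=500, seed=0):
    """Monte Carlo estimate of the coverage probability of the PL bound."""
    rng = random.Random(seed)
    hits = trials = 0
    for _ in range(reps):
        life = [rng.expovariate(lam) for _ in range(n)]
        fails = [t for t in life if t <= tc]
        if not fails or len(fails) == n:
            continue  # skip degenerate samples (no failures or no survivors)
        future = sum(1 for t in life if tc < t <= tw)  # realized future count
        ub = pl_upper_bound(fails, n, tc, tw, level)
        trials += 1
        hits += (future <= ub)
    return hits / trials
```

Running this over a grid of sample sizes and censoring times would trace out coverage curves analogous to one panel of Figure 1.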

**Figure 1.** Coverage probabilities versus expected number of events based on the four methods

*Now & Onward*

The present results clearly warn us about the deficiency of the PL method for computing a prediction interval, even though the method is well known and widely used for its simplicity. Moving forward, **statistical practitioners and quality engineers are advised to use the DB or GPQ method for within-sample predictions**. The research does not stop here. In many applications, test units are exposed to different operating or environmental conditions, resulting in different probabilities of failure over time. Prediction intervals that utilize covariate information, such as temperature and humidity, would be useful for manufacturers and regulators in making informed decisions (*e.g.*, a potential product recall). Moreover, there could be seasonality effects in time-to-failure processes, which would impact within-sample predictions. Future research would develop predictive inferential methods extending the results discussed in this paper by incorporating constant or time-varying covariates.