Generate confidence interval for prediction

Introduction

During inference, users sometimes want an interval estimate for a prediction rather than a point estimate, because an interval also conveys how uncertain the prediction is and so gives more information to guide subsequent decisions. A confidence interval is one way to provide this.

A confidence interval is the mean of your estimate plus and minus the variation in that estimate (a multiple of its standard deviation). For time series forecasting, we adopt Monte Carlo (MC) dropout to calculate the confidence interval, following this paper.
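
To make the idea concrete, here is a minimal NumPy sketch of MC dropout. It is not the Chronos implementation: the tiny two-layer network, its random weights, the dropout rate of 0.2, and the helpers forward and mc_dropout_interval are all hypothetical. Dropout is kept active at prediction time, the spread of repeated stochastic passes gives the model uncertainty, and the mean squared residual on validation data stands in for the noise inherent in the data, loosely mirroring how predict_interval uses validation data to calculate data bias.

```python
import numpy as np


def forward(x, w1, w2, rng=None, dropout_rate=0.0):
    """One pass of a tiny (hypothetical) two-layer network; dropout only when rng is given."""
    hidden = np.tanh(x @ w1)
    if rng is not None:
        # Inverted dropout: zero each hidden unit with probability dropout_rate and
        # rescale the survivors so the expected activation is unchanged
        keep = rng.random(hidden.shape) >= dropout_rate
        hidden = hidden * keep / (1.0 - dropout_rate)
    return hidden @ w2


def mc_dropout_interval(x, x_val, y_val, w1, w2, repetition_times=100,
                        dropout_rate=0.2, z=1.96, seed=0):
    """Toy MC dropout interval: mean +/- z * sqrt(model variance + noise variance)."""
    rng = np.random.default_rng(seed)
    # Model uncertainty: spread of repeated forward passes with dropout kept on
    samples = np.stack([forward(x, w1, w2, rng, dropout_rate)
                        for _ in range(repetition_times)])
    mean = samples.mean(axis=0)
    model_var = samples.var(axis=0)
    # Inherent noise: mean squared residual of the deterministic model on validation data
    noise_var = np.mean((y_val - forward(x_val, w1, w2)) ** 2)
    std = np.sqrt(model_var + noise_var)
    return mean, std, mean - z * std, mean + z * std


# Hypothetical toy setup: fixed random weights stand in for a trained network
rng = np.random.default_rng(42)
w1, w2 = rng.normal(size=(3, 16)), rng.normal(size=(16, 1))
x_val = rng.normal(size=(50, 3))
y_val = forward(x_val, w1, w2) + rng.normal(scale=0.1, size=(50, 1))
x_test = rng.normal(size=(5, 3))

mean, std, lower, upper = mc_dropout_interval(x_test, x_val, y_val, w1, w2)
print(mean.shape, std.shape)  # (5, 1) (5, 1)
```

Adding the two variances assumes the model uncertainty and the inherent noise are independent, which is the usual simplification in this kind of sketch.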

In Chronos, generating a confidence interval for a prediction is easy: just call the forecaster's predict_interval method. In this guide, we demonstrate in detail how to generate a confidence interval for a forecaster's predictions.

We use TCNForecaster and the nyc_taxi dataset as an example in this guide.

Setup

Before we begin, we need to install Chronos if it isn’t already available. We choose PyTorch as the deep learning backend.

!pip install --pre --upgrade bigdl-chronos[pytorch]
# uninstall torchtext to avoid version conflict
!pip uninstall -y torchtext

Forecaster preparation

Before inference, a forecaster must be created and trained. The training process is covered in detail in the previous guide, Train forecaster on single node, so here we directly create and train a TCNForecaster on the nyc_taxi dataset.

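# get_data and get_trained_forecaster are helper functions of this guide
# (their definitions are not shown here), not part of the Chronos API
# get data for training, validation, and testing
train_data, val_data, test_data = get_data()
# get a trained TCNForecaster
forecaster = get_trained_forecaster(train_data)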

Obtain confidence interval

Once a trained forecaster is ready, and as long as it is a non-distributed forecaster, you can call its predict_interval method to obtain the confidence interval. Pass the data you want to predict (usually the test data) and the corresponding validation data, which is used to calculate the data bias.

📝Note

validation_data is only required when calling predict_interval for the first time.

The predict_interval method supports data in the following formats:

numpy ndarray (recommended)
pytorch dataloader
bigdl.chronos.data.TSDataset
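
The recommended numpy input is a batch of sliding windows over the series. In Chronos this is normally produced with TSDataset.roll; the hypothetical helper roll_windows below only illustrates the idea with plain NumPy, assuming the common layout of (num_samples, lookback, feature_dim) for x and (num_samples, horizon, target_dim) for y.

```python
import numpy as np


def roll_windows(series, lookback, horizon):
    """Hypothetical helper: cut a (time, feature) array into (x, y) sample windows."""
    n = len(series) - lookback - horizon + 1
    # Each x window holds `lookback` past steps; each y window the next `horizon` steps
    x = np.stack([series[i:i + lookback] for i in range(n)])
    y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return x, y


series = np.arange(10, dtype=np.float32).reshape(-1, 1)  # 10 time steps, 1 feature
x, y = roll_windows(series, lookback=4, horizon=1)
print(x.shape, y.shape)  # (6, 4, 1) (6, 1, 1)
```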

There are also two parameters you may want to change: batch_size and repetition_times. If you are not familiar with manual hyperparameter tuning, leave batch_size at its default value. repetition_times is the number of times the model uncertainty is computed with MC dropout: a larger value gives a more accurate estimate but makes the calculation slower.
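
To see why a larger repetition_times is more accurate but slower, the toy experiment below replaces real dropout passes with draws from a normal distribution whose true standard deviation is 1.0 (an assumption made purely for illustration; std_estimate_spread is a hypothetical helper). Each standard deviation estimate uses repetition_times draws, and the spread of those estimates across many trials shrinks as repetition_times grows, while the number of passes, and hence the cost, grows linearly.

```python
import numpy as np


def std_estimate_spread(repetition_times, trials=2000, seed=0):
    """Spread of a Monte Carlo std estimate that uses `repetition_times` passes."""
    rng = np.random.default_rng(seed)
    # Each row stands in for `repetition_times` stochastic forward passes whose
    # outputs have a true standard deviation of 1.0
    passes = rng.normal(loc=0.0, scale=1.0, size=(trials, repetition_times))
    estimates = passes.std(axis=1, ddof=1)
    # How much the estimate itself varies from trial to trial
    return estimates.std()


for reps in (5, 20, 100):
    print(reps, round(float(std_estimate_spread(reps)), 3))
```

The printed spread shrinks roughly in proportion to 1/sqrt(repetition_times), so each further gain in accuracy costs proportionally more passes.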

# obtain the prediction and its standard deviation by predict_interval
yhat, std = forecaster.predict_interval(data=test_data,
                                        validation_data=val_data)
# obtain the upper and lower bounds of the interval from yhat and std
z_95 = 1.96  # two-sided 95% confidence; use the matching standard normal quantile for other levels
yhat_upper, yhat_lower = yhat + z_95 * std, yhat - z_95 * std
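
The 1.96 multiplier is the two-sided 95% quantile of a standard normal distribution. The sketch below uses the standard library's statistics.NormalDist and NumPy to compute the multiplier for any confidence level and to check empirical coverage, i.e. the fraction of true values that fall inside the interval. The helpers z_for, interval and coverage are hypothetical, and the synthetic yhat, std and y_true arrays only stand in for the output of predict_interval and the test targets.

```python
from statistics import NormalDist

import numpy as np


def z_for(confidence):
    """Two-sided standard normal quantile, e.g. about 1.96 for confidence=0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def interval(yhat, std, confidence=0.95):
    """Lower and upper bounds of a symmetric normal interval around yhat."""
    z = z_for(confidence)
    return yhat - z * std, yhat + z * std


def coverage(y_true, lower, upper):
    """Fraction of true values that fall inside [lower, upper]."""
    return float(np.mean((y_true >= lower) & (y_true <= upper)))


# Synthetic stand-ins: predictions of 0 with std 1, and standard normal targets
rng = np.random.default_rng(0)
yhat = np.zeros((1000, 1, 1))
std = np.ones((1000, 1, 1))
y_true = rng.normal(size=(1000, 1, 1))

lower, upper = interval(yhat, std, confidence=0.90)
print(round(z_for(0.95), 2))  # 1.96
print(coverage(y_true, lower, upper))  # close to 0.90 for this synthetic data
```

If the coverage measured on your own test set is far from the requested level, the intervals are miscalibrated for that data.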