Generate confidence interval for prediction#
Introduction#
During inference, users sometimes want an interval estimate for a prediction instead of a point estimate, because an interval carries more information to guide subsequent decisions. One way to provide this is a confidence interval.
A confidence interval is the mean of your estimate plus and minus the variation in that estimate. For time series forecasting, Chronos uses Monte Carlo dropout to calculate the confidence interval, following the approach described in this paper.
Generating a confidence interval for a prediction is easy in Chronos: just call predict_interval. This guide demonstrates in detail how to generate a confidence interval for a forecaster's prediction.
We will take TCNForecaster and the nyc_taxi dataset as an example in this guide.
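To make the Monte Carlo dropout idea from the introduction concrete, here is a minimal pure-Python sketch. It is not how Chronos implements predict_interval: the toy `stochastic_predict` function, its weights, and the `mc_dropout_interval` helper are all hypothetical stand-ins for a real network with dropout kept active at inference time. The point is only that repeated stochastic forward passes yield a mean and a spread, from which an interval can be built.

```python
import random
import statistics

def stochastic_predict(x, rng, drop_prob=0.2):
    """Toy stand-in for a model whose dropout stays active at inference."""
    weights = [0.5, 0.3, 0.2]  # hypothetical fixed weights
    keep = 1.0 - drop_prob
    total = 0.0
    for w, xi in zip(weights, x):
        if rng.random() < keep:       # this unit survives dropout
            total += w * xi / keep    # inverted-dropout rescaling
    return total

def mc_dropout_interval(x, repetitions=200, z=1.96, seed=0):
    """Repeat the stochastic pass and summarize the samples."""
    rng = random.Random(seed)
    samples = [stochastic_predict(x, rng) for _ in range(repetitions)]
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    return mean, std, (mean - z * std, mean + z * std)

mean, std, (lower, upper) = mc_dropout_interval([10.0, 12.0, 11.0])
print(f"prediction={mean:.2f}, std={std:.2f}, interval=({lower:.2f}, {upper:.2f})")
```

The spread measured this way reflects model uncertainty only. As described later in this guide, predict_interval also takes validation data to account for data bias.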
Setup#
Before we begin, we need to install Chronos if it isn't already available. We use PyTorch as the deep learning backend.
[ ]:
!pip install --pre --upgrade bigdl-chronos[pytorch]
# uninstall torchtext to avoid version conflict
!pip uninstall -y torchtext
Forecaster preparation#
Before inference, a forecaster must be created and trained. The training process is covered in detail in the previous guide, Train forecaster on single node, so here we directly create and train a TCNForecaster on the nyc_taxi dataset.
[ ]:
# get data for training, validation, and testing
train_data, val_data, test_data = get_data()
# get a trained forecaster
forecaster = get_trained_forecaster(train_data)
Obtain confidence interval#
Once a trained, non-distributed forecaster is ready, use its predict_interval method to obtain the confidence interval. Pass the data you want to predict on (test data in most cases) and the corresponding validation data (which is used to calculate the data bias).
📝Note
validation_data is only required when calling predict_interval for the first time.
The predict_interval method supports data in the following formats:
numpy ndarray (recommended)
pytorch dataloader
bigdl.chronos.data.TSDataset
There are also two parameters you may want to change: batch_size and repetition_times. If you are not familiar with manual hyperparameter tuning, leave batch_size at its default value. repetition_times is the number of repeated forward passes used to estimate model uncertainty with MC Dropout. A larger value gives a more accurate uncertainty estimate but takes longer to compute.
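The accuracy and speed trade-off behind repetition_times can be illustrated with a small standalone simulation. This is only a sketch under simplifying assumptions, not Chronos code: each "forward pass" is modeled as a draw from a Gaussian with a known standard deviation, and the helper names are hypothetical. It measures how much the estimated standard deviation fluctuates from run to run for a small and a large number of repetitions.

```python
import random
import statistics

def estimate_std(repetition_times, rng, true_std=1.0):
    """Estimate the spread from `repetition_times` simulated stochastic passes."""
    draws = [rng.gauss(0.0, true_std) for _ in range(repetition_times)]
    return statistics.stdev(draws)

def spread_of_estimates(repetition_times, trials=300, seed=0):
    """Run-to-run variability of the std estimate for a given repetition count."""
    rng = random.Random(seed)
    estimates = [estimate_std(repetition_times, rng) for _ in range(trials)]
    return statistics.stdev(estimates)

spread_small = spread_of_estimates(5)    # few repetitions: noisy estimate
spread_large = spread_of_estimates(100)  # many repetitions: stable estimate
print(f"variability with 5 repetitions:   {spread_small:.3f}")
print(f"variability with 100 repetitions: {spread_large:.3f}")
```

In this toy setting the estimate stabilizes as the repetition count grows, while the cost grows linearly with the number of passes, which is the same trade-off to weigh when choosing repetition_times.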
[ ]:
# obtain the prediction and its standard deviation with predict_interval
yhat, std = forecaster.predict_interval(data=test_data,
                                        validation_data=val_data)
# compute the upper and lower bounds of the interval from yhat and std
z_95 = 1.96  # z-value for 95% confidence; use the matching standard Normal quantile for other levels
yhat_upper, yhat_lower = yhat + z_95 * std, yhat - z_95 * std
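For confidence levels other than 95%, the z-value does not have to be looked up by hand. The sketch below uses `statistics.NormalDist` from the Python standard library to compute the two-sided quantile. The helper names `z_score` and `interval` are illustrative and not part of Chronos, and the approach assumes the prediction error is approximately Gaussian.

```python
from statistics import NormalDist

def z_score(confidence):
    """Two-sided z-value of a standard Normal for a confidence level in (0, 1)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)

def interval(yhat, std, confidence=0.95):
    """Lower and upper bounds for a single prediction and its std."""
    z = z_score(confidence)
    return yhat - z * std, yhat + z * std

z_90 = z_score(0.90)
z_95 = z_score(0.95)
z_99 = z_score(0.99)
print(f"z_90={z_90:.3f}, z_95={z_95:.3f}, z_99={z_99:.3f}")

# Scalar example; with numpy arrays such as yhat and std above,
# the same arithmetic applies element-wise.
lower, upper = interval(100.0, 5.0, confidence=0.90)
```

A higher confidence level yields a larger z-value and therefore a wider interval.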