Speed up inference of forecaster through OpenVINO¶
Introduction¶
Inference speed often matters in deployment, and one way to speed it up is to use an accelerator such as OpenVINO. In Chronos, accelerating with OpenVINO is easy: directly call predict_with_openvino (and optionally build_openvino). In this guide, we demonstrate in detail how to speed up forecaster inference through OpenVINO.
We will take TCNForecaster and the nyc_taxi dataset as an example in this guide.
Setup¶
Before we begin, we need to install Chronos if it isn’t already available. We choose PyTorch as the deep learning backend.
[ ]:
!pip install --pre --upgrade bigdl-chronos[pytorch]
# install OpenVINO
!pip install openvino-dev
# fix conflict with google colab
!pip uninstall -y torchtext
!pip install numpy==1.21
exit()
📝Note
Although Chronos supports inferencing on a cluster, this speed-up method can only be used when the forecaster is non-distributed.
Only forecasters with the PyTorch deep learning backend support OpenVINO acceleration.
Forecaster preparation¶
Before inference, a forecaster should be created and trained. The training process is introduced in detail in the previous guide Train forecaster on single node, so here we directly create and train a TCNForecaster on the nyc_taxi dataset.
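The helpers get_data and get_trained_forecaster used later come from the training guide. To illustrate the data layout they need to produce, here is a minimal numpy-only sketch (make_windows is a hypothetical helper, not a Chronos API) of slicing a univariate series into the (samples, lookback, features) rolling windows that a TCN-style forecaster consumes:

```python
import numpy as np

def make_windows(series, lookback=48, horizon=1):
    """Slice a 1-D series into (x, y) rolling windows.

    x has shape (num_samples, lookback, 1) and y has shape
    (num_samples, horizon, 1) — the (batch, time, feature) layout
    typically expected by sequence forecasters.
    """
    xs, ys = [], []
    for i in range(len(series) - lookback - horizon + 1):
        xs.append(series[i:i + lookback])
        ys.append(series[i + lookback:i + lookback + horizon])
    x = np.asarray(xs, dtype=np.float32)[..., None]  # add feature axis
    y = np.asarray(ys, dtype=np.float32)[..., None]
    return x, y

# a stand-in for the nyc_taxi value column
series = np.sin(np.linspace(0, 20, 500))
x, y = make_windows(series)
print(x.shape, y.shape)  # (452, 48, 1) (452, 1, 1)
```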
Speeding up inference¶
Once a trained, non-distributed forecaster is ready, we provide the predict_with_openvino method to speed up inference. The method can be called directly without calling build_openvino first; in that case the forecaster automatically builds an OpenVINO session with default settings.
📝Note
build_openvino is recommended when you want to alleviate the cold-start problem that occurs when predict_with_openvino is called for the first time. Please refer to the API documentation for more information on build_openvino.
The predict_with_openvino method currently only supports data in numpy ndarray format. The batch_size parameter can be changed; if you are not familiar with manual hyperparameter tuning, just leave batch_size at its default value.
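Conceptually, batch_size controls how many samples are fed through the model per forward pass. As a rough numpy-only illustration (split_into_batches is a hypothetical helper, not part of the Chronos API), an (N, lookback, features) input is chunked along its first axis:

```python
import numpy as np

def split_into_batches(x, batch_size=32):
    """Split an (N, lookback, features) array into chunks along axis 0,
    mirroring what a batch_size argument controls during inference."""
    return [x[i:i + batch_size] for i in range(0, len(x), batch_size)]

x = np.random.rand(100, 48, 1).astype(np.float32)
batches = split_into_batches(x, batch_size=32)
print(len(batches), [len(b) for b in batches])  # 4 [32, 32, 32, 4]
```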
[ ]:
# get data for training and testing
train_data, test_data = get_data()
# get a trained forecaster
forecaster = get_trained_forecaster(train_data)
[ ]:
# speed up inference through OpenVINO
# speed up inference through OpenVINO
for x, y in test_data:
    yhat = forecaster.predict_with_openvino(x.numpy())  # predict
Let’s see the acceleration performance of predict_with_openvino.
The prediction latencies without an accelerator and with OpenVINO are printed below. The “p50” result is the median (50th-percentile) latency over multiple predictions, and the acceleration is significant.
[ ]:
from bigdl.chronos.metric.forecast_metrics import Evaluator

x = next(iter(test_data))[0]

def func_original():
    forecaster.predict(x.numpy())  # without accelerator

def func_openvino():
    forecaster.predict_with_openvino(x.numpy())  # with OpenVINO

print("original predict runtime (ms):", Evaluator.get_latency(func_original))
print("predict runtime with OpenVINO (ms):", Evaluator.get_latency(func_openvino))
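To make the “p50” metric concrete, here is a minimal sketch of how a median latency could be measured by hand (p50_latency_ms is a hypothetical helper, and the workload is a stand-in matrix multiply rather than the real forecaster call):

```python
import time
import numpy as np

def p50_latency_ms(func, repeat=100):
    """Run func repeatedly and return the median (p50) latency in ms."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        times.append((time.perf_counter() - start) * 1000.0)
    return float(np.percentile(times, 50))

# stand-in workload instead of forecaster.predict
workload = lambda: np.dot(np.ones((64, 64)), np.ones((64, 64)))
print("p50 latency (ms):", p50_latency_ms(workload))
```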