View the runnable example on GitHub

OpenVINO Asynchronous Inference using Nano API#

You can use the async_predict method of the OpenVINOModel class in Nano to run asynchronous inference on an OpenVINO model. It takes only a few lines.

To run asynchronous inference on an OpenVINO model with Nano, the following dependencies need to be installed first:

[ ]:
# for BigDL-Nano
!pip install --pre --upgrade bigdl-nano # install the nightly-built version
!source bigdl-nano-init

# for OpenVINO
!pip install openvino-dev

📝 Note

We recommend running the commands above, especially source bigdl-nano-init, before the Jupyter kernel is started, or some of the optimizations may not take effect.

Let’s take a resnet18-xnor-binary-onnx-0001 model pretrained on the ImageNet dataset from the Open Model Zoo as an example. First, we download the model using omz_downloader:

[ ]:
!omz_downloader --name resnet18-xnor-binary-onnx-0001 -o ./model

Then, load the model using the OpenVINOModel class.

[ ]:
from bigdl.nano.openvino import OpenVINOModel

ov_model = OpenVINOModel("model/intel/resnet18-xnor-binary-onnx-0001/FP16-INT1/resnet18-xnor-binary-onnx-0001.xml")

To run asynchronous inference on an OpenVINO model, the only change you need to make is to prepare a list of input data and call ov_model.async_predict(input_data, num_requests):

[ ]:
import numpy as np

input_data = [np.random.randn(1, 3, 224, 224) for i in range(5)]
async_results = ov_model.async_predict(input_data=input_data, num_requests=5)
for res in async_results:
    predictions = res.argmax(axis=1)
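
Each element of async_results is the raw output of one infer request, an array of shape (batch_size, num_classes), and argmax over axis 1 picks the most likely class for each sample. Below is a minimal numpy-only sketch of this postprocessing step; the logits array is synthetic, not real model output:

```python
import numpy as np

# synthetic "logits" mimicking one infer-request result: 1 sample, 1000 classes
logits = np.zeros((1, 1000))
logits[0, 42] = 10.0  # make class 42 the highest score

predictions = logits.argmax(axis=1)
print(predictions)  # [42]
```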

📝 Note

async_predict accepts multiple groups of input data in a list; each group is processed by a separate asynchronous infer request, and a list containing the results of all requests is returned. If you have multiple groups of input data to infer, async_predict will achieve better performance than synchronous inference using ov_model(x).

You can specify the number of asynchronous infer requests through num_requests. If num_requests is set to 0, the value will be chosen automatically as the optimal number.

In the code above, we have 5 groups of input data and create 5 asynchronous infer requests. When async_predict is called, the asynchronous infer requests run inference in a parallel pipeline.
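
Conceptually, this is similar to submitting each group of input data to a pool of concurrent workers and collecting the results in order. The stdlib-only sketch below illustrates the pattern with a hypothetical fake_infer function standing in for a real OpenVINO infer request; it is an analogy, not how async_predict is implemented internally:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def fake_infer(x):
    # stand-in for one infer request; a real request would run the model
    return x.mean(axis=(1, 2, 3))

inputs = [np.random.randn(1, 3, 224, 224) for _ in range(5)]

# 5 workers ~ num_requests=5: each input group is processed concurrently,
# and results come back in the same order as the inputs
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_infer, inputs))

print(len(results))  # 5
```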