View the runnable example on GitHub
Accelerate Inference on Intel GPUs Using OpenVINO
You can use InferenceOptimizer.trace(..., accelerator='openvino', device='GPU') to enable OpenVINO acceleration for inference on Intel GPUs, both integrated and discrete. BigDL-Nano also supports quantization with the OpenVINO accelerator on Intel GPUs through InferenceOptimizer.quantize(..., accelerator='openvino', device='GPU', precision='fp16'/'int8'). It only takes a few lines.
Before starting this guide, you can use the code below to list the Intel GPU devices available on your machine. Inference can run on any one of them.
[ ]:
from openvino.runtime import Core
core = Core()
print(core.available_devices)
The function returns a list of available devices:
output | corresponding GPU device(s) |
---|---|
'GPU' | alias for 'GPU.0' |
'GPU.X' | enumeration of GPUs, X = {0, 1, 2, ...} |
'GPU.X.Y' | specific tile in a multi-tile architecture, X = {0, 1, 2, ...}, Y = {0, 1, 2, ...} |
For more information about the OpenVINO device naming convention, you can refer to this page.
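The naming convention can be made concrete with a small parser. The following is a minimal sketch in plain Python; `GpuDevice` and `parse_gpu_device` are hypothetical helper names, not part of OpenVINO or BigDL-Nano, and the device list is made up to resemble the output of `core.available_devices`. It assumes that a plain 'GPU' is an alias for 'GPU.0'.

```python
from typing import NamedTuple, Optional


class GpuDevice(NamedTuple):
    """Parsed form of a GPU device name (hypothetical helper type)."""
    index: int            # X in 'GPU.X'
    tile: Optional[int]   # Y in 'GPU.X.Y', or None when no tile is given


def parse_gpu_device(name: str) -> Optional[GpuDevice]:
    """Return the parsed GPU device, or None if `name` is not a GPU device name."""
    parts = name.split(".")
    if parts[0] != "GPU" or len(parts) > 3:
        return None
    if not all(p.isdecimal() for p in parts[1:]):
        return None
    numbers = [int(p) for p in parts[1:]]
    index = numbers[0] if numbers else 0   # assumption: plain 'GPU' means 'GPU.0'
    tile = numbers[1] if len(numbers) > 1 else None
    return GpuDevice(index, tile)


# Made-up list shaped like the output of `core.available_devices`.
devices = ["CPU", "GPU.0", "GPU.1"]
gpu_devices = [d for d in devices if parse_gpu_device(d) is not None]
print(gpu_devices)
```

Any name kept in `gpu_devices` could then be passed as the `device` argument in the examples that follow.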
PyTorch example
Let’s take a ResNet-18 model pretrained on the ImageNet dataset as an example. Note that you don’t have to transfer the model to the GPU or set it to evaluation mode, since InferenceOptimizer will handle both automatically.
[ ]:
from torchvision.models import resnet18
pt_model = resnet18(pretrained=True)
_, train_dataset, val_dataset = finetune_pet_dataset(pt_model)
The full definition of the function finetune_pet_dataset can be found in the runnable example.
To enable OpenVINO acceleration for your PyTorch inference pipeline on Intel GPUs, the only change you need to make is to import BigDL-Nano InferenceOptimizer and trace your PyTorch model, specifying device='GPU', to convert it into an OpenVINO-accelerated module for inference.
📝 Note
By setting device to 'GPU', inference will be conducted on the default Intel GPU device. You can change to other devices ('GPU.X'/'GPU.X.Y') instead.
[ ]:
import torch
from bigdl.nano.pytorch import InferenceOptimizer
ov_model = InferenceOptimizer.trace(pt_model,
                                    accelerator="openvino",
                                    input_sample=torch.rand(1, 3, 224, 224),
                                    device='GPU')
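The device string passed above does not have to be hard-coded. The following is a minimal sketch of choosing it at runtime from a device list such as the one printed by `core.available_devices`. The helper `select_device` is hypothetical, and the sketch assumes that 'GPU' is accepted as an alias for the first GPU and that 'CPU' is a valid fallback value for `device`.

```python
from typing import Sequence


def select_device(available: Sequence[str],
                  preferred: str = "GPU",
                  fallback: str = "CPU") -> str:
    """Pick `preferred` if the machine exposes it, otherwise `fallback`."""
    if preferred in available:
        return preferred
    # With several GPUs the list contains 'GPU.0', 'GPU.1', ... instead of 'GPU';
    # assumption: the plain alias 'GPU' still resolves to the first of them.
    if preferred == "GPU" and any(d.startswith("GPU.") for d in available):
        return "GPU"
    return fallback


# Made-up device lists for illustration.
print(select_device(["CPU", "GPU"]))
print(select_device(["CPU"]))
```

The returned string would then be used in place of the literal 'GPU' in the call to `InferenceOptimizer.trace`.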
📝 Note
input_sample tells the OpenVINO accelerator the shape of the model input, so neither its batch size nor its specific values matter. If our test dataset consists of images with \(224 \times 224\) pixels, we could use torch.rand(1, 3, 224, 224) for input_sample here.
Please refer to API documentation for more information on InferenceOptimizer.trace.
If you want to quantize your model using the OpenVINO Post-training Optimization Tools, you can call InferenceOptimizer.quantize.
For FP16 quantization:
[ ]:
import torch
from bigdl.nano.pytorch import InferenceOptimizer
ov_model = InferenceOptimizer.quantize(pt_model,
                                       accelerator='openvino',
                                       input_sample=torch.rand(1, 3, 224, 224),
                                       device='GPU',
                                       precision='fp16')
For INT8 quantization:
[ ]:
import torch
from torch.utils.data import DataLoader
from bigdl.nano.pytorch import InferenceOptimizer
# train_dataset comes from finetune_pet_dataset above and serves as calibration data
ov_model = InferenceOptimizer.quantize(pt_model,
                                       accelerator='openvino',
                                       input_sample=torch.rand(1, 3, 224, 224),
                                       device='GPU',
                                       precision='int8',
                                       calib_data=DataLoader(train_dataset, batch_size=32))
📝 Note
For INT8 quantization, we adopt the Post-training Optimization Tools provided by the OpenVINO toolkit, which only support static post-training quantization. So calib_data (calibration data) is always required when accelerator='openvino'. The batch size is not important here, because calibration is intended to read 100 samples. The calibration data does not need to contain labels.
Please refer to API documentation for more information on InferenceOptimizer.quantize.
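To illustrate what the note says about calibration data, here is a minimal sketch in plain Python of a label-free stream capped at 100 samples. It is only a stand-in for the idea, not how BigDL-Nano or OpenVINO reads `calib_data` internally; `calibration_inputs` is a hypothetical helper and the samples are made up.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def calibration_inputs(samples: Iterable[Any], limit: int = 100) -> Iterator[Any]:
    """Yield at most `limit` inputs, dropping the label from (input, label) pairs."""
    for sample in islice(samples, limit):
        if isinstance(sample, tuple) and len(sample) == 2:
            yield sample[0]   # keep the input, discard the label
        else:
            yield sample      # already label-free


# 250 made-up (input, label) pairs; only the first 100 inputs are kept.
labelled = [([float(i)] * 4, i % 10) for i in range(250)]
inputs = list(calibration_inputs(labelled))
print(len(inputs), inputs[0])
```

Because only a fixed number of samples is consumed, the batch size used to deliver them does not change which data the calibration sees.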
You can then run the usual inference steps with the OpenVINO-optimized model:
[ ]:
with InferenceOptimizer.get_context(ov_model):
    x = torch.rand(2, 3, 224, 224)
    # use the optimized model here
    y_hat = ov_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)
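The `argmax(dim=1)` step picks, for each sample in the batch, the index of the highest score. As a plain-Python illustration with made-up scores (the helper `top1_indices` is hypothetical and not part of PyTorch or BigDL-Nano):

```python
from typing import List, Sequence


def top1_indices(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the largest score in each row (one row per sample)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Made-up scores for a batch of 2 samples and 3 classes.
batch_scores = [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]]
top1 = top1_indices(batch_scores)
print(top1)  # [1, 0]
```

Each returned index is the predicted class for the corresponding sample.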
TensorFlow example
Let’s take MobileNetV2 as an example. Note that you don’t have to transfer the model to the GPU at this step, since InferenceOptimizer will handle this automatically.
[ ]:
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
import numpy as np
tf_model = MobileNetV2(weights=None, input_shape=[40, 40, 3], classes=10)
train_examples = np.random.random((100, 40, 40, 3))
train_labels = np.random.randint(0, 10, size=(100,))
train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
To enable OpenVINO acceleration for your TensorFlow inference pipeline on Intel GPUs, the only change you need to make is to import BigDL-Nano InferenceOptimizer and trace your TensorFlow model, specifying device='GPU', to convert it into an OpenVINO-accelerated module for inference.
📝 Note
By setting device to 'GPU', inference will be conducted on the default Intel GPU device. You can change to other devices ('GPU.X'/'GPU.X.Y') instead.
[ ]:
from bigdl.nano.tf.keras import InferenceOptimizer
ov_model = InferenceOptimizer.trace(tf_model,
                                    accelerator="openvino",
                                    device='GPU')
If you want to quantize your model using the OpenVINO Post-training Optimization Tools, you can call InferenceOptimizer.quantize.
For FP16 quantization:
[ ]:
from bigdl.nano.tf.keras import InferenceOptimizer
ov_model = InferenceOptimizer.quantize(tf_model,
                                       accelerator='openvino',
                                       device='GPU',
                                       precision='fp16')
For INT8 quantization:
[ ]:
from bigdl.nano.tf.keras import InferenceOptimizer
ov_model = InferenceOptimizer.quantize(tf_model,
                                       accelerator='openvino',
                                       device='GPU',
                                       precision='int8',
                                       x=train_dataset)
📝 Note
For INT8 quantization, we adopt the Post-training Optimization Tools provided by the OpenVINO toolkit, which only support static post-training quantization. So x (which serves as calibration data) is always required when accelerator='openvino'. The calibration data does not need to contain labels.
Please refer to API documentation for more information on InferenceOptimizer.quantize.
You can then run the usual inference steps with the OpenVINO-optimized model:
[ ]:
x = tf.random.normal(shape=(100, 40, 40, 3))
# use the optimized model here
y_hat = ov_model(x)
predictions = tf.argmax(y_hat, axis=1)
print(predictions)
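To check whether the optimized model is actually faster on your hardware, you can time it against the original model. The following is a minimal sketch using only the Python standard library; `average_latency_ms` is a hypothetical helper, and `dummy_model` is a stand-in so that the snippet runs without a GPU. A few warm-up calls are excluded from the measurement, on the assumption that the first calls on a device may include one-time setup cost.

```python
import time
from statistics import mean
from typing import Any, Callable


def average_latency_ms(predict: Callable[[Any], Any], x: Any,
                       warmup: int = 5, runs: int = 20) -> float:
    """Average wall-clock time of `predict(x)` in milliseconds, after warm-up."""
    for _ in range(warmup):
        predict(x)  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


def dummy_model(batch):
    """Stand-in for a model call; replace with the original or optimized model."""
    return [sum(row) for row in batch]


latency = average_latency_ms(dummy_model, [[1.0, 2.0], [3.0, 4.0]])
print(f"{latency:.4f} ms per call")
```

In practice you would pass `tf_model` and `ov_model` (or their PyTorch counterparts) in place of `dummy_model`, with the same input, and compare the two numbers.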