View the runnable example on GitHub
Quantize PyTorch Model for Inference using OpenVINO Post-training Optimization Tools
As Post-training Optimization Tools (POT) is part of the OpenVINO toolkit, OpenVINO acceleration is enabled automatically when you use POT for quantization. You can call the InferenceOptimizer.quantize API with accelerator='openvino' to apply POT to your PyTorch nn.Module. It only takes a few lines.
Let’s take a ResNet-18 model pretrained on the ImageNet dataset and finetuned on the OxfordIIITPet dataset as an example:
[ ]:
from torchvision.models import resnet18

model = resnet18(pretrained=True)
# finetune_pet_dataset is defined in the runnable example
_, train_dataset, val_dataset = finetune_pet_dataset(model)
The full definition of the function finetune_pet_dataset can be found in the runnable example.
Then we set it in evaluation mode:
[ ]:
model.eval()
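Evaluation mode matters here because layers such as Dropout and BatchNorm behave differently during training and inference, and calibration should observe inference-time activations. As a quick illustration with a standalone Dropout layer:

```python
import torch
from torch import nn

# In train mode, Dropout randomly zeroes elements; in eval mode it is
# an identity op, so inference (and calibration) sees stable activations.
layer = nn.Dropout(p=0.5)
layer.eval()

x = torch.ones(4)
assert torch.equal(layer(x), x)  # no elements dropped in eval mode
```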
To enable quantization using POT for inference, simply import BigDL-Nano's InferenceOptimizer and use it to quantize your PyTorch model:
[ ]:
from torch.utils.data import DataLoader
from bigdl.nano.pytorch import InferenceOptimizer

q_model = InferenceOptimizer.quantize(model,
                                      accelerator='openvino',
                                      calib_data=DataLoader(train_dataset, batch_size=32))
📝 Note

POT supports only static post-training quantization, so calib_data (for calibration data) is always required when accelerator='openvino'.

For calib_data, the batch size does not matter, since POT only reads about 100 samples for calibration. The calibration data also does not need labels.

Please refer to the API documentation for more information on InferenceOptimizer.quantize.
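Because POT only reads around 100 samples, you can pass a small calibration DataLoader instead of the full training set. A sketch of this, using a dummy TensorDataset as a stand-in for the Pet dataset (the dataset and sizes here are illustrative):

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Dummy stand-in for the finetuning dataset: 500 fake 3x224x224 images,
# with no labels (calibration data does not need them).
train_dataset = TensorDataset(torch.randn(500, 3, 224, 224))

# POT reads ~100 samples regardless of batch size, so a 100-sample
# subset is enough for calibration.
calib_data = DataLoader(Subset(train_dataset, range(100)), batch_size=32)

total = sum(batch[0].shape[0] for batch in calib_data)
print(total)  # 100 samples across 4 batches
```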
You could then do the normal inference steps with the quantized model:
[ ]:
import torch

with InferenceOptimizer.get_context(q_model):
    x = torch.stack([val_dataset[0][0], val_dataset[1][0]])
    # use the quantized model here
    y_hat = q_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)
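To see the benefit of quantization, you could time the original and quantized models with the same inference pattern. A minimal latency-measurement sketch (using a small stand-in network here, since the real quantized model needs OpenVINO at runtime; in practice you would pass model and q_model):

```python
import time
import torch
from torch import nn

# Tiny stand-in network for illustration only.
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 37))
net.eval()
x = torch.randn(2, 3, 224, 224)

def avg_latency_ms(m, x, runs=10):
    with torch.no_grad():
        m(x)  # warm-up run, excluded from timing
        start = time.perf_counter()
        for _ in range(runs):
            m(x)
    return (time.perf_counter() - start) / runs * 1000

print(f"average latency: {avg_latency_ms(net, x):.2f} ms")
```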
📚 Related Readings