View the runnable example on GitHub
Quantize PyTorch Model for Inference using OpenVINO Post-training Optimization Tools
As Post-training Optimization Tools (POT) is part of the OpenVINO toolkit, OpenVINO acceleration is enabled automatically when you use POT for quantization. You can call the InferenceOptimizer.quantize API with accelerator='openvino' to apply POT to your PyTorch nn.Module. It only takes a few lines.
Let’s take a ResNet-18 model pretrained on the ImageNet dataset and finetuned on the OxfordIIITPet dataset as an example:
[ ]:
from torchvision.models import resnet18

model = resnet18(pretrained=True)
# finetune_pet_dataset is defined in the runnable example
_, train_dataset, val_dataset = finetune_pet_dataset(model)
The full definition of the function finetune_pet_dataset can be found in the runnable example.
Then we set it in evaluation mode:
[ ]:
model.eval()
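Evaluation mode matters here because layers such as Dropout and BatchNorm behave differently during training and inference, and calibration should observe inference-time activations. As a quick illustration with a standalone Dropout layer:

```python
import torch
from torch import nn

# In train mode, Dropout randomly zeroes elements; in eval mode it is
# an identity op, so inference (and calibration) sees stable activations.
layer = nn.Dropout(p=0.5)
layer.eval()

x = torch.ones(4)
assert torch.equal(layer(x), x)  # no elements dropped in eval mode
```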
To enable quantization using POT for inference, simply import BigDL-Nano's InferenceOptimizer and use it to quantize your PyTorch model:
[ ]:
from torch.utils.data import DataLoader
from bigdl.nano.pytorch import InferenceOptimizer

q_model = InferenceOptimizer.quantize(model,
                                      accelerator='openvino',
                                      calib_data=DataLoader(train_dataset, batch_size=32))
📝 Note

POT supports only static post-training quantization, so calib_data (for calibration data) is always required when accelerator='openvino'.

For calib_data, the batch size does not matter, since POT only reads about 100 samples for calibration. The calibration data also does not need labels.

Please refer to the API documentation for more information on InferenceOptimizer.quantize.
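Because POT only reads around 100 samples, you can pass a small calibration DataLoader instead of the full training set. A sketch of this, using a dummy TensorDataset as a stand-in for the Pet dataset (the dataset and sizes here are illustrative):

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Dummy stand-in for the finetuning dataset: 500 fake 3x224x224 images,
# with no labels (calibration data does not need them).
train_dataset = TensorDataset(torch.randn(500, 3, 224, 224))

# POT reads ~100 samples regardless of batch size, so a 100-sample
# subset is enough for calibration.
calib_data = DataLoader(Subset(train_dataset, range(100)), batch_size=32)

total = sum(batch[0].shape[0] for batch in calib_data)
print(total)  # 100 samples across 4 batches
```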
You could then do the normal inference steps with the quantized model:
[ ]:
import torch

with InferenceOptimizer.get_context(q_model):
    x = torch.stack([val_dataset[0][0], val_dataset[1][0]])
    # use the quantized model here
    y_hat = q_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)
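To see the benefit of quantization, you could time the original and quantized models with the same inference pattern. A minimal latency-measurement sketch (using a small stand-in network here, since the real quantized model needs OpenVINO at runtime; in practice you would pass model and q_model):

```python
import time
import torch
from torch import nn

# Tiny stand-in network for illustration only.
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 37))
net.eval()
x = torch.randn(2, 3, 224, 224)

def avg_latency_ms(m, x, runs=10):
    with torch.no_grad():
        m(x)  # warm-up run, excluded from timing
        start = time.perf_counter()
        for _ in range(runs):
            m(x)
    return (time.perf_counter() - start) / runs * 1000

print(f"average latency: {avg_latency_ms(net, x):.2f} ms")
```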
📚 Related Readings