View the runnable example on GitHub

Accelerate PyTorch Inference using ONNXRuntime

You can use the InferenceOptimizer.trace(..., accelerator='onnxruntime') API to enable ONNXRuntime acceleration for PyTorch inference. It only takes a few lines.

Let’s take a ResNet-18 model pretrained on the ImageNet dataset as an example. First, we load the model:

[ ]:
from torchvision.models import resnet18

model_ft = resnet18(pretrained=True)
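If you are on a newer torchvision (0.13 or later) where the pretrained flag is deprecated, the same pretrained weights can be loaded through the weights enum instead. A minimal sketch, assuming such a torchvision version:

from torchvision.models import resnet18, ResNet18_Weights

# Equivalent to pretrained=True on torchvision >= 0.13
model_ft = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)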

Then we set it to evaluation mode:

[ ]:
model_ft.eval()

To enable ONNXRuntime acceleration for your PyTorch inference pipeline, the only change you need to make is to import BigDL-Nano's InferenceOptimizer and trace your PyTorch model, converting it into an ONNXRuntime-accelerated module for inference:

[ ]:
import torch
from bigdl.nano.pytorch import InferenceOptimizer

ort_model = InferenceOptimizer.trace(model_ft,
                                     accelerator="onnxruntime",
                                     input_sample=torch.rand(1, 3, 224, 224))

📝 Note

input_sample is the parameter that tells the ONNXRuntime accelerator the shape of the model input, so neither the batch size nor the specific values of input_sample matter. Since our test dataset consists of images with \(224 \times 224\) pixels, we could use torch.rand(1, 3, 224, 224) as the input_sample here.
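As an illustration, input_sample does not have to be a random tensor: a real batch drawn from your own DataLoader works just as well, since only its shape is used. The sketch below uses a toy TensorDataset purely as a stand-in for your real data:

import torch
from torch.utils.data import DataLoader, TensorDataset
from bigdl.nano.pytorch import InferenceOptimizer

# A toy dataset standing in for your real one; only the tensor shape matters here
images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, 1000, (8,))
loader = DataLoader(TensorDataset(images, labels), batch_size=2)

# Use one real batch as the input_sample instead of torch.rand
sample_batch, _ = next(iter(loader))
ort_model = InferenceOptimizer.trace(model_ft,
                                     accelerator="onnxruntime",
                                     input_sample=sample_batch)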

Please refer to the API documentation for more information on InferenceOptimizer.trace.

You could then do the normal inference steps with the model optimized by ONNXRuntime:

[ ]:
x = torch.rand(2, 3, 224, 224)
# use the optimized model here
y_hat = ort_model(x)
predictions = y_hat.argmax(dim=1)
print(predictions)
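To see how much the ONNXRuntime-accelerated module helps in your environment, you could time the original and optimized models on the same input. A rough sketch (the benchmark helper below is our own, and actual numbers depend on your hardware and ONNXRuntime version):

import time
import torch

x = torch.rand(2, 3, 224, 224)

def benchmark(model, x, n_iter=100):
    # Warm up first, then average the per-forward latency in seconds
    with torch.no_grad():
        for _ in range(10):
            model(x)
        start = time.perf_counter()
        for _ in range(n_iter):
            model(x)
    return (time.perf_counter() - start) / n_iter

print(f"original model : {benchmark(model_ft, x) * 1000:.2f} ms/iter")
print(f"onnxruntime    : {benchmark(ort_model, x) * 1000:.2f} ms/iter")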