View the runnable example on GitHub

Accelerate PyTorch Inference using ONNXRuntime

You can use the InferenceOptimizer.trace(..., accelerator='onnxruntime') API to enable ONNXRuntime acceleration for PyTorch inference. It only takes a few lines of code.

Let’s take a ResNet-18 model pretrained on the ImageNet dataset as an example. First, we load the model:

[ ]:
from torchvision.models import resnet18

# load a ResNet-18 model pretrained on ImageNet
model_ft = resnet18(pretrained=True)

To enable ONNXRuntime acceleration for your PyTorch inference pipeline, the major change you need to make is to import BigDL-Nano’s InferenceOptimizer and trace your PyTorch model, converting it into an ONNXRuntime-accelerated model for inference:

[ ]:
import torch
from bigdl.nano.pytorch import InferenceOptimizer

# convert the PyTorch model into an ONNXRuntime-accelerated model for inference
ort_model = InferenceOptimizer.trace(model_ft,
                                     accelerator="onnxruntime",
                                     input_sample=torch.rand(1, 3, 224, 224))

📝 Note

input_sample is how the ONNXRuntime accelerator learns the shape of the model input, so neither the batch size nor the specific values of input_sample matter, only its shape. Since our test dataset consists of images with \(224 \times 224\) pixels, we use torch.rand(1, 3, 224, 224) as the input_sample here.

Please refer to the API documentation for more information on InferenceOptimizer.trace.
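
Since only the shape of input_sample is used, you could equally pass a real batch taken from your own data pipeline instead of random data. Below is a minimal sketch, assuming a hypothetical DataLoader named val_loader that yields batches of (images, labels) with \(224 \times 224\) images:

[ ]:
# val_loader is a hypothetical DataLoader yielding (images, labels) batches;
# only the shape of the batch matters to the accelerator, not its values
images, _ = next(iter(val_loader))

ort_model = InferenceOptimizer.trace(model_ft,
                                     accelerator="onnxruntime",
                                     input_sample=images)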

You could then do the normal inference steps under the context manager provided by Nano, using the model optimized by ONNXRuntime:

[ ]:
with InferenceOptimizer.get_context(ort_model):
    x = torch.rand(2, 3, 224, 224)
    # use the optimized model here
    y_hat = ort_model(x)
    # take the class with the highest score as the prediction for each image
    predictions = y_hat.argmax(dim=1)
    print(predictions)

📝 Note

For all models optimized by Nano’s InferenceOptimizer.trace, you need to wrap the inference steps with the automatic context manager InferenceOptimizer.get_context(model=...) provided by Nano. You could refer to here for more detailed usage of the context manager.
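
To get a rough idea of the speedup on your machine, you could time the original FP32 model against the ONNXRuntime-accelerated one. The following is a minimal sketch using Python’s time module; the actual numbers depend on your hardware and environment:

[ ]:
import time

x = torch.rand(2, 3, 224, 224)
model_ft.eval()

# baseline: the original FP32 PyTorch model
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(100):
        model_ft(x)
    print(f"PyTorch FP32 time: {time.perf_counter() - start:.3f}s")

# the ONNXRuntime-accelerated model, run under the Nano context manager
with InferenceOptimizer.get_context(ort_model):
    start = time.perf_counter()
    for _ in range(100):
        ort_model(x)
    print(f"ONNXRuntime time:  {time.perf_counter() - start:.3f}s")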