View the runnable example on GitHub
Accelerate PyTorch Inference using ONNXRuntime
You can use the InferenceOptimizer.trace(..., accelerator='onnxruntime')
API to enable ONNXRuntime acceleration for PyTorch inference. It only takes a few lines.
Let’s take a ResNet-18 model pretrained on the ImageNet dataset as an example. First, we load the model:
[ ]:
from torchvision.models import resnet18

model_ft = resnet18(pretrained=True)  # load ImageNet-pretrained weights
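If you are on a newer torchvision release (0.13 or later), note that the pretrained argument is deprecated in favor of weights; the following cell is an equivalent alternative, not part of the original example:
[ ]:
# Equivalent on torchvision >= 0.13, where `pretrained=True` is deprecated
from torchvision.models import resnet18, ResNet18_Weights
model_ft = resnet18(weights=ResNet18_Weights.DEFAULT)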
To enable ONNXRuntime acceleration for your PyTorch inference pipeline, the major change you need to make is to import BigDL-Nano’s InferenceOptimizer, and trace your PyTorch model to convert it into an ONNXRuntime-accelerated model for inference:
[ ]:
import torch
from bigdl.nano.pytorch import InferenceOptimizer

ort_model = InferenceOptimizer.trace(model_ft,
                                     accelerator="onnxruntime",
                                     input_sample=torch.rand(1, 3, 224, 224))
📝 Note
input_sample is the parameter that lets the ONNXRuntime accelerator know the shape of the model input. Only the shape matters: neither the batch size nor the specific values of input_sample affect the traced model. If we want our test dataset to consist of images with \(224 \times 224\) pixels, we could use torch.rand(1, 3, 224, 224) for input_sample here.

Please refer to API documentation for more information on InferenceOptimizer.trace.
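As an illustration of this point, a real batch from your own dataloader works just as well as random data for input_sample, since only its shape is inspected during tracing. The sketch below assumes a hypothetical DataLoader named val_loader that yields (image, label) batches; it is not defined in the original example:
[ ]:
# A minimal sketch: any tensor with the correct shape works as input_sample.
# `val_loader` is a hypothetical DataLoader, not defined above.
sample_images, _ = next(iter(val_loader))
ort_model = InferenceOptimizer.trace(model_ft,
                                     accelerator="onnxruntime",
                                     input_sample=sample_images)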
You could then do the normal inference steps under the context manager provided by Nano, with the model optimized by ONNXRuntime:
[ ]:
with InferenceOptimizer.get_context(ort_model):
    x = torch.rand(2, 3, 224, 224)
    # use the optimized model here
    y_hat = ort_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)
📝 Note
For all models optimized by InferenceOptimizer.trace, you need to wrap the inference steps with the automatic context manager InferenceOptimizer.get_context(model=...) provided by Nano. You could refer to here for more detailed usage of the context manager.
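For example, the same pattern extends to inference over a whole dataset. The sketch below again assumes a hypothetical DataLoader named val_loader yielding batches of \(224 \times 224\) images (not part of the original example); every batch is processed inside the wrapped context:
[ ]:
# A minimal sketch of batched inference under the Nano context manager.
# `val_loader` is a hypothetical DataLoader and is not defined in this example.
all_predictions = []
with InferenceOptimizer.get_context(ort_model):
    for images, _ in val_loader:
        y_hat = ort_model(images)                      # forward pass on the ONNXRuntime model
        all_predictions.append(y_hat.argmax(dim=1))    # collect per-batch predictions
predictions = torch.cat(all_predictions)
print(predictions)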