View the runnable example on GitHub

Accelerate PyTorch Inference using Intel ARC series dGPU

You can use the Nano API to enable Intel ARC series dGPU acceleration for PyTorch inference. It only takes a few lines of code.

To apply Intel ARC series dGPU acceleration, there are several steps of tool installation and environment preparation.

Step 1, please refer to our driver installation guide for general purpose GPU capabilities.

Step 2, you also need to download and install the Intel® oneAPI Base Toolkit. oneMKL and the DPC++ compiler are required; the other components are optional.

Step 3, install the proper BigDL-Nano build for PyTorch inference using the following commands.

[ ]:
pip install --pre bigdl-nano[pytorch_20_xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # install BigDL-Nano for PyTorch with Intel dGPU (XPU) support and its dependencies

source bigdl-nano-init --gpu # initialize the nano environment with GPU support

📝 Note

Currently, Intel ARC series dGPU acceleration for BigDL-Nano is only supported on Linux.
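Before moving on, you may want to verify that the dGPU is actually visible to PyTorch. Below is a minimal sanity check, assuming the XPU build of intel-extension-for-pytorch installed by the commands above:

[ ]:
import torch
import intel_extension_for_pytorch  # registers the "xpu" device with PyTorch

# both calls come from the XPU build of intel-extension-for-pytorch
print(torch.xpu.is_available())   # True if an Intel dGPU is visible
print(torch.xpu.device_count())   # number of visible Intel dGPUs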

Let’s take a ResNet-50 model pretrained on the ImageNet dataset as an example. First, we load the model:

[ ]:
from torchvision.models import resnet50

# load a ResNet-50 pretrained on ImageNet and switch it to evaluation mode
original_model = resnet50(pretrained=True)
original_model.eval()
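Optionally, you could run a quick forward pass on CPU first as a correctness baseline. Below is a minimal sketch; the random input simply mimics a batch of ImageNet-sized images:

[ ]:
import torch

# a dummy batch of 2 ImageNet-sized images: (batch, channels, height, width)
dummy_input = torch.rand(2, 3, 224, 224)

with torch.no_grad():
    baseline_output = original_model(dummy_input)
print(baseline_output.shape)  # torch.Size([2, 1000]), one logit per ImageNet class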

To enable Intel ARC series dGPU acceleration for your PyTorch inference pipeline, the major change you need to make is to import BigDL-Nano InferenceOptimizer, and trace your PyTorch model to convert it into a PytorchIPEXPUModel for inference by specifying the device as “GPU”:

[ ]:
from bigdl.nano.pytorch import InferenceOptimizer

# by default, ipex optimization is not used
acc_model = InferenceOptimizer.trace(original_model, device="GPU", use_ipex=False)

# you can also choose to apply ipex optimization
acc_model = InferenceOptimizer.trace(original_model, device="GPU", use_ipex=True)

📝 Note

Please refer to the API documentation for more information on InferenceOptimizer.trace.

Currently, Intel ARC series dGPU acceleration also supports fp16 precision for your PyTorch inference pipeline. Only a few changed lines are needed (see below).

[ ]:
from bigdl.nano.pytorch import InferenceOptimizer

# by default, ipex optimization is not used
acc_model = InferenceOptimizer.quantize(original_model, device="GPU", precision="fp16", use_ipex=False)

# you can also choose to apply ipex optimization
acc_model = InferenceOptimizer.quantize(original_model, device="GPU", precision="fp16", use_ipex=True)

📝 Note

Please refer to the API documentation for more information on InferenceOptimizer.quantize.

You could then run the normal inference steps under the context manager provided by Nano, with the model accelerated by the Intel ARC series dGPU:

[ ]:
import torch

with InferenceOptimizer.get_context(acc_model):
    data = torch.rand(2, 3, 224, 224)  # a dummy batch of ImageNet-sized images
    predictions = acc_model(data)
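If you created acc_model with fp16 precision above, you could additionally inspect the dtype of the predictions to confirm that the reduced precision is in effect. This is just an illustrative check; the exact output dtype depends on the implementation:

[ ]:
print(predictions.dtype)  # often torch.float16 when the model was quantized to fp16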

📝 Note

Importantly, ipex (intel-extension-for-pytorch) needs to be installed regardless of whether ipex optimization is used.

For all models optimized by Nano through InferenceOptimizer.trace or InferenceOptimizer.quantize, you need to wrap the inference steps with the automatic context manager InferenceOptimizer.get_context(model=...) provided by Nano. You could refer to here for more detailed usage of the context manager.
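As a rough illustration of how you might measure the speedup, the sketch below times repeated forward passes with a simple wall clock. The benchmark helper and iteration count are assumptions for illustration, not part of the Nano API; since GPU execution may be asynchronous, a device synchronization (e.g. torch.xpu.synchronize()) may be needed for precise numbers:

[ ]:
import time
import torch

def benchmark(model, data, n_iter=20):
    # simple wall-clock timing with one warm-up pass
    with InferenceOptimizer.get_context(model):
        model(data)  # warm-up
        start = time.perf_counter()
        for _ in range(n_iter):
            model(data)
        return (time.perf_counter() - start) / n_iter

data = torch.rand(2, 3, 224, 224)
print(f"accelerated model: {benchmark(acc_model, data) * 1000:.2f} ms/iteration")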