View the runnable example on GitHub

Accelerate PyTorch Inference using Multiple Instances

You can use the InferenceOptimizer.to_multi_instance(model, num_processes=n) API to enable multi-instance acceleration for PyTorch inference. It only takes a few lines.

Let’s take a ResNet-18 model pretrained on the ImageNet dataset as an example. First, we load the model:

[ ]:
from torchvision.models import resnet18

model_ft = resnet18(pretrained=True)

Then we set it to evaluation mode:

[ ]:
model_ft.eval()
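
Optionally, you could run a quick single-instance forward pass as a baseline to compare against later. This step is not part of the BigDL-Nano workflow; the random batch below is just a placeholder for illustration:

[ ]:
import torch

# a dummy batch for illustration only; replace it with your real data
dummy_batch = torch.rand(2, 3, 224, 224)

with torch.no_grad():
    y_hat_single = model_ft(dummy_batch)
print(y_hat_single.shape)  # torch.Size([2, 1000]) for an ImageNet-pretrained ResNet-18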

To enable multi-instance acceleration for your PyTorch inference pipeline, you should import BigDL-Nano’s InferenceOptimizer and convert your model to a multi-instance model:

[ ]:
import torch
from bigdl.nano.pytorch import InferenceOptimizer

multi_model = InferenceOptimizer.to_multi_instance(model_ft, num_processes=2)

📝 Note

num_processes is used to specify the number of processes to use. After calling InferenceOptimizer.to_multi_instance, multi_model will receive a DataLoader or a list of batches instead of a single batch, and produce a list of inference results instead of a single result.

Please refer to the API documentation for more information about InferenceOptimizer.to_multi_instance.

You could use multi_model as follows:

[ ]:
# run inference on a list of batches; each batch has shape (2, 3, 224, 224)
batch_list = [torch.rand(2, 3, 224, 224) for _i in range(16)]
y_hat_list = multi_model(batch_list)

# run inference on a DataLoader
from torch.utils.data import TensorDataset, DataLoader
imgs = torch.rand(32, 3, 224, 224)
dataset = TensorDataset(imgs)
# dataloader is a DataLoader with 16 batches, each of shape (2, 3, 224, 224)
dataloader = DataLoader(dataset=dataset, batch_size=2, collate_fn=lambda img_list: torch.stack([img[0] for img in img_list]))
y_hat_list = multi_model(dataloader)

# y_hat_list is a list of inference results; you can use it like this
for y_hat in y_hat_list:
    predictions = y_hat.argmax(dim=1)
    print(predictions)
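
If you would like to verify the speedup, a simple wall-clock comparison between the original model and the multi-instance model is sketched below. The timing loop is only an illustration and not part of the BigDL-Nano API; the actual speedup depends on your hardware, batch sizes, and number of processes:

[ ]:
import time

# time the original single-instance model on the same batches
start = time.time()
with torch.no_grad():
    for batch in batch_list:
        model_ft(batch)
print(f"single instance: {time.time() - start:.3f}s")

# time the multi-instance model on the same batches
start = time.time()
multi_model(batch_list)
print(f"multi instance:  {time.time() - start:.3f}s")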

You can use the cores_per_process parameter to specify the number of CPU cores used by each process, or the cpu_for_each_process parameter to specify exactly which CPU cores each process uses. Normally you don’t need to set them manually; BigDL-Nano will find the best configuration automatically. But if you want to, you can use them as follows:

[ ]:
# Use 2 processes to run inference,
# each process will use 1 CPU core
multi_model = InferenceOptimizer.to_multi_instance(model_ft, num_processes=2, cores_per_process=1)
y_hat_list = multi_model(batch_list)

# Use 2 processes to run inference,
# the first process will use core 0, the second process will use core 1
multi_model = InferenceOptimizer.to_multi_instance(model_ft, cpu_for_each_process=[[0], [1]])
y_hat_list = multi_model(batch_list)

📝 Note

Setting cpu_for_each_process will override num_processes and cores_per_process.
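
If you do set these parameters manually, a reasonable starting point is to derive them from the number of cores on your machine. The sketch below uses only Python’s standard library; the heuristic of splitting all cores evenly across processes is an assumption for illustration, not a BigDL-Nano recommendation:

[ ]:
import os

total_cores = os.cpu_count()  # logical cores reported by the OS
num_processes = 2

# assumption: split the available cores evenly across processes
cores_per_process = max(1, total_cores // num_processes)

multi_model = InferenceOptimizer.to_multi_instance(model_ft,
                                                   num_processes=num_processes,
                                                   cores_per_process=cores_per_process)
y_hat_list = multi_model(batch_list)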

📚 Related Readings