Accelerate PyTorch Inference using Multiple Instances
You can use the InferenceOptimizer.to_multi_instance(model, num_processes=n) API to enable multi-instance acceleration for PyTorch inference. It takes only a few lines of code.
Let’s take a ResNet-18 model pretrained on the ImageNet dataset as an example. First, we load the model:
[ ]:
from torchvision.models import resnet18
model_ft = resnet18(pretrained=True)
Then we set it in evaluation mode:
[ ]:
model_ft.eval()
To enable multi-instance acceleration for your PyTorch inference pipeline, import InferenceOptimizer from BigDL-Nano and convert your model to a multi-instance model:
[ ]:
import torch
from bigdl.nano.pytorch import InferenceOptimizer
multi_model = InferenceOptimizer.to_multi_instance(model_ft, num_processes=2)
📝 Note
num_processes specifies the number of processes to use. After calling InferenceOptimizer.to_multi_instance, multi_model takes a DataLoader or a list of batches instead of a single batch, and returns a list of inference results instead of a single result.
Please refer to the API documentation for more information about InferenceOptimizer.to_multi_instance.
You can use multi_model as follows:
[ ]:
# run inference on a list of batches; each batch has shape (2, 3, 224, 224)
batch_list = [torch.rand(2, 3, 224, 224) for _i in range(16)]
y_hat_list = multi_model(batch_list)
# run inference on a DataLoader
from torch.utils.data import TensorDataset, DataLoader
imgs = torch.rand(32, 3, 224, 224)
dataset = TensorDataset(imgs)
# TensorDataset yields 1-tuples, so collate_fn unpacks each sample and stacks the images;
# dataloader has length 16 and yields batches of shape (2, 3, 224, 224)
dataloader = DataLoader(dataset=dataset, batch_size=2, collate_fn=lambda img_list: torch.stack([img[0] for img in img_list]))
y_hat_list = multi_model(dataloader)
# y_hat_list is a list of inference results; you can use it like this
for y_hat in y_hat_list:
    predictions = y_hat.argmax(dim=1)
    print(predictions)
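To make the list-in, list-out contract more concrete, here is a minimal standard-library sketch of how a list of batches could be divided among workers and the results put back in input order. It is only an illustration: split_round_robin, merge_in_order and run_sharded are hypothetical helpers, the workers are simulated sequentially, and the round-robin strategy is an assumption rather than a description of how BigDL-Nano schedules batches internally.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_round_robin(batches: Sequence[T], num_workers: int) -> List[List[Tuple[int, T]]]:
    """Assign each batch, tagged with its original index, to a worker in turn."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    shards: List[List[Tuple[int, T]]] = [[] for _ in range(num_workers)]
    for index, batch in enumerate(batches):
        shards[index % num_workers].append((index, batch))
    return shards


def merge_in_order(shard_results: Sequence[Sequence[Tuple[int, R]]]) -> List[R]:
    """Flatten the per-worker results and restore the original batch order."""
    tagged = [item for shard in shard_results for item in shard]
    return [result for _, result in sorted(tagged, key=lambda pair: pair[0])]


def run_sharded(fn: Callable[[T], R], batches: Sequence[T], num_workers: int) -> List[R]:
    """Apply fn to every batch, one shard at a time (standing in for real processes)."""
    shards = split_round_robin(batches, num_workers)
    shard_results = [[(index, fn(batch)) for index, batch in shard] for shard in shards]
    return merge_in_order(shard_results)


# 16 stand-in "batches" (plain integers) handled by 2 simulated workers
results = run_sharded(lambda b: b * 10, list(range(16)), num_workers=2)
print(len(results), results[:4])
```

The point of the sketch is that the caller hands over all batches at once and gets back one result per batch, in the same order, regardless of which worker handled which batch.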
You can use the cores_per_process parameter to specify the number of CPU cores used by each process, or the cpu_for_each_process parameter to specify exactly which CPU cores each process uses. Normally you don’t need to set them manually, because BigDL-Nano will find the best configuration automatically. If you want to set them yourself, you can do so as follows:
[ ]:
# Use 2 processes to run inference,
# each process will use 1 CPU core
multi_model = InferenceOptimizer.to_multi_instance(model_ft, num_processes=2, cores_per_process=1)
y_hat_list = multi_model(batch_list)
# Use 2 processes to run inference,
# the first process will use core 0, the second process will use core 1
multi_model = InferenceOptimizer.to_multi_instance(model_ft, cpu_for_each_process=[[0], [1]])
y_hat_list = multi_model(batch_list)
📝 Note
Setting cpu_for_each_process will override num_processes and cores_per_process.
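The two parameters differ in what they describe: cores_per_process is a count, while cpu_for_each_process is an explicit list of core IDs, one inner list per process. The sketch below shows one way to turn a pair of counts into such an explicit list. contiguous_core_lists is a hypothetical helper, not part of BigDL-Nano, and the assumption that cores are handed out contiguously starting from core 0 is ours; the source does not say how BigDL-Nano picks cores.

```python
import os
from typing import List, Optional


def contiguous_core_lists(num_processes: int, cores_per_process: int,
                          available_cores: Optional[int] = None) -> List[List[int]]:
    """Build one list of core IDs per process, assigning cores contiguously from 0.

    Hypothetical helper for illustration only.
    """
    if num_processes < 1 or cores_per_process < 1:
        raise ValueError("num_processes and cores_per_process must be at least 1")
    if available_cores is None:
        # Fall back to the number of logical CPUs reported by the OS
        available_cores = os.cpu_count() or 1
    needed = num_processes * cores_per_process
    if needed > available_cores:
        raise ValueError(f"need {needed} cores but only {available_cores} are available")
    return [
        list(range(p * cores_per_process, (p + 1) * cores_per_process))
        for p in range(num_processes)
    ]


# 2 processes with 1 core each, on a machine assumed to have at least 2 cores
layout = contiguous_core_lists(num_processes=2, cores_per_process=1, available_cores=2)
print(layout)
```

A nested list of this form has the same shape as the cpu_for_each_process argument used in the example above, where the number of inner lists determines how many processes are started.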