View the runnable example on GitHub
Accelerate PyTorch Lightning Training using Multiple Instances
The bigdl.nano.pytorch.Trainer API extends the PyTorch Lightning Trainer with multiple integrated optimizations. You could instantiate a BigDL-Nano Trainer with a specified num_processes to benefit from multi-instance training on a server with multiple CPU cores or sockets, so that the workload can make full use of all CPU cores.
📝 Note
Before starting your PyTorch Lightning application, it is highly recommended to run
source bigdl-nano-init
to set several environment variables based on your current hardware. Empirically, these variables bring a significant performance improvement for most PyTorch Lightning applications on training workloads.
Let’s take a self-defined LightningModule (based on a ResNet-18 model pretrained on the ImageNet dataset) and dataloaders to finetune the model on the OxfordIIITPet dataset as an example:
[ ]:
model = MyLightningModule()
train_loader, val_loader = create_dataloaders()
The definitions of MyLightningModule and create_dataloaders can be found in the runnable example.
To enable multi-instance training, you could simply import BigDL-Nano Trainer, and set num_processes
to an integer larger than 1:
[ ]:
from bigdl.nano.pytorch import Trainer
trainer = Trainer(max_epochs=5, num_processes=2)
📝 Note
By setting num_processes, Nano will launch the specified number of processes to perform data-parallel training. By default, CPU cores will be automatically and evenly distributed among processes to avoid conflicts and maximize training throughput. If you would like to specify the CPU cores used by each process, you could set cpu_for_each_process to a list of length num_processes, in which each item is a list of CPU indices.

During multi-instance training, the effective batch size is the batch_size (in the dataloader) \(\times\) num_processes, which reduces the number of iterations in each epoch by a factor of num_processes. A common practice to compensate for this is to gradually increase the learning rate to num_processes times its original value. You could find more details of this trick in this paper published by Facebook.
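The learning-rate adjustment described above (linear scaling with a gradual warmup, the trick from the Facebook paper "Accurate, Large Minibatch SGD") can be sketched in plain Python. The base_lr, warmup_epochs, and helper name below are illustrative assumptions, not part of the Nano API:

```python
# Linear learning-rate scaling for data-parallel training: with num_processes
# workers the effective batch size grows by num_processes, so the learning
# rate is commonly ramped up to num_processes times its single-process value
# over the first few epochs. Values here are illustrative.
base_lr = 0.01
num_processes = 2

scaled_lr = base_lr * num_processes  # target LR after warmup


def warmup_lr(epoch, warmup_epochs=5):
    """Linearly ramp the LR from base_lr to scaled_lr over warmup_epochs."""
    if epoch >= warmup_epochs:
        return scaled_lr
    return base_lr + (scaled_lr - base_lr) * epoch / warmup_epochs
```

The ramped value would typically be fed to the optimizer via a learning-rate scheduler in configure_optimizers.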
You could then do the multi-instance training and evaluation as normal:
[ ]:
trainer.fit(model, train_dataloaders=train_loader)
trainer.validate(model, dataloaders=val_loader)
📚 Related Readings