BigDL-Nano PyTorch Training Overview¶
BigDL-Nano can be used to accelerate PyTorch or PyTorch Lightning applications on training workloads. The optimizations in BigDL-Nano are delivered through an extended version of the PyTorch Lightning Trainer. These optimizations are either enabled by default or can be easily turned on by setting a parameter or calling a method.
Below we briefly describe the major features in BigDL-Nano for PyTorch training. You can find complete examples here.
Best Known Configurations¶
When you run source bigdl-nano-init, BigDL-Nano will export a few environment variables, such as KMP_AFFINITY, according to your current hardware. Empirically, these environment variables work best for most PyTorch applications. After setting them, you can run your applications as usual (python app.py); no additional changes are required.
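As a quick sanity check, you can inspect from Python which variables the init script exported. This is a minimal sketch; KMP_AFFINITY comes from the text above, while the other variable names are typical examples of what such tuning scripts set and may differ on your machine.

import os

# Minimal sketch: print tuning variables after running `source bigdl-nano-init`.
# KMP_AFFINITY is mentioned above; the other names are assumptions and may
# not all be set on your hardware.
for var in ("KMP_AFFINITY", "OMP_NUM_THREADS", "KMP_BLOCKTIME", "LD_PRELOAD"):
    print(f"{var}={os.environ.get(var)}")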
BigDL-Nano PyTorch Trainer¶
The PyTorch Trainer (bigdl.nano.pytorch.Trainer) is the place where we integrate most optimizations. It extends PyTorch Lightning's Trainer and has a few more parameters and methods specific to BigDL-Nano. The Trainer can be directly used to train a LightningModule. For example,
from pytorch_lightning import LightningModule
from bigdl.nano.pytorch import Trainer

class MyModule(LightningModule):
    # LightningModule definition
    ...

lightning_module = MyModule()
trainer = Trainer(max_epochs=10)
trainer.fit(lightning_module, train_loader)
For regular PyTorch modules, we also provide a "compile" method that takes in a PyTorch module, an optimizer, and other PyTorch objects and "compiles" them into a LightningModule. For example,
from bigdl.nano.pytorch import Trainer

lightning_module = Trainer.compile(pytorch_module, loss, optimizer, scheduler)
trainer = Trainer(max_epochs=10)
trainer.fit(lightning_module, train_loader)
Intel® Extension for PyTorch¶
Intel Extension for PyTorch (a.k.a. IPEX) extends PyTorch with optimizations for an extra performance boost on Intel hardware. BigDL-Nano integrates IPEX through the Trainer. Users can turn on IPEX by setting use_ipex=True. For example,
from bigdl.nano.pytorch import Trainer

trainer = Trainer(max_epochs=10, use_ipex=True)
Multi-instance Training¶
When training on a server with dozens of CPU cores, it is often beneficial to run multiple training instances in a data-parallel fashion to make full use of the CPU cores. However, using PyTorch's DDP API is a little cumbersome and error-prone, and if not configured correctly, it can even make training slower.
BigDL-Nano makes it very easy to conduct multi-instance training. You can just set the num_processes parameter in the Trainer constructor and BigDL-Nano will launch the specified number of processes to perform data-parallel training. Each process will be automatically pinned to a different subset of CPU cores to avoid contention and maximize training throughput. For example,
from bigdl.nano.pytorch import Trainer

trainer = Trainer(max_epochs=10, num_processes=4)
Note that the effective batch size in multi-instance training is the batch_size in your dataloader times num_processes, so the number of iterations in each epoch will be reduced num_processes-fold. A common practice to compensate for that is to gradually increase the learning rate to num_processes times its original value. You can find more details of this trick in this paper published by Facebook.
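For instance, with batch_size=32 and num_processes=4, the effective batch size is 128. The sketch below illustrates the gradual warmup trick; the helper name and the warmup length are illustrative assumptions, not part of the BigDL-Nano API.

# Hypothetical helper (not part of BigDL-Nano): linearly warm the learning
# rate up from base_lr to num_processes * base_lr over the first few epochs,
# following the gradual warmup trick from the paper above.
def warmup_lr(base_lr, num_processes, epoch, warmup_epochs=5):
    target_lr = base_lr * num_processes
    if epoch >= warmup_epochs:
        return target_lr
    # Linear interpolation between base_lr and target_lr during warmup.
    return base_lr + (target_lr - base_lr) * epoch / warmup_epochs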
BigDL-Nano PyTorch TorchNano¶
The TorchNano (bigdl.nano.pytorch.TorchNano) class is what we use to accelerate raw PyTorch code. By using it, we only need to make very few changes to accelerate a custom training loop. For example,
from bigdl.nano.pytorch import TorchNano

class MyNano(TorchNano):
    def train(self, ...):
        # copy your train loop here and make a few changes
        ...

MyNano().train(...)
Note: see this tutorial for details about our TorchNano.
TorchNano also integrates IPEX and distributed training optimizations. For example,
from bigdl.nano.pytorch import TorchNano

class MyNano(TorchNano):
    def train(self, ...):
        # define train loop
        ...

# enable IPEX optimization
MyNano(use_ipex=True).train(...)

# enable IPEX and distributed training, using the subprocess strategy
MyNano(use_ipex=True, num_processes=2, strategy="subprocess").train(...)
Optimized Data Pipeline¶
Computer vision tasks often need a data processing pipeline that sometimes constitutes a non-trivial part of the whole training pipeline. Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing a drop-in replacement of torchvision's datasets and transforms. For example,
from torch.utils.data import DataLoader
from bigdl.nano.pytorch.vision.datasets import ImageFolder
from bigdl.nano.pytorch.vision import transforms

data_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.ColorJitter(),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.Resize(128),
    transforms.ToTensor()
])

train_set = ImageFolder(train_path, data_transform)
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
trainer.fit(module, train_loader)