View the runnable example on GitHub
Accelerate PyTorch Training using Intel® Extension for PyTorch*
Intel® Extension for PyTorch* (also known as IPEX) can boost performance on Intel hardware with AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs. By using TorchNano (bigdl.nano.pytorch.TorchNano), you can make very few code changes to accelerate training loops via IPEX. Here we provide 2 ways to achieve this: A) subclass TorchNano or B) use the @nano decorator. You can choose the appropriate one depending on your (preferred) code structure.
📝 Note
Before starting your PyTorch application, it is highly recommended to run source bigdl-nano-init to set several environment variables based on your current hardware. Empirically, these variables greatly improve performance for most PyTorch training workloads.
A) Subclass TorchNano
In general, two steps are required if you choose to subclass TorchNano:

1. Import and subclass TorchNano, and override its train() method.
2. Instantiate it with use_ipex=True, then call the train() method.
For step 1, you can refer to this page (for consistency, we use the same model and dataset as an example). Supposing that you already have a well-defined subclass MyNano, the following line instantiates it with IPEX enabled and calls its train() method.
[ ]:
MyNano(use_ipex=True).train()
The detailed definition of MyNano can be found in the runnable example.
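For reference, a rough sketch of what such a subclass could look like is shown below. The tiny linear model and random-tensor dataset are placeholders (the runnable example uses its own model and dataset), and it assumes the usual TorchNano pattern of wrapping objects with self.setup(...) and calling self.backward(loss) instead of loss.backward().
[ ]:
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from bigdl.nano.pytorch import TorchNano


class MyNano(TorchNano):
    def train(self):
        # placeholders: a tiny linear model and random tensors stand in for
        # the model and dataset used in the runnable example
        model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
        loss_func = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        train_loader = DataLoader(
            TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))),
            batch_size=32,
        )

        # let TorchNano wrap the model, optimizer and dataloader
        model, optimizer, train_loader = self.setup(model, optimizer, train_loader)

        model.train()
        for data, target in train_loader:
            optimizer.zero_grad()
            loss = loss_func(model(data), target)
            self.backward(loss)  # replaces loss.backward()
            optimizer.step()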
B) Use @nano decorator
The @nano decorator is very convenient: if you have already defined a PyTorch training function with a model, optimizers, and dataloaders as parameters, you only need to add 2 new lines (import the decorator and wrap the training function) to enjoy the features brought by BigDL-Nano. You can learn its usage and notes from here. The only difference when using IPEX is that you should specify the decorator as @nano(use_ipex=True).
[ ]:
from tqdm import tqdm

from bigdl.nano.pytorch import nano  # import nano decorator


@nano(use_ipex=True)  # apply the decorator to the training loop
def training_loop(model, optimizer, train_loader, num_epochs, loss_func):
    for epoch in range(num_epochs):
        model.train()
        train_loss, num = 0, 0
        with tqdm(train_loader, unit="batch") as tepoch:
            for data, target in tepoch:
                tepoch.set_description(f"Epoch {epoch}")
                optimizer.zero_grad()
                output = model(data)
                loss = loss_func(output, target)
                loss.backward()
                optimizer.step()
                loss_value = loss.sum()
                train_loss += loss_value
                num += 1
                tepoch.set_postfix(loss=loss_value)
        print(f'Train Epoch: {epoch}, avg_loss: {train_loss / num}')
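Once decorated, training_loop is called just like an ordinary function. As a rough usage sketch, the model, dataloader and hyperparameters below are illustrative placeholders for those defined in the runnable example:
[ ]:
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# placeholders standing in for the model and dataset of the runnable example
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))),
    batch_size=32,
)

# the decorator takes care of enabling IPEX; the call itself is unchanged
training_loop(model, optimizer, train_loader, num_epochs=2, loss_func=nn.CrossEntropyLoss())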
A runnable example including this training_loop can be found here.