View the runnable example on GitHub
Use @nano Decorator to Accelerate PyTorch Training Loop
BigDL-Nano integrates multiple optimizations to accelerate PyTorch training workloads. As a pure PyTorch user, you could simply wrap your custom PyTorch training loop with the @nano decorator to benefit from BigDL-Nano.
📝 Note
Before starting your PyTorch application, it is highly recommended to run
source bigdl-nano-init
to set several environment variables based on your current hardware. Empirically, these variables bring a significant performance improvement for most PyTorch training workloads.
Suppose you define your custom PyTorch training loop as follows. To benefit from BigDL-Nano integrated optimizations, you could simply import the nano decorator and wrap the training loop with it.
[ ]:
from tqdm import tqdm

from bigdl.nano.pytorch import nano # import nano decorator

@nano() # apply the decorator to the training loop
def training_loop(model, optimizer, train_loader, num_epochs, loss_func):

    for epoch in range(num_epochs):

        model.train()
        train_loss, num = 0, 0
        with tqdm(train_loader, unit="batch") as tepoch:
            for data, target in tepoch:
                tepoch.set_description(f"Epoch {epoch}")
                optimizer.zero_grad()
                output = model(data)
                loss = loss_func(output, target)
                loss.backward()
                optimizer.step()
                loss_value = loss.sum()
                train_loss += loss_value
                num += 1
                tepoch.set_postfix(loss=loss_value)
        print(f'Train Epoch: {epoch}, avg_loss: {train_loss / num}')
📝 Note
To make sure @nano is functional on your custom training loop, there are some requirements on its parameter list:
there should be one and only one instance of torch.nn.Module passed in the training loop as the model
there should be at least one instance of torch.optim.Optimizer passed in the training loop as the optimizer
there should be at least one instance of torch.utils.data.DataLoader passed in the training loop as the dataloader
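As a hedged illustration of these requirements (not part of the runnable example), a signature like the one below should also qualify, since it passes exactly one nn.Module, one optimizer, and two DataLoader instances ("at least one" DataLoader is required):
[ ]:
from bigdl.nano.pytorch import nano

# Another valid parameter list: one torch.nn.Module, one torch.optim.Optimizer
# and two torch.utils.data.DataLoader instances ("at least one" is required).
# The argument names here are purely illustrative.
@nano()
def train_and_validate(model, optimizer, train_loader, val_loader, num_epochs, loss_func):
    ...  # your training and validation logic here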
You could then call the training_loop function as normal:
[ ]:
import torch

model = MyPytorchModule()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
loss_func = torch.nn.CrossEntropyLoss()
train_loader = create_train_dataloader()

training_loop(model, optimizer, train_loader, num_epochs=5, loss_func=loss_func)
The definition of MyPytorchModule
and create_train_dataloader
can be found in the runnable example.
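If you want to try the snippet above without downloading the runnable example, here is a minimal, self-contained sketch of what MyPytorchModule and create_train_dataloader could look like. It uses a tiny classifier and randomly generated tensors, so it is only a stand-in for the actual definitions used in the example:
[ ]:
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

class MyPytorchModule(nn.Module):
    # A tiny image classifier used only as an illustrative stand-in
    # for the module defined in the runnable example.
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)

def create_train_dataloader(num_samples=256, batch_size=32):
    # Random tensors standing in for a real dataset, so the snippet runs anywhere.
    images = torch.randn(num_samples, 3, 32, 32)
    labels = torch.randint(0, 10, (num_samples,))
    return DataLoader(TensorDataset(images, labels), batch_size=batch_size, shuffle=True)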
📝 Note
Due to the optimized environment variables set by source bigdl-nano-init, you could already experience some training acceleration after wrapping your custom training loop with the @nano decorator.
For more optimizations provided by the @nano decorator, you can refer to the Related Readings.
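For reference, those further optimizations are typically enabled through keyword arguments of the @nano decorator. The sketch below shows roughly how that could look; the exact argument names and supported values are an assumption here and may differ across BigDL-Nano versions, so please follow the Related Readings for authoritative usage:
[ ]:
from bigdl.nano.pytorch import nano

# The keyword arguments below mirror the optimizations covered in the Related
# Readings (IPEX, multi-instance training, channels last, BF16 mixed precision);
# they are an assumption-based sketch - check the linked guides for the exact
# parameters supported by your BigDL-Nano version.
@nano(use_ipex=True,       # Intel Extension for PyTorch optimizations
      num_processes=2,     # multi-instance (multi-process) training
      channels_last=True,  # channels last memory format
      precision='bf16')    # BFloat16 mixed precision
def training_loop(model, optimizer, train_loader, num_epochs, loss_func):
    ...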
📚 Related Readings
How to accelerate a PyTorch application on training workloads through Intel® Extension for PyTorch*
How to accelerate a PyTorch application on training workloads through multiple instances
How to use the channels last memory format in your PyTorch application for training
How to conduct BFloat16 Mixed Precision training in your PyTorch application