BigDL-Nano Pytorch TorchNano Quickstart#
In this guide we’ll demonstrate how to use BigDL-Nano to accelerate a custom PyTorch training loop with very few code changes.
Step 0: Prepare Environment#
We recommend using conda to prepare the environment. Please refer to the install guide for more details.
conda create -n py37 python==3.7.10 setuptools==58.0.4
conda activate py37
# nightly built version
pip install --pre --upgrade bigdl-nano[pytorch]
# set env variables for your conda environment
source bigdl-nano-init
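Before moving on to Step 1, you can optionally run a quick sanity check to confirm the installation; this is just a minimal sketch (the printed message is illustrative) and not required by the rest of the guide.
# optional sanity check: confirm the BigDL-Nano PyTorch package imports correctly
from bigdl.nano.pytorch import TorchNano
print("bigdl-nano PyTorch is importable")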
Step 1: Load the Data#
Import the CIFAR10 dataset from torchvision and modify the train transform. You can refer to CIFAR10 for a view of the whole dataset.
Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing drop-in replacements of torchvision’s datasets and transforms.
from torch.utils.data import DataLoader, Subset

from bigdl.nano.pytorch.vision import transforms
from bigdl.nano.pytorch.vision.datasets import CIFAR10

def create_dataloader(data_path, batch_size):
    train_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.ColorJitter(),
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.Resize(128),
        transforms.ToTensor()
    ])

    full_dataset = CIFAR10(root=data_path, train=True,
                           download=True, transform=train_transform)

    # use a subset of full dataset to shorten the training time
    train_dataset = Subset(dataset=full_dataset, indices=list(range(len(full_dataset) // 40)))

    train_loader = DataLoader(train_dataset, batch_size=batch_size,
                              shuffle=True, num_workers=0)

    return train_loader
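As a quick check (a minimal sketch; the batch size and DATA_PATH fallback below are just for illustration), you can build the dataloader and peek at one batch to confirm the shapes before training:
# hypothetical quick check of the dataloader defined above
import os

loader = create_dataloader(os.environ.get("DATA_PATH", "."), batch_size=32)
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # e.g. torch.Size([32, 3, 128, 128]) torch.Size([32])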
Step 2: Define the Model#
You can define your model in the same way as you would define standard PyTorch models.
from torch import nn

from bigdl.nano.pytorch.vision.models import vision

class ResNet18(nn.Module):
    def __init__(self, num_classes, pretrained=True, include_top=False, freeze=True):
        super().__init__()
        backbone = vision.resnet18(pretrained=pretrained, include_top=include_top, freeze=freeze)
        output_size = backbone.get_output_size()
        head = nn.Linear(output_size, num_classes)
        self.model = nn.Sequential(backbone, head)

    def forward(self, x):
        return self.model(x)
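As a sanity check (a minimal sketch, assuming the 128x128 images produced by the dataloader above), you can run a dummy batch through the model and verify that the output dimension matches the number of classes:
# hypothetical sanity check: forward a random batch through the model
import torch

model = ResNet18(num_classes=10, pretrained=False, include_top=False, freeze=True)
dummy = torch.rand(2, 3, 128, 128)  # same resolution as the dataloader output above
print(model(dummy).shape)           # expected: torch.Size([2, 10])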
Step 3: Define Train Loop#
Suppose the custom train loop is as follows:
import os

import torch

data_path = os.environ.get("DATA_PATH", ".")
batch_size = 256
max_epochs = 10
lr = 0.01

model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
train_loader = create_dataloader(data_path, batch_size)

model.train()

for _i in range(max_epochs):
    total_loss, num = 0, 0
    for X, y in train_loader:
        optimizer.zero_grad()
        loss = loss_func(model(X), y)
        loss.backward()
        optimizer.step()

        total_loss += loss.sum()
        num += 1
    print(f'avg_loss: {total_loss / num}')
The TorchNano (bigdl.nano.pytorch.TorchNano) class is what we use to accelerate raw PyTorch code. By using it, we only need to make very few changes to accelerate a custom training loop.
We only need the following steps:
- define a class MyNano derived from our TorchNano
- copy all lines of code into the train method of MyNano
- add one line to setup model, optimizer and dataloader
- replace the loss.backward() with self.backward(loss)
import os

import torch
from bigdl.nano.pytorch import TorchNano

class MyNano(TorchNano):
    def train(self):
        # copy all lines of code into this method
        data_path = os.environ.get("DATA_PATH", ".")
        batch_size = 256
        max_epochs = 10
        lr = 0.01

        model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
        loss_func = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        train_loader = create_dataloader(data_path, batch_size)

        # add this line to setup model, optimizer and dataloaders
        model, optimizer, train_loader = self.setup(model, optimizer, train_loader)

        model.train()

        for _i in range(max_epochs):
            total_loss, num = 0, 0
            for X, y in train_loader:
                optimizer.zero_grad()
                loss = loss_func(model(X), y)
                self.backward(loss)  # modify this line
                optimizer.step()

                total_loss += loss.sum()
                num += 1
            print(f'avg_loss: {total_loss / num}')
Step 4: Run with Nano TorchNano#
MyNano().train()
At this stage, you may already experience some speedup thanks to the optimized environment variables set by source bigdl-nano-init. In addition, you can enable further optimizations delivered by BigDL-Nano by setting a parameter or calling a method to accelerate your PyTorch training workloads.
Increase the number of processes in distributed training to accelerate training.#
MyNano(num_processes=2, distributed_backend="subprocess").train()
Note: BigDL-Nano now supports ‘spawn’, ‘subprocess’ and ‘ray’ backends for distributed training, but only the ‘subprocess’ backend can be used in an interactive environment.
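For example, if you run the training from a standalone script rather than a notebook, a minimal sketch using the ‘spawn’ backend could look like the following (the __main__ guard is standard practice for spawn-style multiprocessing):
# a minimal sketch: 'spawn' (and 'ray') are intended for standalone scripts,
# not interactive environments such as notebooks
if __name__ == '__main__':
    MyNano(num_processes=2, distributed_backend="spawn").train()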
Intel Extension for PyTorch (a.k.a. IPEX)#
IPEX extends PyTorch with optimizations for Intel hardware. BigDL-Nano also integrates IPEX into the TorchNano; you can turn on IPEX optimization by setting use_ipex=True.
MyNano(use_ipex=True, num_processes=2, distributed_backend="subprocess").train()
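IPEX can also be enabled on its own, without distributed training; a minimal single-process sketch:
# a minimal sketch: enable IPEX optimization in a single process
MyNano(use_ipex=True).train()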