BigDL-Nano PyTorch TorchNano Quickstart#

In this guide we’ll demonstrate how to use BigDL-Nano to easily accelerate a custom training loop with very few changes.

Step 0: Prepare Environment#

We recommend using conda to prepare the environment. Please refer to the install guide for more details.

conda create -n py37 python==3.7.10 setuptools==58.0.4
conda activate py37
# nightly built version
pip install --pre --upgrade bigdl-nano[pytorch]
# set env variables for your conda environment
source bigdl-nano-init
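
To verify the installation, you can try importing the modules used throughout this guide from a Python shell (a minimal sanity check; all of the import paths below appear in later steps):

from bigdl.nano.pytorch import TorchNano
from bigdl.nano.pytorch.vision import transforms
from bigdl.nano.pytorch.vision.datasets import CIFAR10

print("bigdl-nano[pytorch] imported successfully")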

Step 1: Load the Data#

Import the CIFAR10 dataset from torchvision and modify the train transform. You can refer to CIFAR10 for an overview of the whole dataset.

Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing a drop-in replacement for torchvision’s datasets and transforms.

from torch.utils.data import DataLoader, Subset

from bigdl.nano.pytorch.vision import transforms
from bigdl.nano.pytorch.vision.datasets import CIFAR10

def create_dataloader(data_path, batch_size):
    train_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.ColorJitter(),
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.Resize(128),
        transforms.ToTensor()
    ])

    full_dataset = CIFAR10(root=data_path, train=True,
                           download=True, transform=train_transform)

    # use a subset of full dataset to shorten the training time
    train_dataset = Subset(dataset=full_dataset, indices=list(range(len(full_dataset) // 40)))

    train_loader = DataLoader(train_dataset, batch_size=batch_size,
                              shuffle=True, num_workers=0)

    return train_loader
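
As a quick check of the pipeline, you can build a dataloader and inspect one batch (a small sketch assuming CIFAR10 can be downloaded into the current directory; the 128×128 image size follows from the final Resize(128) in the transform above):

train_loader = create_dataloader(data_path=".", batch_size=32)
X, y = next(iter(train_loader))
print(X.shape, y.shape)  # expected: torch.Size([32, 3, 128, 128]) torch.Size([32])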

Step 2: Define the Model#

You may define your model in the same way as the standard PyTorch models.

from torch import nn

from bigdl.nano.pytorch.vision.models import vision

class ResNet18(nn.Module):
    def __init__(self, num_classes, pretrained=True, include_top=False, freeze=True):
        super().__init__()
        backbone = vision.resnet18(pretrained=pretrained, include_top=include_top, freeze=freeze)
        output_size = backbone.get_output_size()
        head = nn.Linear(output_size, num_classes)
        self.model = nn.Sequential(backbone, head)

    def forward(self, x):
        return self.model(x)
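
A forward pass on random input is an easy way to confirm the model wiring (a minimal sketch; pretrained=False avoids downloading weights, and the 128×128 input matches the images produced by the transform in Step 1):

import torch

model = ResNet18(num_classes=10, pretrained=False, include_top=False, freeze=True)
logits = model(torch.rand(4, 3, 128, 128))
print(logits.shape)  # expected: torch.Size([4, 10])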

Step 3: Define Train Loop#

Suppose the custom train loop is as follows:

import os
import torch

data_path = os.environ.get("DATA_PATH", ".")
batch_size = 256
max_epochs = 10
lr = 0.01

model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
train_loader = create_dataloader(data_path, batch_size)

model.train()

for _i in range(max_epochs):
    total_loss, num = 0, 0
    for X, y in train_loader:
        optimizer.zero_grad()
        loss = loss_func(model(X), y)
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        num += 1
    print(f'avg_loss: {total_loss / num}')

The TorchNano (bigdl.nano.pytorch.TorchNano) class is what we use to accelerate raw PyTorch code. With it, we only need to make a few changes to accelerate a custom training loop.

We only need the following steps:

  • define a class MyNano derived from our TorchNano

  • copy all lines of code into the train method of MyNano

  • add one line to set up the model, optimizer and dataloader

  • replace the loss.backward() with self.backward(loss)

import os
import torch

from bigdl.nano.pytorch import TorchNano

class MyNano(TorchNano):
    def train(self):
        # copy all lines of code into this method
        data_path = os.environ.get("DATA_PATH", ".")
        batch_size = 256
        max_epochs = 10
        lr = 0.01

        model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
        loss_func = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        train_loader = create_dataloader(data_path, batch_size)

        # add this line to setup model, optimizer and dataloaders
        model, optimizer, train_loader = self.setup(model, optimizer, train_loader)

        model.train()

        for _i in range(max_epochs):
            total_loss, num = 0, 0
            for X, y in train_loader:
                optimizer.zero_grad()
                loss = loss_func(model(X), y)
                self.backward(loss)     # modify this line
                optimizer.step()
                
                total_loss += loss.item()
                num += 1
            print(f'avg_loss: {total_loss / num}')

Step 4: Run with Nano TorchNano#

MyNano().train()

At this stage, you may already experience some speedup thanks to the optimized environment variables set by source bigdl-nano-init. In addition, you can enable further optimizations delivered by BigDL-Nano by setting a parameter or calling a method, to accelerate your PyTorch application on training workloads.
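
If you are curious about what source bigdl-nano-init changed, you can inspect your environment. Exactly which variables are configured depends on your hardware and installation; threading-related ones such as OMP_NUM_THREADS are shown here only as a common example:

import os

# the exact variables set by bigdl-nano-init vary by machine;
# OMP_NUM_THREADS is only one typical example
print("OMP_NUM_THREADS =", os.environ.get("OMP_NUM_THREADS"))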

Increase the number of processes in distributed training to accelerate training.#

MyNano(num_processes=2, distributed_backend="subprocess").train()
  • Note: BigDL-Nano now supports ‘spawn’, ‘subprocess’ and ‘ray’ backends for distributed training, but only the ‘subprocess’ backend can be used in an interactive environment (see the sketch below for running ‘spawn’ from a standalone script).
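
For example, the ‘spawn’ backend is meant to be launched from a standalone script rather than a notebook. A sketch of such a script (train.py is a hypothetical file name, and it assumes MyNano and its dependencies are defined in or imported into the script) could look like:

# train.py (hypothetical file name); 'spawn' should not be used in interactive environments
if __name__ == "__main__":
    MyNano(num_processes=2, distributed_backend="spawn").train()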

Intel Extension for PyTorch (a.k.a. IPEX)#

IPEX extends PyTorch with optimizations for extra performance on Intel hardware. BigDL-Nano also integrates IPEX into TorchNano; you can turn on the IPEX optimization by setting use_ipex=True.

MyNano(use_ipex=True, num_processes=2, distributed_backend="subprocess").train()
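
If you only want the IPEX optimization without multi-process training, a single-process variant of the same call should also work, using just the use_ipex parameter shown above:

MyNano(use_ipex=True).train()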