# BigDL-Nano PyTorch TorchNano Quickstart

**In this guide we'll demonstrate how to use BigDL-Nano to accelerate a custom training loop with very few changes.**

### Step 0: Prepare Environment

We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details.

```bash
conda create -n py37 python==3.7.10 setuptools==58.0.4
conda activate py37
# nightly built version
pip install --pre --upgrade bigdl-nano[pytorch]
# set env variables for your conda environment
source bigdl-nano-init
```

### Step 1: Load the Data

Import the CIFAR10 dataset from torchvision and modify the train transform. You can visit [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) for an overview of the whole dataset.

Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing a drop-in replacement of torchvision's `datasets` and `transforms`.

```python
from torch.utils.data import DataLoader, Subset

from bigdl.nano.pytorch.vision import transforms
from bigdl.nano.pytorch.vision.datasets import CIFAR10

def create_dataloader(data_path, batch_size):
    train_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.ColorJitter(),
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.Resize(128),
        transforms.ToTensor()
    ])

    full_dataset = CIFAR10(root=data_path, train=True,
                           download=True, transform=train_transform)

    # use a subset of the full dataset to shorten the training time
    train_dataset = Subset(dataset=full_dataset,
                           indices=list(range(len(full_dataset) // 40)))

    train_loader = DataLoader(train_dataset, batch_size=batch_size,
                              shuffle=True, num_workers=0)

    return train_loader
```

### Step 2: Define the Model

You may define your model in the same way as standard PyTorch models.

```python
from torch import nn

from bigdl.nano.pytorch.vision.models import vision

class ResNet18(nn.Module):
    def __init__(self, num_classes, pretrained=True, include_top=False, freeze=True):
        super().__init__()
        backbone = vision.resnet18(pretrained=pretrained, include_top=include_top, freeze=freeze)
        output_size = backbone.get_output_size()
        head = nn.Linear(output_size, num_classes)
        self.model = nn.Sequential(backbone, head)

    def forward(self, x):
        return self.model(x)
```

### Step 3: Define Train Loop

Suppose the custom train loop is as follows:

```python
import os

import torch

data_path = os.environ.get("DATA_PATH", ".")
batch_size = 256
max_epochs = 10
lr = 0.01

model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
train_loader = create_dataloader(data_path, batch_size)

model.train()

for _i in range(max_epochs):
    total_loss, num = 0, 0
    for X, y in train_loader:
        optimizer.zero_grad()
        loss = loss_func(model(X), y)
        loss.backward()
        optimizer.step()

        total_loss += loss.sum()
        num += 1
    print(f'avg_loss: {total_loss / num}')
```

The `TorchNano` (`bigdl.nano.pytorch.TorchNano`) class is what we use to accelerate raw PyTorch code. By using it, we only need to make very few changes to accelerate a custom training loop.
We only need the following steps:

- define a class `MyNano` derived from our `TorchNano`
- copy all lines of code into the `train` method of `MyNano`
- add one line to setup model, optimizer and dataloaders
- replace the `loss.backward()` with `self.backward(loss)`

```python
import os

import torch
from bigdl.nano.pytorch import TorchNano

class MyNano(TorchNano):
    def train(self):
        # copy all lines of code into this method
        data_path = os.environ.get("DATA_PATH", ".")
        batch_size = 256
        max_epochs = 10
        lr = 0.01

        model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
        loss_func = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        train_loader = create_dataloader(data_path, batch_size)

        # add this line to setup model, optimizer and dataloaders
        model, optimizer, train_loader = self.setup(model, optimizer, train_loader)

        model.train()

        for _i in range(max_epochs):
            total_loss, num = 0, 0
            for X, y in train_loader:
                optimizer.zero_grad()
                loss = loss_func(model(X), y)
                self.backward(loss)  # modify this line
                optimizer.step()

                total_loss += loss.sum()
                num += 1
            print(f'avg_loss: {total_loss / num}')
```

### Step 4: Run with Nano TorchNano

```python
MyNano().train()
```

At this stage, you may already experience some speedup due to the optimized environment variables set by `source bigdl-nano-init`.

In addition, you can enable optimizations delivered by BigDL-Nano by setting a parameter or calling a method to further accelerate PyTorch applications on training workloads.

#### Increase the number of processes in distributed training to accelerate training

```python
MyNano(num_processes=2, distributed_backend="subprocess").train()
```

- Note: BigDL-Nano currently supports the 'spawn', 'subprocess' and 'ray' backends for distributed training, but only the 'subprocess' backend can be used in an interactive environment (see the script sketch after the IPEX example below for running the 'spawn' backend).

#### Intel Extension for PyTorch (a.k.a. [IPEX](https://github.com/intel/intel-extension-for-pytorch))

IPEX extends PyTorch with optimizations on Intel hardware. BigDL-Nano also integrates IPEX into the `TorchNano`; you can turn on IPEX optimization by setting `use_ipex=True`.

```python
MyNano(use_ipex=True, num_processes=2, distributed_backend="subprocess").train()
```
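As noted above, the 'spawn' backend cannot be used in an interactive environment; when launching training from a plain Python script instead, it is good practice to guard the entry point so that worker processes started by the backend do not re-execute the launch when they import the module. The following is a minimal sketch under that assumption; the script name `run_nano.py` is hypothetical, and `MyNano` is the class defined above:

```python
# run_nano.py -- hypothetical script name; assumes MyNano, ResNet18 and
# create_dataloader are defined (or imported) as in the sections above

if __name__ == '__main__':
    # guard the entry point so processes created by the 'spawn' backend
    # do not re-run the training launch on import
    MyNano(num_processes=2, distributed_backend="spawn").train()
```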
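Finally, because `train` is an ordinary method that you define on your `TorchNano` subclass, you can give it parameters and pass hyperparameters at call time instead of hard-coding them inside the method. Below is a minimal sketch of that pattern; the class name `MyConfigurableNano` and its `train` signature are our own choices for illustration, not part of the BigDL-Nano API, and `ResNet18` and `create_dataloader` are assumed to be defined as above:

```python
import torch
from torch import nn
from bigdl.nano.pytorch import TorchNano

class MyConfigurableNano(TorchNano):
    # hypothetical signature: hyperparameters become arguments of train
    def train(self, data_path=".", batch_size=256, max_epochs=10, lr=0.01):
        model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
        loss_func = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        train_loader = create_dataloader(data_path, batch_size)

        # same two changes as before: self.setup and self.backward
        model, optimizer, train_loader = self.setup(model, optimizer, train_loader)

        model.train()
        for _ in range(max_epochs):
            total_loss, num = 0, 0
            for X, y in train_loader:
                optimizer.zero_grad()
                loss = loss_func(model(X), y)
                self.backward(loss)
                optimizer.step()

                total_loss += loss.sum()
                num += 1
            print(f'avg_loss: {total_loss / num}')

# hyperparameters are now passed at call time
MyConfigurableNano(num_processes=2, distributed_backend="subprocess").train(
    batch_size=128, max_epochs=5, lr=0.005)
```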