View the runnable example on GitHub
Accelerate Computer Vision Data Processing Pipeline
You can use transforms and datasets from bigdl.nano.pytorch.vision as drop-in replacements for torchvision.transforms and torchvision.datasets to easily accelerate your computer vision data processing pipeline in PyTorch Lightning applications.
📝 Note
Before starting your PyTorch Lightning application, it is highly recommended to run
source bigdl-nano-init
to set several environment variables based on your current hardware. Empirically, these variables bring a significant performance improvement to training workloads in most PyTorch Lightning applications.
Let’s take a self-defined LightningModule (based on a ResNet-18 model pretrained on the ImageNet dataset) as an example, and suppose we would like to finetune it on the OxfordIIITPet dataset:
[ ]:
model = MyLightningModule()
The definition of MyLightningModule can be found in the runnable example.
To finetune the model on the OxfordIIITPet dataset, we need to create the required train/validation datasets and dataloaders. To accelerate the data processing pipeline, simply import the BigDL-Nano transforms and datasets in place of torchvision.transforms and torchvision.datasets:
[ ]:
# from torchvision import transforms
# from torchvision.datasets import OxfordIIITPet
from bigdl.nano.pytorch.vision import transforms
from bigdl.nano.pytorch.vision.datasets import OxfordIIITPet
# Data processing steps are the same as using torchvision
train_transform = transforms.Compose([transforms.Resize(256),
                                      transforms.RandomCrop(224),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.ColorJitter(brightness=.5, hue=.3),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])
val_transform = transforms.Compose([transforms.Resize(256),
                                    transforms.CenterCrop(224),
                                    transforms.ToTensor(),
                                    transforms.Normalize([0.485, 0.456, 0.406],
                                                         [0.229, 0.224, 0.225])])
train_dataset = OxfordIIITPet(root="/tmp/data", transform=train_transform, download=True)
val_dataset = OxfordIIITPet(root="/tmp/data", transform=val_transform)
[ ]:
train_loader, val_loader = create_dataloaders(train_dataset, val_dataset)
The definition of create_dataloaders can be found in the runnable example.
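A plausible sketch of such a helper, assuming a standard torch.utils.data.DataLoader setup; the batch size and worker count are hypothetical, not taken from the example:

```python
from torch.utils.data import DataLoader

# Hypothetical helper -- the actual create_dataloaders is defined in the
# runnable example and may use different batch sizes or worker settings.
def create_dataloaders(train_dataset, val_dataset, batch_size=32):
    # shuffle only the training set; keep validation order deterministic
    train_loader = DataLoader(train_dataset, batch_size=batch_size,
                              shuffle=True, num_workers=0)
    val_loader = DataLoader(val_dataset, batch_size=batch_size,
                            shuffle=False, num_workers=0)
    return train_loader, val_loader
```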
You could then do the training and evaluation steps with the Nano Trainer:
[ ]:
from bigdl.nano.pytorch import Trainer
trainer = Trainer(max_epochs=5)
trainer.fit(model, train_dataloaders=train_loader)
trainer.validate(model, dataloaders=val_loader)
📚 Related Readings