View the runnable example on GitHub
Accelerate Computer Vision Data Processing Pipeline
You can use transforms and datasets from bigdl.nano.pytorch.vision, which take advantage of OpenCV and libjpeg-turbo, as drop-in replacements for torchvision.transforms and torchvision.datasets to accelerate your computer vision data processing pipeline in PyTorch or PyTorch Lightning applications.
Suppose you would like to preprocess the OxfordIIITPet dataset. To accelerate this process, simply import the BigDL-Nano transforms and datasets in place of torchvision.transforms and torchvision.datasets:
[ ]:
# from torchvision import transforms
# from torchvision.datasets import OxfordIIITPet
from bigdl.nano.pytorch.vision import transforms
from bigdl.nano.pytorch.vision.datasets import OxfordIIITPet
# Data processing steps are the same as using torchvision
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=.5, hue=.3),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])
train_dataset = OxfordIIITPet(root="/tmp/data", transform=train_transform, download=True)
val_dataset = OxfordIIITPet(root="/tmp/data", transform=val_transform)
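The final Normalize step above standardizes each channel as out[c] = (in[c] - mean[c]) / std[c], using the ImageNet statistics passed in. A minimal pure-Python sketch of that arithmetic (the `normalize_pixel` helper is hypothetical, for illustration only):

```python
# Per-channel normalization as performed by transforms.Normalize:
# out[c] = (in[c] - mean[c]) / std[c], with the ImageNet statistics above.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Normalize one RGB pixel with values in [0, 1], as produced by ToTensor."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]

# A pixel exactly equal to the channel means maps to zero in every channel
print(normalize_pixel([0.485, 0.456, 0.406]))  # -> [0.0, 0.0, 0.0]
```

This is why the same mean/std pair must be used at training and validation time: both pipelines above share identical Normalize parameters.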
You could then create the train/validation dataloaders as usual:
[ ]:
import torch

# randomly split the dataset indices into training and validation subsets
indices = torch.randperm(len(train_dataset))
val_size = len(train_dataset) // 4  # hold out 25% of samples for validation
train_dataset = torch.utils.data.Subset(train_dataset, indices[:-val_size])
val_dataset = torch.utils.data.Subset(val_dataset, indices[-val_size:])
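The split above holds out a quarter of the samples for validation. As a concrete sketch of that arithmetic (assuming the Oxford-IIIT Pet "trainval" split, which contains 3,680 images):

```python
# 75/25 split arithmetic, assuming the 3,680-image trainval split
n = 3680
val_size = n // 4          # samples held out for validation
train_size = n - val_size  # samples kept for training
print(train_size, val_size)  # -> 2760 920
```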
# create dataloaders
from torch.utils.data import DataLoader
train_dataloader = DataLoader(train_dataset, batch_size=32)
val_dataloader = DataLoader(val_dataset, batch_size=32)
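With batch_size=32 and DataLoader's default drop_last=False, each epoch yields ceil(num_samples / batch_size) batches. A quick sketch of the counts, assuming the 2,760/920 train/validation split from the 3,680-image trainval set:

```python
import math

# batches per epoch = ceil(num_samples / batch_size), since the default
# drop_last=False keeps the final partial batch
batch_size = 32
train_batches = math.ceil(2760 / batch_size)  # assumed 2,760 training samples
val_batches = math.ceil(920 / batch_size)     # assumed 920 validation samples
print(train_batches, val_batches)  # -> 87 29
```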