BigDL-Nano TensorFlow Quantization Quickstart#

In this guide we demonstrate how to apply post-training quantization on a Keras model with BigDL-Nano in 4 simple steps.

Step 0: Prepare Environment#

We recommend using conda to prepare the environment. Please refer to the install guide for more details.

conda create -n py37 python==3.7.10 setuptools==58.0.4
conda activate py37
# nightly built version
pip install --pre --upgrade bigdl-nano[tensorflow]
# set env variables for your conda environment
source bigdl-nano-init

By default, Intel Neural Compressor is not installed with BigDL-Nano, so if you decide to use it as your quantization backend, you'll need to install it first:

pip install neural-compressor==1.11.0
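
If you are not sure whether the backend is available, an optional check like the following (not part of the original steps) will fail fast when Intel Neural Compressor is missing:

# optional sanity check: this import fails if neural-compressor is not installed
import neural_compressor
print("Intel Neural Compressor is available")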

BigDL-Nano provides several APIs which can help users easily apply optimizations on inference pipelines to improve latency and throughput. The Keras Model (bigdl.nano.tf.keras.Model) and InferenceOptimizer (bigdl.nano.tf.keras.InferenceOptimizer) provide the APIs for all the optimizations you need for inference.

from bigdl.nano.tf.keras import Model, InferenceOptimizer

Step 1: Loading Data#

Here we load data from tensorflow_datasets. Imagenette is a subset of 10 easily classified classes from the ImageNet dataset.

import tensorflow_datasets as tfds
DATANAME = 'imagenette/320px-v2'
(train_ds, test_ds), info = tfds.load(DATANAME, data_dir='../data/',
                                      split=['train', 'validation'],
                                      with_info=True,
                                      as_supervised=True)
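
If you want to confirm what was downloaded, an optional check like the one below uses the info object returned by tfds.load to print the class names and split sizes:

# optional: inspect the dataset metadata returned by tfds.load
print(info.features['label'].names)
print(info.splits['train'].num_examples, info.splits['validation'].num_examples)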

Prepare Inputs#

Here we resize the input images to a uniform img_size and convert the labels to one-hot encoding.

import tensorflow as tf
img_size = 224
num_classes = info.features['label'].num_classes
train_ds = train_ds.map(lambda img, label: (tf.image.resize(img, (img_size, img_size)),
                                            tf.one_hot(label, num_classes))).batch(32)
test_ds = test_ds.map(lambda img, label: (tf.image.resize(img, (img_size, img_size)),
                                          tf.one_hot(label, num_classes))).batch(32)
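
To verify the preprocessing, you can optionally take a single batch and check its shapes, which should be (32, 224, 224, 3) for the images and (32, 10) for the one-hot labels:

# optional: confirm the shapes produced by the map/batch pipeline above
for imgs, labels in train_ds.take(1):
    print(imgs.shape, labels.shape)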

Step 2: Build Model#

Here we initialize a ResNet50 from tf.keras.applications with pre-trained ImageNet weights, freeze the backbone, and add a small classification head for the 10 Imagenette classes.

from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers
inputs = tf.keras.layers.Input(shape=(224, 224, 3))
x = tf.cast(inputs, tf.float32)
x = tf.keras.applications.resnet50.preprocess_input(x)
backbone = ResNet50(weights='imagenet')
backbone.trainable = False
x = backbone(x)
x = layers.Dense(512, activation='relu')(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)

model = Model(inputs=inputs, outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])

# fit
model.fit(train_ds, epochs=1)
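
If you want a reference point for the quantized model later, you can optionally evaluate the fitted FP32 model on the validation set first; this step is not required for quantization:

# optional: record the FP32 accuracy for later comparison with the quantized model
_, fp32_acc = model.evaluate(test_ds)
print("FP32 accuracy:", fp32_acc)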

Step 3: Quantization with Intel Neural Compressor#

InferenceOptimizer.quantize() returns a Keras model with the desired precision and accuracy. Taking the ResNet50 above as an example, you can add quantization as below, using the validation dataset for calibration.

from tensorflow.keras.metrics import CategoricalAccuracy
q_model = InferenceOptimizer.quantize(model,
                                      calib_dataset=test_ds,
                                      metric=CategoricalAccuracy(),
                                      tuning_strategy='basic'
                                      )

The quantized model can be called to run inference just like a normal Keras model.

# run simple prediction with transparent acceleration
for img, _ in test_ds:
    q_model(img)
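
To get a rough sense of the speedup, you can time a few forward passes of the original and quantized models on the same batch; the exact numbers depend on your hardware and are only indicative:

import time

# rough latency comparison on a single batch; results vary by hardware
imgs, _ = next(iter(test_ds))
for name, m in [("FP32 model", model), ("quantized model", q_model)]:
    start = time.perf_counter()
    for _ in range(10):
        m(imgs)
    print(name, "-", (time.perf_counter() - start) / 10, "seconds per batch")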