Accelerate TensorFlow Inference using ONNXRuntime
You can use the InferenceOptimizer.trace(..., accelerator='onnxruntime') API to enable ONNXRuntime acceleration for TensorFlow inference. It only takes a few lines.
Let’s take an EfficientNetB0 model pretrained on the ImageNet dataset as an example. First, we load the model:
    from tensorflow.keras.applications import EfficientNetB0

    model = EfficientNetB0(weights='imagenet')
To enable ONNXRuntime acceleration for your TensorFlow inference, the only change you need to make is to import the BigDL-Nano InferenceOptimizer and trace your TensorFlow model to convert it into an ONNXRuntime-accelerated module for inference:
    import tensorflow as tf
    from bigdl.nano.tf.keras import InferenceOptimizer

    ort_model = InferenceOptimizer.trace(model, accelerator="onnxruntime")
Note that when you have a custom model (e.g. one defined by subclassing), the input_spec argument, which should be a (list or tuple of) tf.TensorSpec, is required by the trace function so that the ONNXRuntime accelerator knows the shape of the model input.
Please refer to the API documentation for more information on the trace function.
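As an illustration of the note above, the following is a minimal sketch of how input_spec might be supplied for a custom model. TinyClassifier, its layers, and the helper trace_with_onnxruntime are hypothetical names introduced only for this example, and the 224x224 RGB input shape is an assumption borrowed from the EfficientNetB0 example. Only the input_spec keyword and the tf.TensorSpec type come from the text above.

```python
import tensorflow as tf


class TinyClassifier(tf.keras.Model):
    """Hypothetical custom model defined by subclassing (illustration only)."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.pool = tf.keras.layers.GlobalAveragePooling2D()
        self.head = tf.keras.layers.Dense(num_classes)

    def call(self, inputs):
        # Average over the spatial dimensions, then project to class scores.
        return self.head(self.pool(inputs))


# Describe the model input: any batch size (None), 224x224 RGB, float32.
# The shape is an assumption for this sketch; use your model's real input shape.
input_spec = tf.TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32)

model = TinyClassifier()


def trace_with_onnxruntime(keras_model, spec):
    """Trace a custom model, passing input_spec as the note above describes."""
    # Imported lazily so the rest of the sketch runs without BigDL-Nano installed.
    from bigdl.nano.tf.keras import InferenceOptimizer

    return InferenceOptimizer.trace(
        keras_model, accelerator="onnxruntime", input_spec=spec
    )
```

Calling trace_with_onnxruntime(model, input_spec) should then return the accelerated module. For a model with several inputs, the text above indicates that a list or tuple of tf.TensorSpec objects is expected instead of a single one.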
You can then run the normal inference steps with the model optimized by ONNXRuntime:
    x = tf.random.normal(shape=(2, 224, 224, 3))
    # use the optimized model here
    y_hat = ort_model(x)
    predictions = tf.argmax(y_hat, axis=1)
    print(predictions)
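To check whether the traced model is actually faster on your hardware, one option is to time both models on the same batch. The helper below is a minimal sketch using only the Python standard library. mean_latency_ms and its warmup and runs parameters are hypothetical names chosen for this example and are not part of BigDL-Nano.

```python
import time


def mean_latency_ms(predict, batch, warmup=3, runs=20):
    """Return the average wall-clock time of predict(batch) in milliseconds."""
    # Warm-up calls are excluded from timing so one-off setup costs
    # (graph building, memory allocation) do not skew the average.
    for _ in range(warmup):
        predict(batch)

    start = time.perf_counter()
    for _ in range(runs):
        predict(batch)
    elapsed = time.perf_counter() - start

    return elapsed * 1000.0 / runs
```

With the objects defined earlier, mean_latency_ms(model, x) and mean_latency_ms(ort_model, x) would give comparable figures for the original and the accelerated model. The actual speedup depends on the model, batch size, and machine, so treat the numbers as indicative only.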