View the runnable example on GitHub
Accelerate TensorFlow Inference using ONNXRuntime
You can use the InferenceOptimizer.trace(..., accelerator='onnxruntime')
API to enable ONNXRuntime acceleration for TensorFlow inference. It only takes a few lines.
Let's take an EfficientNetB0 model pretrained on the ImageNet dataset as an example. First, we load the model:
[ ]:
from tensorflow.keras.applications import EfficientNetB0
model = EfficientNetB0(weights='imagenet')
To enable ONNXRuntime acceleration for your TensorFlow inference, the only change you need to make is to import BigDL-Nano's InferenceOptimizer, and trace your TensorFlow model to convert it into an ONNXRuntime-accelerated module for inference:
[ ]:
import tensorflow as tf
from bigdl.nano.tf.keras import InferenceOptimizer
ort_model = InferenceOptimizer.trace(model,
                                     accelerator="onnxruntime")
📝 Note
When you have a custom model (e.g. one inherited from tf.keras.Model), the parameter input_spec, which should be a (list or tuple of) tf.TensorSpec, is required by the trace function so that the ONNXRuntime accelerator knows the shape of the model input.
Please refer to the API documentation for more information on InferenceOptimizer.trace.
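As a minimal sketch of that case, the snippet below wraps EfficientNetB0 in a hypothetical custom subclass of tf.keras.Model (the class name MyModel and its structure are illustrative, not part of this example) and passes input_spec when tracing:
import tensorflow as tf
from bigdl.nano.tf.keras import InferenceOptimizer

# Hypothetical custom model inherited from tf.keras.Model
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.backbone = tf.keras.applications.EfficientNetB0(weights='imagenet')

    def call(self, inputs):
        return self.backbone(inputs)

custom_model = MyModel()

# For custom models, supply input_spec (a tf.TensorSpec, or a list/tuple of them)
# so that the ONNXRuntime accelerator knows the shape of the model input
ort_custom_model = InferenceOptimizer.trace(
    custom_model,
    accelerator="onnxruntime",
    input_spec=tf.TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32))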
You can then do the normal inference steps with the model optimized by ONNXRuntime:
[ ]:
x = tf.random.normal(shape=(2, 224, 224, 3))
# use the optimized model here
y_hat = ort_model(x)
predictions = tf.argmax(y_hat, axis=1)
print(predictions)
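If you want a quick sanity check (this step is not required, just an optional comparison under the same input x), you can run the original Keras model on the same batch and confirm that the predicted classes match those of the ONNXRuntime-accelerated model:
# optional: compare with the original (non-accelerated) model's predictions
y_hat_original = model(x)
original_predictions = tf.argmax(y_hat_original, axis=1)
print(original_predictions)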
📚 Related Readings