View the runnable example on GitHub

Save and Load ONNXRuntime Model in TensorFlow

This example illustrates how to save and load a TensorFlow Keras model accelerated by onnxruntime. In this example, we use a pretrained EfficientNetB0 model. Then, by calling InferenceOptimizer.trace(model, accelerator="onnxruntime", ...), we can obtain a model accelerated by onnxruntime, as provided by BigDL-Nano, for inference. By calling InferenceOptimizer.save(model, path), we can save the model to a folder. By calling InferenceOptimizer.load(path, model), we can load the model back from that folder.
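To run this example, BigDL-Nano should be installed with TensorFlow and inference (onnxruntime) support. The command below is a typical setup for these examples; treat the exact extras as an assumption and check the BigDL-Nano installation guide for your version:

[ ]:
# assumed install command -- verify against the BigDL-Nano installation guide
!pip install --pre --upgrade bigdl-nano[tensorflow,inference]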

First, prepare the model. In this example we use an EfficientNetB0 model (model_ft in the following code) pretrained on the ImageNet dataset.

[ ]:
from tensorflow.keras.applications import EfficientNetB0

model_ft = EfficientNetB0(weights='imagenet')
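Note: for simplicity, the inference step below feeds the model a random tensor. If you would rather run a real image through the model, a minimal sketch follows; the image path is hypothetical, and tf.keras.utils.load_img assumes TensorFlow 2.6+.

[ ]:
import tensorflow as tf
from tensorflow.keras.applications.efficientnet import preprocess_input

# hypothetical path -- replace with an actual image file
img = tf.keras.utils.load_img("path/to/image.jpg", target_size=(224, 224))
x_img = tf.keras.utils.img_to_array(img)[tf.newaxis, ...]  # shape (1, 224, 224, 3)
# for EfficientNet, preprocess_input is a pass-through (the model expects
# raw pixels in [0, 255]), but calling it keeps the keras.applications convention
x_img = preprocess_input(x_img)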

Accelerate Inference Using ONNX Runtime

[ ]:
import tensorflow as tf
from bigdl.nano.tf.keras import InferenceOptimizer

ort_model = InferenceOptimizer.trace(model_ft,
                                     accelerator="onnxruntime",
                                     input_spec=tf.TensorSpec(shape=(None, 224, 224, 3))
                                     )

x = tf.random.normal(shape=(2, 224, 224, 3))
# use the optimized model here
y_hat = ort_model(x)
predictions = tf.argmax(y_hat, axis=1)
print(predictions)
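The prediction above is just a class index. To map indices to human-readable ImageNet labels, you can use decode_predictions from keras.applications, as in the small sketch below (since x is random noise here, the decoded labels themselves are meaningless):

[ ]:
import numpy as np
from tensorflow.keras.applications.efficientnet import decode_predictions

# decode_predictions expects a numpy array of shape (batch, 1000)
for top in decode_predictions(np.asarray(y_hat), top=3):
    print(top)  # list of (class_id, class_name, probability) tuples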

Save the Optimized Model. The model files will be saved in the “./optimized_model_ort” directory. There are 2 major files in optimized_model_ort, and users only need the “.onnx” file for further use (a sketch of running it directly with ONNX Runtime follows the save call below):

  • nano_model_meta.yml: meta information of the saved model checkpoint

  • onnx_saved_model.onnx: model checkpoint for general use; describes the model structure

[ ]:
InferenceOptimizer.save(ort_model, "./optimized_model_ort")
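As mentioned above, the saved onnx_saved_model.onnx file can also be consumed outside BigDL-Nano. Here is a minimal sketch of running it directly with the onnxruntime Python API (assuming onnxruntime is installed; the input name is read from the session rather than hard-coded):

[ ]:
import numpy as np
import onnxruntime

# create an inference session from the saved .onnx checkpoint
sess = onnxruntime.InferenceSession("./optimized_model_ort/onnx_saved_model.onnx",
                                    providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
# run on the same random batch used earlier
outputs = sess.run(None, {input_name: np.asarray(x, dtype=np.float32)})
print(np.argmax(outputs[0], axis=1))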

Load the Optimized Model

[ ]:
loaded_model = InferenceOptimizer.load("./optimized_model_ort", model_ft)

Inference with the Loaded Model

[ ]:
# use the optimized model here
y_hat_ld = loaded_model(x)
predictions_ld = tf.argmax(y_hat_ld, axis=1)
print(predictions_ld)
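As a quick sanity check (assuming the same input x is used for both), the loaded model's outputs should match the original optimized model's outputs:

[ ]:
import numpy as np

# outputs of the loaded model should match the original optimized model
np.testing.assert_allclose(np.asarray(y_hat), np.asarray(y_hat_ld),
                           rtol=1e-5, atol=1e-5)
print("loaded model reproduces the optimized model's outputs")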