View the runnable example on GitHub
Save and Load ONNXRuntime Model in TensorFlow
This example illustrates how to save and load a TensorFlow Keras model accelerated by ONNX Runtime. We use a pretrained EfficientNetB0 model here. By calling InferenceOptimizer.trace(model, accelerator="onnxruntime", ...), we can obtain a model accelerated by ONNX Runtime through BigDL-Nano for inference. By calling InferenceOptimizer.save(model, path), we can save the model to a folder, and by calling InferenceOptimizer.load(path, model), we can load it back from that folder.
First, prepare the model. In this example we use an EfficientNetB0 model (model_ft in the following code) pretrained on the ImageNet dataset.
[ ]:
from tensorflow.keras.applications import EfficientNetB0
model_ft = EfficientNetB0(weights='imagenet')
Accelerate Inference Using ONNX Runtime
[ ]:
import tensorflow as tf
from bigdl.nano.tf.keras import InferenceOptimizer
ort_model = InferenceOptimizer.trace(model_ft,
                                     accelerator="onnxruntime",
                                     input_spec=tf.TensorSpec(shape=(None, 224, 224, 3)))
x = tf.random.normal(shape=(2, 224, 224, 3))
# use the optimized model here
y_hat = ort_model(x)
predictions = tf.argmax(y_hat, axis=1)
print(predictions)
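If you want human-readable labels instead of raw class indices, one optional follow-up (not part of the original example) is to decode the output with Keras' decode_predictions helper for ImageNet models:

# Optional: map the model output to human-readable ImageNet labels.
# `decode_predictions` expects a NumPy array of shape (samples, 1000).
import numpy as np
from tensorflow.keras.applications.efficientnet import decode_predictions

decoded = decode_predictions(np.asarray(y_hat), top=3)
for i, top3 in enumerate(decoded):
    print(f"sample {i}: {top3}")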
Save Optimized Model
The saved model files will be stored in the “./optimized_model_ort” directory. There are 2 major files in optimized_model_ort; users only need to take the “.onnx” file for further usage (see the sketch after the save call below):
nano_model_meta.yml: meta information of the saved model checkpoint
onnx_saved_model.onnx: model checkpoint for general use, describes model structure
[ ]:
InferenceOptimizer.save(ort_model, "./optimized_model_ort")
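Since only the “.onnx” file is needed for further usage, a minimal sketch (assuming the onnxruntime Python package is installed; this is not part of the original example) of running the exported file directly with the onnxruntime API could look like this:

# Run the exported ONNX file with onnxruntime directly, without BigDL-Nano.
import numpy as np
import onnxruntime

sess = onnxruntime.InferenceSession("./optimized_model_ort/onnx_saved_model.onnx")
input_name = sess.get_inputs()[0].name    # query the model's input name from the session
outputs = sess.run(None, {input_name: np.asarray(x, dtype=np.float32)})
print(outputs[0].shape)                   # class scores for the 2 random samples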
Load the Optimized Model
[ ]:
loaded_model = InferenceOptimizer.load("./optimized_model_ort", model_ft)
Inference with the Loaded Model
[ ]:
# use the optimized model here
y_hat_ld = loaded_model(x)
predictions_ld = tf.argmax(y_hat_ld, axis=1)
print(predictions_ld)
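As an optional sanity check (not part of the original example), you could confirm that the loaded model reproduces the outputs of the traced model saved earlier; the tolerances below are just illustrative:

# The loaded model should give (numerically) the same results as ort_model.
import numpy as np
np.testing.assert_allclose(np.asarray(y_hat), np.asarray(y_hat_ld), rtol=1e-4, atol=1e-5)
print("Outputs of the saved and loaded models match.")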