Save and Load ONNXRuntime Model
This example illustrates how to save and load a model accelerated by ONNX Runtime. We use a pretrained ResNet18 model. Calling InferenceOptimizer.trace(model, accelerator="onnxruntime", ...) returns a model that BigDL-Nano has accelerated with ONNX Runtime for inference. Calling InferenceOptimizer.save(model, path) saves that model to a folder, and calling InferenceOptimizer.load(path) loads it back from the folder.
To run inference with the BigDL-Nano InferenceOptimizer, the following packages need to be installed first. We recommend using Miniconda to prepare the environment and installing the packages in a conda environment.
You can create a conda environment by executing:
# "nano" is the conda environment name; you can use any name you like.
conda create -n nano python=3.7 setuptools=58.0.4
conda activate nano
During installation you may see warnings or errors about package versions; these can be ignored.
# Necessary packages for inference acceleration
!pip install --pre --upgrade bigdl-nano[pytorch,inference]
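Before continuing, it can help to confirm that the packages are visible to the interpreter. The following is a minimal sketch using only the standard library; the module names in REQUIRED_MODULES and the helper names is_installed and missing_modules are assumptions made for illustration, not part of BigDL-Nano.

```python
import importlib.util

# Assumed module names for this tutorial's dependencies; adjust as needed.
REQUIRED_MODULES = ["torch", "torchvision", "onnxruntime", "bigdl.nano"]


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For dotted names, a missing parent package (e.g. "bigdl")
        # raises ModuleNotFoundError instead of returning None.
        return False


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be located."""
    return [name for name in names if not is_installed(name)]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", missing if missing else "none")
```

If the list of missing modules is not empty, rerunning the pip command above inside the activated conda environment is the first thing to try.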
First, prepare the model. We use a pretrained ResNet18 model (model_ft in the following code) in this example.
import torch
from torchvision.models import resnet18

model_ft = resnet18(pretrained=True)
model_ft.eval()
Accelerate Inference Using ONNX Runtime
from bigdl.nano.pytorch import InferenceOptimizer

ort_model = InferenceOptimizer.trace(model_ft,
                                     accelerator="onnxruntime",
                                     input_sample=torch.rand(1, 3, 224, 224))

with InferenceOptimizer.get_context(ort_model):
    x = torch.rand(2, 3, 224, 224)
    y_hat = ort_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)
Save Optimized Model
InferenceOptimizer.save(ort_model, "./optimized_model_ort")
The model files are saved in the “./optimized_model_ort” directory. The directory contains 2 files; only the “.onnx” file is needed for further use:
nano_model_meta.yml: meta information of the saved model checkpoint
onnx_saved_model.onnx: model checkpoint for general use, describes model structure
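Since only the ".onnx" file is needed for further use, a small helper can locate it inside the saved directory. This is a minimal sketch using only the standard library; find_onnx_file is a hypothetical helper name, and it assumes the directory holds exactly one ".onnx" file, as in the listing above.

```python
from pathlib import Path


def find_onnx_file(model_dir: str) -> Path:
    """Return the single .onnx file inside a saved-model directory."""
    candidates = sorted(Path(model_dir).glob("*.onnx"))
    if len(candidates) != 1:
        # Assumption: a saved directory holds exactly one .onnx file.
        raise FileNotFoundError(
            f"expected exactly one .onnx file in {model_dir!r}, "
            f"found {len(candidates)}"
        )
    return candidates[0]
```

After saving, find_onnx_file("./optimized_model_ort") would be expected to return the path to onnx_saved_model.onnx, which can then be copied or handed to other ONNX tooling.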
Load the Optimized Model
loaded_model = InferenceOptimizer.load("./optimized_model_ort")
Inference with the Loaded Model
with InferenceOptimizer.get_context(loaded_model):
    y_hat = loaded_model(x)
    predictions = y_hat.argmax(dim=1)
    print(predictions)
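A simple sanity check on the save/load round trip is to compare the class predictions made before saving with those made by the loaded model on the same input. The sketch below is one way to do that in plain Python; top1_agreement is a hypothetical helper, not part of BigDL-Nano, and it works on plain lists of class indices.

```python
def top1_agreement(preds_a, preds_b) -> float:
    """Fraction of positions where two equal-length label sequences agree."""
    a, b = list(preds_a), list(preds_b)
    if len(a) != len(b):
        raise ValueError("prediction sequences must have the same length")
    if not a:
        raise ValueError("prediction sequences must not be empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

For example, if the predictions from the accelerated model were kept under a separate name such as ort_predictions (a hypothetical variable), then top1_agreement(ort_predictions.tolist(), predictions.tolist()) would be expected to be 1.0 or very close to it, since both models run the same input x.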