Nano Tensorflow API#

bigdl.nano.tf.keras#

class bigdl.nano.tf.keras.Model(*args: Any, **kwargs: Any)#

A wrapper class for tf.keras.Model adding more functions for BigDL-Nano.

fit(x=None, y=None, batch_size=None, epochs=1, verbose='auto', callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_batch_size=None, validation_freq=1, max_queue_size=10, workers=1, use_multiprocessing=False, num_processes=None, backend='multiprocessing')#

Override tf.keras.Model.fit to add more parameters.

All arguments that already exist in tf.keras.Model.fit have the same semantics as in tf.keras.Model.fit.

Additional parameters:

Parameters
  • num_processes – when num_processes is not None, it specifies how many sub-processes to launch to run pseudo-distributed training; when num_processes is None, training will run in the current process.

  • backend – when num_processes is not None, it specifies which backend to use when launching sub-processes to run pseudo-distributed training; when num_processes is None, this parameter has no effect. The example below shows how these two parameters are used.
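For example, a minimal sketch of pseudo-distributed training with two sub-processes (the toy model and data below are hypothetical placeholders):

    import tensorflow as tf
    from bigdl.nano.tf.keras import Model

    # A hypothetical toy model; any compiled Keras graph works the same way.
    inputs = tf.keras.Input(shape=(4,))
    outputs = tf.keras.layers.Dense(1)(inputs)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss='mse')

    x = tf.random.normal((256, 4))
    y = tf.random.normal((256, 1))

    # Launch 2 sub-processes for pseudo-distributed training; all other
    # arguments keep their tf.keras.Model.fit semantics.
    model.fit(x, y, batch_size=32, epochs=2,
              num_processes=2, backend='multiprocessing')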

quantize(x: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.data.ops.dataset_ops.DatasetV1], y: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray] = None, precision: str = 'int8', accelerator: Optional[str] = None, input_spec=None, metric: Optional[tensorflow.python.keras.metrics.Metric] = None, accuracy_criterion: Optional[dict] = None, approach: str = 'static', method: Optional[str] = None, conf: Optional[str] = None, tuning_strategy: Optional[str] = None, timeout: Optional[int] = None, max_trials: Optional[int] = None, batch: Optional[int] = None, thread_num: Optional[int] = None, inputs: List[str] = None, outputs: List[str] = None, sample_size: int = 100, onnxruntime_session_options=None, openvino_config=None, logging: bool = True)#

Post-training quantization on a keras model.

Parameters
  • x

    Input data which is used for training. It could be:

    1. a Numpy array (or array-like), or a list of arrays (in case the model
    has multiple inputs).

    2. a TensorFlow tensor, or a list of tensors (in case the model has
    multiple inputs).

    3. an unbatched tf.data.Dataset. Should return a tuple of (inputs, targets).

    x will be used as the calibration dataset for Post-Training Static Quantization (PTQ), as well as for generating an input sample to calculate latency. To avoid data leakage during calibration, please use the training dataset.

  • y – Target data. Like the input data x, it could be either Numpy array(s) or TensorFlow tensor(s). Its length should be consistent with x. If x is a dataset, y will be ignored (since targets will be obtained from x).

  • precision – Global precision of quantized model, supported type: ‘int8’, defaults to ‘int8’.

  • accelerator – Use accelerator None, ‘onnxruntime’ or ‘openvino’; defaults to None, which means staying in TensorFlow.

  • input_spec – (optional) A (tuple or list of) tf.TensorSpec defining the shape/dtype of the input. If accelerator='onnxruntime', input_spec is required. If accelerator='openvino', or accelerator=None and precision='int8', input_spec is required when you have a custom Keras model.

  • metric – A tensorflow.keras.metrics.Metric object for evaluation.

  • accuracy_criterion – Tolerable accuracy drop. accuracy_criterion = {‘relative’: 0.1, ‘higher_is_better’: True} allows a relative accuracy loss of 10%. accuracy_criterion = {‘absolute’: 0.99, ‘higher_is_better’: False} means the accuracy loss must be smaller than 0.99.

  • approach – ‘static’ or ‘dynamic’. ‘static’: post_training_static_quant, ‘dynamic’: post_training_dynamic_quant. Default: ‘static’. Only ‘static’ approach is supported now.

  • method – Method to do quantization. When accelerator=None, supported methods: None. When accelerator=’onnxruntime’, supported methods: ‘qlinear’, ‘integer’, defaults to ‘qlinear’. Suggest ‘qlinear’ for a lower accuracy drop when using static quantization. More details at https://onnxruntime.ai/docs/performance/quantization.html. This argument takes no effect for OpenVINO; do not change it when using OpenVINO.

  • conf – A path to the conf yaml file for quantization. Default: None, meaning the default config is used.

  • tuning_strategy – ‘bayesian’, ‘basic’, ‘mse’, ‘sigopt’. Default: ‘bayesian’.

  • timeout – Tuning timeout (seconds). Default: None, which means early stop. Combine with max_trials field to decide when to exit.

  • max_trials – Max tune times. Default: None, which means no tuning. Combine with the timeout field to decide when to exit. “timeout=0, max_trials=1” means it will try quantization only once and return the best satisfying model.

  • batch – Batch size of the dataloader for calib_dataset. Defaults to None; if the dataset is not a BatchDataset, the batch size equals 1, otherwise it complies with dataset._batch_size.

  • thread_num – (optional) an int representing how many threads (cores) are needed for inference; only valid for accelerator=’onnxruntime’ or accelerator=’openvino’.

  • inputs – A list of input names. Default: None, automatically get names from graph.

  • outputs – A list of output names. Default: None, automatically get names from graph.

  • sample_size – (optional) an int representing how many samples will be used by the Post-training Optimization Tool (POT) from the OpenVINO toolkit; only valid for accelerator=’openvino’. Defaults to 100. The larger the value, the more accurate the conversion and the lower the performance degradation, but the longer it takes.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • openvino_config – The config passed to core.compile_model. Only valid when accelerator=’openvino’, otherwise will be ignored.

  • logging – Whether to log detailed information during model conversion; only valid when accelerator=’openvino’, otherwise will be ignored. Default: True.

Returns

A TensorflowBaseModel for INC. If no model is found, returns None.

Warning

This function will be deprecated in a future release.

Please use bigdl.nano.tf.keras.InferenceOptimizer.quantize instead.
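For reference, a minimal sketch of the deprecated call, assuming model is the nano Model built in the fit example above (the calibration data is a hypothetical placeholder; prefer InferenceOptimizer.quantize in new code):

    import tensorflow as tf

    # Unbatched calibration dataset yielding (inputs, targets) tuples,
    # drawn from the training data to avoid data leakage.
    calib_ds = tf.data.Dataset.from_tensor_slices(
        (tf.random.normal((100, 4)), tf.random.normal((100, 1))))

    # INT8 post-training static quantization, staying in TensorFlow.
    q_model = model.quantize(x=calib_ds, precision='int8', approach='static')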

trace(accelerator: Optional[str] = None, input_spec=None, thread_num: Optional[int] = None, onnxruntime_session_options=None, openvino_config=None, logging=True)#

Trace a Keras model and convert it into an accelerated module for inference.

Parameters
  • accelerator – The accelerator to use, defaults to None, meaning staying in the Keras backend. ‘openvino’ and ‘onnxruntime’ are supported for now.

  • input_spec – (optional) A (tuple or list of) tf.TensorSpec defining the shape/dtype of the input. If accelerator='onnxruntime', input_spec is required. If accelerator='openvino', input_spec is only required when you have a custom Keras model.

  • thread_num – (optional) an int representing how many threads (cores) are needed for inference; only valid for accelerator=’onnxruntime’ or accelerator=’openvino’.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • openvino_config – The config passed to core.compile_model. Only valid when accelerator=’openvino’, otherwise will be ignored.

  • logging – Whether to log detailed information during model conversion; only valid when accelerator=’openvino’, otherwise will be ignored. Default: True.

Returns

Model with different acceleration (OpenVINO/ONNX Runtime).

Warning

This function will be deprecated in a future release.

Please use bigdl.nano.tf.keras.InferenceOptimizer.trace instead.
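For reference, a minimal sketch of the deprecated call, assuming model is the nano Model built in the fit example above (the input shape is a hypothetical placeholder; prefer InferenceOptimizer.trace in new code):

    import tensorflow as tf

    # Convert the Keras model into an OpenVINO-accelerated module.
    # input_spec is only required for custom Keras models.
    ov_model = model.trace(
        accelerator='openvino',
        input_spec=tf.TensorSpec(shape=(None, 4), dtype=tf.float32))
    preds = ov_model(tf.random.normal((1, 4)))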

class bigdl.nano.tf.keras.Sequential(*args, **kwargs)[source]#

A wrapper class for tf.keras.Sequential adding more functions for BigDL-Nano.

Create a nano Sequential model, having the same arguments as tf.keras.Sequential.

fit(x=None, y=None, batch_size=None, epochs=1, verbose='auto', callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_batch_size=None, validation_freq=1, max_queue_size=10, workers=1, use_multiprocessing=False, num_processes=None, backend='multiprocessing')#

Override tf.keras.Model.fit to add more parameters.

All arguments that already exist in tf.keras.Model.fit have the same semantics as in tf.keras.Model.fit.

Additional parameters:

Parameters
  • num_processes – when num_processes is not None, it specifies how many sub-processes to launch to run pseudo-distributed training; when num_processes is None, training will run in the current process.

  • backend – when num_processes is not None, it specifies which backend to use when launching sub-processes to run pseudo-distributed training; when num_processes is None, this parameter has no effect.

quantize(x: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.data.ops.dataset_ops.DatasetV1], y: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray] = None, precision: str = 'int8', accelerator: Optional[str] = None, input_spec=None, metric: Optional[tensorflow.python.keras.metrics.Metric] = None, accuracy_criterion: Optional[dict] = None, approach: str = 'static', method: Optional[str] = None, conf: Optional[str] = None, tuning_strategy: Optional[str] = None, timeout: Optional[int] = None, max_trials: Optional[int] = None, batch: Optional[int] = None, thread_num: Optional[int] = None, inputs: List[str] = None, outputs: List[str] = None, sample_size: int = 100, onnxruntime_session_options=None, openvino_config=None, logging: bool = True)#

Post-training quantization on a keras model.

Parameters
  • x

    Input data which is used for training. It could be:

    1. a Numpy array (or array-like), or a list of arrays (in case the model
    has multiple inputs).

    2. a TensorFlow tensor, or a list of tensors (in case the model has
    multiple inputs).

    3. an unbatched tf.data.Dataset. Should return a tuple of (inputs, targets).

    x will be used as the calibration dataset for Post-Training Static Quantization (PTQ), as well as for generating an input sample to calculate latency. To avoid data leakage during calibration, please use the training dataset.

  • y – Target data. Like the input data x, it could be either Numpy array(s) or TensorFlow tensor(s). Its length should be consistent with x. If x is a dataset, y will be ignored (since targets will be obtained from x).

  • precision – Global precision of quantized model, supported type: ‘int8’, defaults to ‘int8’.

  • accelerator – Use accelerator None, ‘onnxruntime’ or ‘openvino’; defaults to None, which means staying in TensorFlow.

  • input_spec – (optional) A (tuple or list of) tf.TensorSpec defining the shape/dtype of the input. If accelerator='onnxruntime', input_spec is required. If accelerator='openvino', or accelerator=None and precision='int8', input_spec is required when you have a custom Keras model.

  • metric – A tensorflow.keras.metrics.Metric object for evaluation.

  • accuracy_criterion – Tolerable accuracy drop. accuracy_criterion = {‘relative’: 0.1, ‘higher_is_better’: True} allows a relative accuracy loss of 10%. accuracy_criterion = {‘absolute’: 0.99, ‘higher_is_better’: False} means the accuracy loss must be smaller than 0.99.

  • approach – ‘static’ or ‘dynamic’. ‘static’: post_training_static_quant, ‘dynamic’: post_training_dynamic_quant. Default: ‘static’. Only ‘static’ approach is supported now.

  • method – Method to do quantization. When accelerator=None, supported methods: None. When accelerator=’onnxruntime’, supported methods: ‘qlinear’, ‘integer’, defaults to ‘qlinear’. Suggest ‘qlinear’ for a lower accuracy drop when using static quantization. More details at https://onnxruntime.ai/docs/performance/quantization.html. This argument takes no effect for OpenVINO; do not change it when using OpenVINO.

  • conf – A path to the conf yaml file for quantization. Default: None, meaning the default config is used.

  • tuning_strategy – ‘bayesian’, ‘basic’, ‘mse’, ‘sigopt’. Default: ‘bayesian’.

  • timeout – Tuning timeout (seconds). Default: None, which means early stop. Combine with max_trials field to decide when to exit.

  • max_trials – Max tune times. Default: None, which means no tuning. Combine with the timeout field to decide when to exit. “timeout=0, max_trials=1” means it will try quantization only once and return the best satisfying model.

  • batch – Batch size of the dataloader for calib_dataset. Defaults to None; if the dataset is not a BatchDataset, the batch size equals 1, otherwise it complies with dataset._batch_size.

  • thread_num – (optional) an int representing how many threads (cores) are needed for inference; only valid for accelerator=’onnxruntime’ or accelerator=’openvino’.

  • inputs – A list of input names. Default: None, automatically get names from graph.

  • outputs – A list of output names. Default: None, automatically get names from graph.

  • sample_size – (optional) an int representing how many samples will be used by the Post-training Optimization Tool (POT) from the OpenVINO toolkit; only valid for accelerator=’openvino’. Defaults to 100. The larger the value, the more accurate the conversion and the lower the performance degradation, but the longer it takes.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • openvino_config – The config passed to core.compile_model. Only valid when accelerator=’openvino’, otherwise will be ignored.

  • logging – Whether to log detailed information during model conversion; only valid when accelerator=’openvino’, otherwise will be ignored. Default: True.

Returns

A TensorflowBaseModel for INC. If no model is found, returns None.

Warning

This function will be deprecated in a future release.

Please use bigdl.nano.tf.keras.InferenceOptimizer.quantize instead.

trace(accelerator: Optional[str] = None, input_spec=None, thread_num: Optional[int] = None, onnxruntime_session_options=None, openvino_config=None, logging=True)#

Trace a Keras model and convert it into an accelerated module for inference.

Parameters
  • accelerator – The accelerator to use, defaults to None, meaning staying in the Keras backend. ‘openvino’ and ‘onnxruntime’ are supported for now.

  • input_spec – (optional) A (tuple or list of) tf.TensorSpec defining the shape/dtype of the input. If accelerator='onnxruntime', input_spec is required. If accelerator='openvino', input_spec is only required when you have a custom Keras model.

  • thread_num – (optional) an int representing how many threads (cores) are needed for inference; only valid for accelerator=’onnxruntime’ or accelerator=’openvino’.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • openvino_config – The config passed to core.compile_model. Only valid when accelerator=’openvino’, otherwise will be ignored.

  • logging – Whether to log detailed information during model conversion; only valid when accelerator=’openvino’, otherwise will be ignored. Default: True.

Returns

Model with different acceleration (OpenVINO/ONNX Runtime).

Warning

This function will be deprecated in a future release.

Please use bigdl.nano.tf.keras.InferenceOptimizer.trace instead.

class bigdl.nano.tf.keras.layers.Embedding(input_dim, output_dim, embeddings_initializer='uniform', embeddings_regularizer=None, activity_regularizer=None, embeddings_constraint=None, mask_zero=False, input_length=None, **kwargs)[source]#

A slightly modified version of tf.keras Embedding layer.

This embedding layer only applies regularizer to the output of the embedding layers, so that the gradient to embeddings is sparse.

Create a slightly modified version of tf.keras Embedding layer.

Parameters
  • input_dim – Integer. Size of the vocabulary, i.e. maximum integer index + 1.

  • output_dim – Integer. Dimension of the dense embedding.

  • embeddings_initializer – Initializer for the embeddings matrix (see keras.initializers).

  • embeddings_regularizer – Applying a regularizer directly to the embeddings would make the sparse gradient dense and may degrade performance; we recommend using activity_regularizer instead.

  • activity_regularizer – Regularizer function applied to the output tensor after looking up the embeddings matrix.

  • embeddings_constraint – Constraint function applied to the embeddings matrix (see keras.constraints).

  • mask_zero – Boolean, whether or not the input value 0 is a special “padding” value that should be masked out. This is useful when using recurrent layers which may take variable-length input. If this is True, then all subsequent layers in the model need to support masking or an exception will be raised. If mask_zero is set to True, as a consequence, index 0 cannot be used in the vocabulary (input_dim should equal size of vocabulary + 1).

  • input_length – Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).

  • kwargs – Keyword arguments passed to tf.keras.layers.Embedding.
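A minimal sketch of using this layer with an output regularizer so that the gradient to the embeddings stays sparse (the vocabulary size and dimensions are hypothetical):

    import tensorflow as tf
    from bigdl.nano.tf.keras.layers import Embedding

    model = tf.keras.Sequential([
        # Regularize the looked-up outputs instead of the whole embedding
        # matrix, keeping the gradient to the embeddings sparse.
        Embedding(input_dim=10000, output_dim=64,
                  activity_regularizer=tf.keras.regularizers.L2(1e-4),
                  input_length=20),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1),
    ])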

bigdl.nano.tf.optimizers#

class bigdl.nano.tf.optimizers.SparseAdam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False, name='SparseAdam', **kwargs)[source]#

A variant of the Adam optimizer that handles sparse updates more efficiently.

The original Adam algorithm maintains two moving-average accumulators for each trainable variable; the accumulators are updated at every step. In this variant, only moments that show up in the gradient get updated, and only those portions of the gradient get applied to the parameters. Compared with the original Adam optimizer, it can provide large improvements in model training throughput for some applications.

Create a slightly modified version of tf.keras.optimizers.Adam, which only updates the moving-average accumulators for sparse variable indices that appear in the current batch.

Parameters
  • learning_rate – A Tensor, floating point value, a schedule that is a tf.keras.optimizers.schedules.LearningRateSchedule, or a callable that takes no arguments and returns the actual value to use: the learning rate. Defaults to 0.001.

  • beta_1 – A float value or a constant float tensor, or a callable that takes no arguments and returns the actual value to use. The exponential decay rate for the 1st moment estimates. Defaults to 0.9.

  • beta_2 – A float value or a constant float tensor, or a callable that takes no arguments and returns the actual value to use: the exponential decay rate for the 2nd moment estimates. Defaults to 0.999.

  • epsilon – A small constant for numerical stability. This epsilon is “epsilon hat” in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper. Defaults to 1e-7.

  • amsgrad – Boolean. Currently amsgrad is not supported and can only be set to False.

  • name – Optional name for the operations created when applying gradients. Defaults to “SparseAdam”.

  • kwargs – Keyword arguments. Allowed to be one of “clipnorm” or “clipvalue”. “clipnorm” (float) clips gradients by norm; “clipvalue” (float) clips gradients by value.
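SparseAdam is typically paired with the Nano Embedding layer above; a minimal sketch, assuming model is the Sequential model built in the Embedding example:

    from bigdl.nano.tf.optimizers import SparseAdam

    # Only the embedding rows that appear in the current batch will have
    # their Adam accumulators updated.
    model.compile(optimizer=SparseAdam(learning_rate=0.001), loss='mse')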

bigdl.nano.tf.keras.InferenceOptimizer#

class bigdl.nano.tf.keras.InferenceOptimizer[source]#

InferenceOptimizer for PyTorch/TensorFlow models.

It can be used to accelerate your model’s inference speed with very few code changes.

optimize(model: tensorflow.python.keras.engine.training.Model, x: Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.data.ops.dataset_ops.DatasetV1], y: Optional[Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray]] = None, validation_data: Optional[tensorflow.python.data.ops.dataset_ops.DatasetV1] = None, input_spec=None, batch_size: int = 1, metric: Optional[tensorflow.python.keras.metrics.Metric] = None, direction: str = 'max', thread_num: Optional[int] = None, logging: bool = False, latency_sample_num: int = 100, includes: Optional[List[str]] = None, excludes: Optional[List[str]] = None, output_filename: Optional[str] = None) → None[source]#

This function will give all available inference acceleration methods a try and record the latency, accuracy and model instance inside the optimizer for future usage. All model instances are set to eval mode.

The available methods are “original”, “openvino_fp32”, “onnxruntime_fp32”, “int8”.

Parameters
  • model – A keras.Model to be optimized.

  • x

    Input data which is used for training. It could be:

    1. a Numpy array (or array-like), or a list of arrays (in case the model
    has multiple inputs).

    2. a TensorFlow tensor, or a list of tensors (in case the model has
    multiple inputs).

    3. an unbatched tf.data.Dataset. Should return a tuple of (inputs, targets).

    x will be used as the calibration dataset for Post-Training Static Quantization (PTQ), as well as for generating an input sample to calculate latency. To avoid data leakage during calibration, please use the training dataset.

  • y – Target data. Like the input data x, it could be either Numpy array(s) or TensorFlow tensor(s). Its length should be consistent with x. If x is a dataset, y will be ignored (since targets will be obtained from x).

  • validation_data – (optional) An unbatched tf.data.Dataset object for accuracy evaluation. This is only needed when users care about the possible accuracy drop.

  • input_spec – (optional) A (tuple or list of) tf.TensorSpec defining the shape/dtype of the input. This is only required when you have a custom Keras model (no input/output layer is explicitly defined).

  • metric – (optional) A tensorflow.keras.metrics.Metric object which is used for calculating accuracy.

  • direction – (optional) A string indicating whether higher or lower is better for the metric: “min” for lower-is-better and “max” for higher-is-better. The default value is “max”.

  • thread_num – (optional) An int representing how many threads (cores) are needed for inference. This parameter only controls the number of threads used for latency calculation and for later inference with the obtained accelerated model; the process of model conversion and the optional accuracy calculation won’t be restricted by it. Defaults to None, meaning all cores will be used.

  • logging – Whether to log detailed information during model conversion. Default: False.

  • latency_sample_num – (optional) an int representing the number of repetitions used to calculate the average latency. The default value is 100.

  • includes – (optional) a list of acceleration methods that will be included in the search. Defaults to None, meaning all available methods are included. The “original” method will be automatically added to includes.

  • excludes – (optional) a list of acceleration methods that will be excluded from the search. The “original” method will be ignored in excludes.

  • output_filename – (optional) a string filename specifying the file to which the optimization table will be written. Defaults to None, meaning no file is written.
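A minimal sketch of running the search (the model, dataset and metric below are hypothetical placeholders; reusing the calibration dataset as validation_data is only for brevity):

    import tensorflow as tf
    from bigdl.nano.tf.keras import InferenceOptimizer

    opt = InferenceOptimizer()

    # Hypothetical Keras model and an unbatched dataset of
    # (inputs, targets) tuples.
    inputs = tf.keras.Input(shape=(4,))
    model = tf.keras.Model(inputs, tf.keras.layers.Dense(1)(inputs))
    calib_ds = tf.data.Dataset.from_tensor_slices(
        (tf.random.normal((100, 4)), tf.random.normal((100, 1))))

    # Try "original", "openvino_fp32", "onnxruntime_fp32" and "int8",
    # recording latency (and accuracy, since a metric is given).
    opt.optimize(model=model,
                 x=calib_ds,
                 validation_data=calib_ds,
                 metric=tf.keras.metrics.MeanSquaredError(),
                 direction='min',
                 latency_sample_num=30)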

static trace(model: tensorflow.python.keras.engine.training.Model, accelerator: Optional[str] = None, input_spec=None, thread_num: Optional[int] = None, device: Optional[str] = 'CPU', onnxruntime_session_options=None, openvino_config=None, logging=True, **kwargs)[source]#

Trace a Keras model and convert it into an accelerated module for inference.

Parameters
  • model – The Keras model to trace.

  • accelerator – The accelerator to use, defaults to None, meaning staying in the Keras backend. ‘openvino’ and ‘onnxruntime’ are supported for now.

  • input_spec – (optional) A (tuple or list of) tf.TensorSpec defining the shape/dtype of the input. This is only required when you have a custom Keras model (no input/output layer is explicitly defined).

  • thread_num – (optional) an int representing how many threads (cores) are needed for inference; only valid for accelerator=’onnxruntime’ or accelerator=’openvino’.

  • device – (optional) A string representing the device for inference. Defaults to ‘CPU’; only valid when accelerator=’openvino’, otherwise will be ignored. ‘CPU’ and ‘GPU’ are supported for now.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • openvino_config – The config passed to core.compile_model. Only valid when accelerator=’openvino’, otherwise will be ignored.

  • logging – Whether to log detailed information during model conversion; only valid when accelerator=’openvino’, otherwise will be ignored. Default: True.

  • **kwargs

    Other extra advanced settings, i.e. those passed to the model optimizer function of OpenVINO; only valid when accelerator=’openvino’, otherwise will be ignored. Possible arguments are: mean_values, layout, input, output, et al. For more details about the model optimizer, you can run mo --help.

Returns

Model with different acceleration (OpenVINO/ONNX Runtime).
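A minimal sketch of tracing into ONNX Runtime with a thread limit, assuming model is the Keras model from the optimize sketch above (the input shape is a hypothetical placeholder):

    import tensorflow as tf
    from bigdl.nano.tf.keras import InferenceOptimizer

    # input_spec is only required for custom Keras models with no
    # explicit input/output definition.
    ort_model = InferenceOptimizer.trace(
        model,
        accelerator='onnxruntime',
        input_spec=tf.TensorSpec(shape=(None, 4), dtype=tf.float32),
        thread_num=4)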

static quantize(model: tensorflow.python.keras.engine.training.Model, x: Optional[Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray, tensorflow.python.data.ops.dataset_ops.DatasetV1]] = None, y: Optional[Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray]] = None, precision: str = 'int8', accelerator: Optional[str] = None, input_spec=None, eval_func: Optional[Callable] = None, metric: Optional[tensorflow.python.keras.metrics.Metric] = None, accuracy_criterion: Optional[dict] = None, approach: str = 'static', method: Optional[str] = None, conf: Optional[str] = None, tuning_strategy: Optional[str] = None, timeout: Optional[int] = None, max_trials: Optional[int] = None, batch: Optional[int] = None, thread_num: Optional[int] = None, device: Optional[str] = 'CPU', custom_objects=None, inputs: Optional[List[str]] = None, outputs: Optional[List[str]] = None, sample_size: int = 100, onnxruntime_session_options=None, openvino_config=None, logging: bool = True, **kwargs)[source]#

Post-training quantization on a keras model.

Parameters
  • model – The Keras model to quantize.

  • x

    Input data which is used for training. It could be:

    1. a Numpy array (or array-like), or a list of arrays (in case the model
    has multiple inputs).

    2. a TensorFlow tensor, or a list of tensors (in case the model has
    multiple inputs).

    3. an unbatched tf.data.Dataset. Should return a tuple of (inputs, targets).

    x will be used as the calibration dataset for Post-Training Static Quantization (PTQ). To avoid data leakage during calibration, please use the training dataset. Only valid when precision=’int8’, otherwise will be ignored.

  • y – Target data. Like the input data x, it could be either Numpy array(s) or TensorFlow tensor(s). Its length should be consistent with x. If x is a dataset, y will be ignored (since targets will be obtained from x).

  • precision – Global precision of the quantized model, supported types: ‘int8’, ‘bf16’, ‘fp16’, defaults to ‘int8’. Note that mixed bf16 precision only works for keras.Model with explicit input and output definitions (e.g., model = keras.Model(inputs=inputs, outputs=outputs)).

  • accelerator – Use accelerator None, ‘onnxruntime’ or ‘openvino’; defaults to None, which means staying in TensorFlow.

  • input_spec – (optional) A (tuple or list of) tf.TensorSpec defining the shape/dtype of the input. This is only required when you have a custom Keras model (no input/output layer is explicitly defined).

  • eval_func – An evaluation function that accepts only the model as input and returns an evaluation value. This parameter provides a higher degree of freedom than using eval_loader and metric. Defaults to None, meaning no performance tuning, but it is better to provide an evaluation function to get better quantization performance.

  • metric – A tensorflow.keras.metrics.Metric object for evaluation.

  • accuracy_criterion – Tolerable accuracy drop, defaults to None meaning no accuracy control. accuracy_criterion = {‘absolute’: 0.99, ‘higher_is_better’: False} means the accuracy loss must be smaller than 0.99. For example, if higher_is_better is True, this requires that the original metric value minus the current metric value be smaller than 0.99. For INC 1.x, this value must be set within [0, 1); for INC 2.x, there is no limit. accuracy_criterion = {‘relative’: 0.1, ‘higher_is_better’: True} allows a relative accuracy loss of 10%.

  • approach – ‘static’ or ‘dynamic’. ‘static’: post_training_static_quant, ‘dynamic’: post_training_dynamic_quant. Default: ‘static’. Only ‘static’ approach is supported now.

  • method – Method to do quantization. When accelerator=None, supported methods: None. When accelerator=’onnxruntime’, supported methods: ‘qlinear’, ‘integer’, defaults to ‘qlinear’. Suggest ‘qlinear’ for a lower accuracy drop when using static quantization. More details at https://onnxruntime.ai/docs/performance/quantization.html. This argument takes no effect for OpenVINO; do not change it when using OpenVINO.

  • conf – A path to the conf yaml file for quantization. Default: None, meaning the default config is used.

  • tuning_strategy – ‘bayesian’, ‘basic’, ‘mse’, ‘sigopt’. Default: ‘bayesian’.

  • timeout – Tuning timeout (seconds). Default: None, which means early stop. Combine with max_trials field to decide when to exit.

  • max_trials – Max tune times. Default: None, which means no tuning. Combine with the timeout field to decide when to exit. “timeout=0, max_trials=1” means it will try quantization only once and return the best satisfying model.

  • batch – Batch size of the dataloader for calib_dataset. Defaults to None; if the dataset is not a BatchDataset, the batch size equals 1, otherwise it complies with dataset._batch_size.

  • thread_num – (optional) an int representing how many threads (cores) are needed for inference; only valid for accelerator=’onnxruntime’ or accelerator=’openvino’.

  • device – (optional) A string representing the device for inference. Defaults to ‘CPU’; only valid when accelerator=’openvino’, otherwise will be ignored. ‘CPU’, ‘GPU’ and ‘VPUX’ are supported for now.

  • custom_objects – Optional dictionary mapping names (strings) to custom classes or functions to be considered during deserialization. May only be required when quantizing a bf16 model with accelerator=None.

  • inputs – A list of input names. Default: None, automatically get names from graph.

  • outputs – A list of output names. Default: None, automatically get names from graph.

  • sample_size – (optional) an int representing how many samples will be used by the Post-training Optimization Tool (POT) from the OpenVINO toolkit; only valid for accelerator=’openvino’. Defaults to 100. The larger the value, the more accurate the conversion and the lower the performance degradation, but the longer it takes.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • openvino_config – The config passed to core.compile_model. Only valid when accelerator=’openvino’, otherwise will be ignored.

  • logging – Whether to log detailed information during model conversion; only valid when accelerator=’openvino’, otherwise will be ignored. Default: True.

  • **kwargs

    Other extra advanced settings include: 1. those passed to the torch.onnx.export function, only valid when accelerator=’onnxruntime’/’openvino’, otherwise will be ignored. Possible arguments are: input_names, output_names, opset_version, et al. For more details, please refer to https://pytorch.org/docs/stable/onnx.html#torch.onnx.export. 2. those passed to the model optimizer function of OpenVINO, only valid when accelerator=’openvino’, otherwise will be ignored. Possible arguments are: mean_values, layout, input, output, et al. For more details about the model optimizer, you can run mo --help. If you want to quantize with OpenVINO on a VPUX device, you must specify mean_value for the model optimizer function. Here mean_value represents the mean values to be used for the input image, per channel, provided in (R,G,B) or [R,G,B] format; it can be defined for a desired input of the model, for example: “--mean_values data[255,255,255],info[255,255,255]”. The exact meaning and order of channels depend on how the original model was trained.

Returns

A TensorflowBaseModel. If no model is found, returns None.
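A minimal sketch of accuracy-controlled INT8 quantization, assuming model is the Keras model from the optimize sketch above (the dataset, metric and tolerance are hypothetical placeholders):

    import tensorflow as tf
    from bigdl.nano.tf.keras import InferenceOptimizer

    calib_ds = tf.data.Dataset.from_tensor_slices(
        (tf.random.normal((100, 4)), tf.random.normal((100, 1))))

    # Tune until the relative accuracy loss stays within 1%, trying at
    # most 10 quantization configurations.
    q_model = InferenceOptimizer.quantize(
        model,
        x=calib_ds,
        precision='int8',
        metric=tf.keras.metrics.MeanSquaredError(),
        accuracy_criterion={'relative': 0.01, 'higher_is_better': False},
        max_trials=10)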

static save(model: tensorflow.python.keras.engine.training.Model, path)[source]#

Save the model to the local file system.

Parameters
  • model – Any model of keras.Model, including all models accelerated by InferenceOptimizer.trace/InferenceOptimizer.quantize.

  • path – Path to saved model. Path should be a directory.

static load(path, model: Optional[tensorflow.python.keras.engine.training.Model] = None, device=None, custom_objects=None)[source]#

Load a model from the local file system.

Parameters
  • path – Path to model to be loaded. Path should be a directory.

  • model – Required FP32 model used to load the saved model. It is needed if: 1. you accelerated the model with accelerator=None via InferenceOptimizer.trace()/InferenceOptimizer.quantize(); 2. you accelerated the model with InferenceOptimizer.optimize() and get_model()/get_best_model(), and the best method or the method you specified doesn’t involve the ‘onnxruntime’/’openvino’/’jit’ accelerator (if you are not sure which optimization method was used, we recommend always passing in the original model in this case); 3. you want the loaded model to contain the attributes of the original model.

  • device – A string representing the device for inference. Defaults to None. Only valid for OpenVINO models, otherwise will be ignored.

  • custom_objects – Same as the custom_objects parameter of tf.keras.models.load_model; may only be required when loading a bf16 model.

Returns

Model with different acceleration (None/OpenVINO/ONNX Runtime) or precision (FP32/FP16/BF16/INT8).
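A minimal sketch of saving an accelerated model and loading it back (the directory path is a hypothetical placeholder; q_model and model refer to the quantize sketch above):

    from bigdl.nano.tf.keras import InferenceOptimizer

    # The path should be a directory.
    InferenceOptimizer.save(q_model, './saved_int8_model')

    # Pass the original FP32 model when the saved model was accelerated
    # with accelerator=None.
    loaded = InferenceOptimizer.load('./saved_int8_model', model=model)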

get_best_model(accelerator: Optional[str] = None, precision: Optional[str] = None, use_ipex: Optional[bool] = None, accuracy_criterion: Optional[float] = None)#

According to the results of optimize, obtain the model with minimum latency under specific restrictions or without restrictions.

Parameters
  • accelerator – (optional) Use accelerator None, ‘onnxruntime’, ‘openvino’ or ‘jit’; defaults to None. If not None, only the model with this specific accelerator will be found.

  • precision – (optional) Supported type: ‘int8’, ‘bf16’, and ‘fp32’. Defaults to None which represents no precision limit. If not None, then will only find the model with this specific precision.

  • use_ipex – (optional) if not None, only the model with this specific ipex setting will be found. This is only effective for PyTorch models.

  • accuracy_criterion – (optional) a float represents tolerable accuracy drop percentage, defaults to None meaning no accuracy control.

Returns

The best model and its corresponding acceleration option.

get_model(method_name: str)#

According to the results of optimize, obtain the model corresponding to method_name.

The available methods are “original”, “fp32_channels_last”, “fp32_ipex”, “fp32_ipex_channels_last”, “bf16”, “bf16_channels_last”, “bf16_ipex”, “bf16_ipex_channels_last”, “static_int8”, “static_int8_ipex”, “jit_fp32”, “jit_fp32_channels_last”, “jit_bf16”, “jit_bf16_channels_last”, “jit_fp32_ipex”, “jit_fp32_ipex_channels_last”, “jit_bf16_ipex”, “jit_bf16_ipex_channels_last”, “jit_int8”, “jit_int8_channels_last”, “openvino_fp32”, “openvino_int8”, “onnxruntime_fp32”, “onnxruntime_int8_qlinear” and “onnxruntime_int8_integer”.

Parameters

method_name – Obtain the specific model according to method_name.

Returns

Model with different acceleration.

summary()#

Print a formatted string representation of the optimization result.
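After optimize has run, the recorded results can be inspected and retrieved; a minimal sketch, assuming opt is the InferenceOptimizer from the optimize example above (the tolerance and method name are hypothetical):

    # Print the table of tried methods with their latency/accuracy.
    opt.summary()

    # Fastest model whose accuracy drop stays within the given tolerance.
    best_model, option = opt.get_best_model(accuracy_criterion=0.05)

    # Or fetch one specific method by name.
    ov_model = opt.get_model('openvino_fp32')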

Patch API#

bigdl.nano.tf.patch_tensorflow(precision='float32')[source]#

patch_tensorflow is used to replace the original TensorFlow classes with their Nano-optimized counterparts.

Meanwhile, set precision as global dtype policy.

Optimized classes include:

1. tf.keras.Model/keras.Model -> bigdl.nano.tf.keras.Model
2. tf.keras.Sequential/keras.Sequential -> bigdl.nano.tf.keras.Sequential
3. tf.keras.layers.Embedding/keras.layers.Embedding -> bigdl.nano.tf.keras.layers.Embedding
4. tf.optimizers.Adam -> bigdl.nano.tf.optimizers.SparseAdam
Parameters

precision – str, specifying the compute and variable dtypes; select from 'float32' and 'mixed_bfloat16'. When 'float32' is set, both the compute and variable dtypes will be float32. When 'mixed_bfloat16' is set, the compute dtype is bfloat16 and the variable dtype is float32. Defaults to 'float32'.

bigdl.nano.tf.unpatch_tensorflow(precision='float32')[source]#

unpatch_tensorflow is used to restore the original TensorFlow classes, reversing patch_tensorflow.

Meanwhile, set precision as global dtype policy.

Parameters

precision – str, specifying the compute and variable dtypes; select from 'float32' and 'mixed_bfloat16'. When 'float32' is set, both the compute and variable dtypes will be float32. When 'mixed_bfloat16' is set, the compute dtype is bfloat16 and the variable dtype is float32. Defaults to 'float32'.
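A minimal sketch of patching and unpatching; after the call, plain Keras code transparently uses the Nano classes:

    import tensorflow as tf
    from bigdl.nano.tf import patch_tensorflow, unpatch_tensorflow

    patch_tensorflow(precision='mixed_bfloat16')

    # tf.keras.Sequential now resolves to bigdl.nano.tf.keras.Sequential,
    # and the global dtype policy is mixed_bfloat16.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])

    # Restore the original classes and the float32 dtype policy.
    unpatch_tensorflow()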

bigdl.nano.tf.keras.nano_bf16(func)[source]#

A decorator to realize mixed precision in a customized training loop.
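A hedged sketch of decorating a customized training step (the model, optimizer and loss are hypothetical placeholders, and the assumption that the decorator runs the step in bfloat16 is inferred from the one-line description above):

    import tensorflow as tf
    from bigdl.nano.tf.keras import nano_bf16

    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.Adam()
    loss_fn = tf.keras.losses.MeanSquaredError()

    @nano_bf16  # assumed to run the step in bfloat16 mixed precision
    def train_step(x, y):
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss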