Nano PyTorch API#

bigdl.nano.pytorch.Trainer#

class bigdl.nano.pytorch.Trainer(*args: Any, **kwargs: Any)[source]#

Trainer for BigDL-Nano pytorch.

This Trainer extends the PyTorch Lightning Trainer with various options that accelerate PyTorch training through BigDL-Nano optimizations.

Parameters
  • num_processes – number of processes in distributed training. Default: 1.

  • use_ipex – whether to use ipex as the accelerator for the trainer. Default: False.

  • distributed_backend – which backend to use in distributed mode, defaults to 'subprocess'; currently available backends are 'spawn', 'subprocess' and 'ray'.

  • process_group_backend – which process group backend to use in distributed mode, defaults to None, which means using 'gloo' on CPU and 'nccl' on GPU; currently available values are None and 'ccl'.

  • cpu_for_each_process – a list of length num_processes, each element being a list of indices of the cpus the corresponding process will use. Default: None, which means the cpus will be distributed automatically and evenly among processes.

  • channels_last – whether to convert inputs to the channels-last memory format, defaults to False.

  • auto_lr – whether to scale the learning rate linearly by num_processes times. Defaults to True. A dict with warmup_epochs as key is also accepted to control the number of epochs needed for the learning rate to be scaled by num_processes times. If auto_lr=True, warmup_epochs defaults to max_epochs // 10. If num_processes=1 or another lr_scheduler is set, auto_lr will be ignored.

  • precision – Double precision (64), full precision (32), half precision (16) or bfloat16 precision ('bf16'), defaults to 32. Enables ipex bfloat16 weight prepack when use_ipex=True and precision='bf16'.
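A minimal usage sketch follows; the LightningModule and dataloader below are illustrative user objects, not part of this API.
>>> from bigdl.nano.pytorch import Trainer
>>> trainer = Trainer(max_epochs=10, num_processes=4, use_ipex=True, channels_last=True)
>>> trainer.fit(my_lightning_module, train_loader)  # my_lightning_module / train_loader are user code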

static compile(model: torch.nn.modules.module.Module, loss: Optional[torch.nn.modules.loss._Loss] = None, optimizer: Optional[torch.optim.optimizer.Optimizer] = None, scheduler: Optional[torch.optim.lr_scheduler._LRScheduler] = None, metrics: Optional[List[torchmetrics.metric.Metric]] = None)[source]#

Construct a pytorch-lightning model.

If model is already a pytorch-lightning model, return it unchanged. If model is a pytorch model, construct a new pytorch-lightning module with the given model, loss and optimizer.

Parameters
  • model – A model instance.

  • loss – Loss to construct the pytorch-lightning model. Should be None if model is an instance of pl.LightningModule.

  • optimizer – Optimizer to construct the pytorch-lightning model. Should be None if model is an instance of pl.LightningModule.

  • metrics – A list of torchmetrics to validate/test performance.

Returns

A LightningModule object.
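For example, a hedged sketch of wrapping a plain torch.nn.Module (net and train_loader are assumed user objects):
>>> import torch
>>> from bigdl.nano.pytorch import Trainer
>>> pl_module = Trainer.compile(net,
...                             loss=torch.nn.CrossEntropyLoss(),
...                             optimizer=torch.optim.Adam(net.parameters(), lr=1e-3))
>>> Trainer(max_epochs=1).fit(pl_module, train_loader)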

search(model, resume: bool = False, target_metric=None, mode: str = 'best', n_parallels=1, acceleration=False, input_sample=None, **kwargs)[source]#

Run HPO (hyperparameter optimization) search on the model.

Parameters
  • model – The model to be searched. It should be an auto model.

  • resume – whether to resume the previous search or start a new one, defaults to False.

  • target_metric – the objective metric to optimize, defaults to None.

  • mode – whether to use the best epoch’s result or the last epoch’s result as a trial’s score; defaults to ‘best’, can be set to ‘last’.

  • n_parallels – the number of parallel processes for running trials.

  • acceleration – Whether to automatically consider the model after inference acceleration in the search process. It will only take effect if target_metric contains “latency”. Default value is False.

  • input_sample – A set of inputs used for tracing, defaults to None. It can be omitted if you have traced the model before or the model is a LightningModule with a dataloader attached.

Returns

the model with study meta info attached.

search_summary()[source]#

Retrieve a summary of the trials.

Returns

A summary of all the trials. Currently the entire study is returned to allow more flexibility for further analysis and visualization.

static trace(model: torch.nn.modules.module.Module, input_sample=None, accelerator: str = None, use_ipex: bool = False, thread_num: int = None, onnxruntime_session_options=None, logging: bool = True, **export_kwargs)[source]#

Trace a pytorch model and convert it into an accelerated module for inference.

For example, this function returns a PytorchOpenVINOModel when accelerator==’openvino’.

Parameters
  • model – A torch.nn.Module model, including pl.LightningModule.

  • input_sample – A set of inputs used for tracing, defaults to None. It can be omitted if you have traced the model before or the model is a LightningModule with a dataloader attached.

  • accelerator – The accelerator to use, defaults to None meaning staying in Pytorch backend. ‘openvino’, ‘onnxruntime’ and ‘jit’ are supported for now.

  • use_ipex – whether to use ipex as the accelerator for inference. Default: False.

  • thread_num – (optional) an int that specifies how many threads (cores) are needed for inference, only valid for accelerator=’onnxruntime’ or accelerator=’openvino’.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • logging – whether to log detailed information of model conversion, only valid when accelerator=’openvino’, otherwise will be ignored. default: True.

  • **kwargs

    other extra advanced settings include: 1. those passed to the torch.onnx.export function, only valid when accelerator=’onnxruntime’/’openvino’, otherwise will be ignored. 2. if channels_last is set and use_ipex=True, the data will be transformed to the channels-last format according to the setting. By default, channels_last will be set to True if use_ipex=True.

Returns

Model with different acceleration.

Warning

bigdl.nano.pytorch.Trainer.trace will be deprecated in a future release.

Please use bigdl.nano.pytorch.InferenceOptimizer.trace instead.

static quantize(model: torch.nn.modules.module.Module, precision: str = 'int8', accelerator: str = None, use_ipex: bool = False, calib_dataloader: torch.utils.data.dataloader.DataLoader = None, metric: torchmetrics.metric.Metric = None, accuracy_criterion: dict = None, approach: str = 'static', method: str = None, conf: str = None, tuning_strategy: str = None, timeout: int = None, max_trials: int = None, input_sample=None, thread_num: int = None, onnxruntime_session_options=None, logging: bool = True, **export_kwargs)[source]#

Calibrate a Pytorch-Lightning model for post-training quantization.

Parameters
  • model – A model to be quantized. Model type should be an instance of nn.Module.

  • precision – Global precision of quantized model, supported type: ‘int8’, ‘bf16’, ‘fp16’, defaults to ‘int8’.

  • accelerator – Use accelerator ‘None’, ‘onnxruntime’, ‘openvino’, defaults to None. None means staying in pytorch.

  • calib_dataloader – A torch.utils.data.dataloader.DataLoader object for calibration. Required for static quantization. It’s also used as validation dataloader.

  • metric – A torchmetrics.metric.Metric object for evaluation.

  • accuracy_criterion – Tolerable accuracy drop, defaults to None meaning no accuracy control. accuracy_criterion = {‘relative’: 0.1, ‘higher_is_better’: True} allows a relative accuracy loss of 10%. accuracy_criterion = {‘absolute’: 0.99, ‘higher_is_better’: False} means the accuracy loss must be smaller than 0.99.

  • approach – ‘static’ or ‘dynamic’. ‘static’: post_training_static_quant, ‘dynamic’: post_training_dynamic_quant. Default: ‘static’. OpenVINO supports static mode only.

  • method – Method to do quantization. When accelerator=None, supported methods: ‘fx’, ‘eager’, ‘ipex’, defaults to ‘fx’. If you don’t use ipex, we suggest using ‘fx’, which executes automatic optimizations like fusion. For more information, please refer to https://pytorch.org/docs/stable/quantization.html#eager-mode-quantization. When accelerator=’onnxruntime’, supported methods: ‘qlinear’, ‘integer’, defaults to ‘qlinear’. We suggest ‘qlinear’ for a lower accuracy drop if using static quantization. More details at https://onnxruntime.ai/docs/performance/quantization.html. This argument doesn’t take effect for OpenVINO; don’t change it when using OpenVINO.

  • conf – A path to conf yaml file for quantization. Default: None, using default config.

  • tuning_strategy – ‘bayesian’, ‘basic’, ‘mse’, ‘sigopt’. Default: ‘bayesian’.

  • timeout – Tuning timeout (seconds). Default: None, which means early stop. Combine with max_trials field to decide when to exit.

  • max_trials – Max tune times. Default: None, which means no tuning. Combine with the timeout field to decide when to exit. “timeout=0, max_trials=1” means it will try quantization only once and return the best satisfying model.

  • input_sample – An input example to convert pytorch model into ONNX/OpenVINO.

  • thread_num – (optional) an int that specifies how many threads (cores) are needed for inference, only valid for accelerator=’onnxruntime’ or accelerator=’openvino’.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • logging – whether to log detailed information of model conversion, only valid when accelerator=’openvino’, otherwise will be ignored. default: True.

  • **export_kwargs

    will be passed to torch.onnx.export function.

Returns

An accelerated Pytorch-Lightning model if quantization is successful.

Warning

bigdl.nano.pytorch.Trainer.quantize will be deprecated in a future release.

Please use bigdl.nano.pytorch.InferenceOptimizer.quantize instead.

static save(model: torch.nn.modules.module.Module, path)[source]#

Save the model to a local file.

Parameters
  • model – Any model of torch.nn.Module, including all models accelerated by Trainer.trace/Trainer.quantize.

  • path – Path to saved model. Path should be a directory.

static load(path, model: Optional[torch.nn.modules.module.Module] = None, input_sample=None, inplace=False, device=None)[source]#

Load a model from a local file.

Parameters
  • path – Path to model to be loaded. Path should be a directory.

  • model – Required FP32 model used to load the pytorch model. It is needed if: 1. you accelerated the model with accelerator=None by InferenceOptimizer.trace()/InferenceOptimizer.quantize(); 2. you accelerated the model with InferenceOptimizer.optimize() and get_model()/get_best_model(), and the best method or the method you specified doesn’t involve the accelerator ‘onnxruntime’/’openvino’/’jit’ (if you are not sure which optimization method was used, we recommend always passing in the original model in this case); 3. you want the loaded model to contain the attributes of the original model.

  • input_sample – Input sample for your model, could be a Tensor or a tuple. Only valid for inc ipex quantization model, otherwise will be ignored.

  • inplace – whether to perform inplace optimization. Default: False.

  • device – A string represents the device of the inference. Default to None. Only valid for openvino model, otherwise will be ignored.

Returns

Model with different acceleration(None/OpenVINO/ONNX Runtime/JIT) or precision(FP32/FP16/BF16/INT8).
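A sketch of the Trainer.save/Trainer.load round trip (the model names and path below are illustrative placeholders):
>>> from bigdl.nano.pytorch import Trainer
>>> Trainer.save(acc_model, "./saved_nano_model")             # path is a directory
>>> loaded_model = Trainer.load("./saved_nano_model", model=original_fp32_model)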

bigdl.nano.pytorch.InferenceOptimizer#

class bigdl.nano.pytorch.InferenceOptimizer[source]#

InferenceOptimizer for Pytorch/TF Model.

It can be used to accelerate your model’s inference speed with very few code changes.

optimize(model: torch.nn.modules.module.Module, training_data: Union[torch.utils.data.dataloader.DataLoader, torch.Tensor, Tuple[torch.Tensor]], validation_data: Optional[Union[torch.utils.data.dataloader.DataLoader, torch.Tensor, Tuple[torch.Tensor]]] = None, input_sample: Optional[Union[torch.Tensor, Dict, Tuple[torch.Tensor]]] = None, metric: Optional[Callable] = None, direction: str = 'max', thread_num: Optional[int] = None, accelerator: Optional[Tuple[str]] = None, precision: Optional[Tuple[str]] = None, use_ipex: Optional[bool] = None, jit_strict: Optional[bool] = True, enable_onednn: Optional[bool] = False, search_mode: str = 'default', dynamic_axes: Union[bool, dict] = True, logging: bool = False, output_tensors: bool = True, latency_sample_num: int = 100, includes: Optional[List[str]] = None, excludes: Optional[List[str]] = None, output_filename: Optional[str] = None, no_cache: bool = False) None[source]#

This function will try all available inference acceleration methods and record the latency, accuracy and model instance inside the Optimizer for future usage. All model instances are set to eval mode.

The available methods are “original”, “fp32_channels_last”, “fp32_ipex”, “fp32_ipex_channels_last”, “bf16”, “bf16_channels_last”, “bf16_ipex”, “bf16_ipex_channels_last”, “static_int8”, “static_int8_ipex”, “jit_fp32”, “jit_fp32_channels_last”, “jit_bf16”, “jit_bf16_channels_last”, “jit_fp32_ipex”, “jit_fp32_ipex_channels_last”, “jit_bf16_ipex”, “jit_bf16_ipex_channels_last”, “jit_int8”, “jit_int8_channels_last”, “openvino_fp32”, “openvino_int8”, “onnxruntime_fp32”, “onnxruntime_int8_qlinear” and “onnxruntime_int8_integer”.

Parameters
  • model – A torch.nn.Module to be optimized

  • training_data

    training_data supports the following formats:

    1. a torch.utils.data.dataloader.DataLoader object for the training dataset.
    Users should be careful with this parameter since this dataloader
    might be exposed to the model, which could cause data leakage. The
    batch_size of this dataloader also matters: users may want to set it
    to the same batch size the model will use in the real deployment
    environment, e.g. the batch size should be set to 1 if you would like
    to use the accelerated model in an online service.

    Each element in the DataLoader can be one of the following:
    a. a single Tensor or a dict of Tensors
    b. a tuple:
    b1: if the length is 1, the first element will be treated as the input
    to the model
    b2: if the length is 2, the first element will be treated as the input
    to the model, with the second element treated as the label.
    If the input to the model is a tuple, it will be unpacked as
    multiple inputs.
    b3: if the length is larger than 2, the first n elements are treated as
    inputs to the model, with n being the argument length of model.forward,
    and the rest will be treated as the label

    2. a single element of the DataLoader specified above

  • validation_data

    (optional) validation_data is only needed when users care
    about the possible accuracy drop. It supports the following formats:

    1. a torch.utils.data.dataloader.DataLoader object for accuracy evaluation.

    Each element in the DataLoader should be a tuple of at least length two:
    a: if the length is 2, the first element will be treated as the input
    to the model, with the second element treated as the label
    b: if the length is larger than 2, the first n elements are treated as
    inputs to the model, with n being the argument length of model.forward,
    and the rest will be treated as the label

    2. a single element of the DataLoader specified above

  • input_sample – (optional) A set of inputs for trace, defaults to None. In most cases, you don’t need to specify this parameter; it will be obtained from training_data. You only have to specify this parameter if the forward function of your model contains kwargs, e.g. def forward(self, x1, x2, x3=1).

  • metric

    (optional) A callable object which is used for calculating accuracy. It supports two kinds of callable object:

    1. A torchmetrics.Metric object or similar callable object which takes
    prediction and target and returns an accuracy value, called as
    metric(pred, target). This requires the data in validation_data
    to be composed of (input_data, target).

    2. A callable object that takes model and validation_data (if
    validation_data is not None) as input, and returns an accuracy value in
    this calling method metric(model, data_loader) (or metric(model) if
    validation_data is None). Note that there is no need to call `with
    InferenceOptimizer.get_context()` in this object.

  • direction – (optional) A string that indicates the higher/lower better for the metric, “min” for the lower the better and “max” for the higher the better. Default value is “max”.

  • thread_num – (optional) An int that specifies how many threads (cores) are needed for inference. This parameter only controls the number of threads used in the latency calculation and in the later inference process of your obtained accelerated model. In other words, the process of model conversion and the optional accuracy calculation won’t be restricted by this parameter. Defaults to None, which means all cores will be used.

  • accelerator – (optional) A string tuple that specifies the accelerators to search. The optional accelerators are: None, ‘openvino’, ‘onnxruntime’, ‘jit’. Defaults to None which represents there is no restriction on accelerators. If not None, then will only traverse corresponding methods whose accelerator falls within the specified accelerator tuple.

  • precision – (optional) A string tuple that specifies the precisions to search. The optional precisions are: ‘int8’, ‘bf16’, and ‘fp32’. Defaults to None which represents no precision limit. If not None, then will only traverse corresponding methods whose precision falls within the specified precision tuple.

  • use_ipex – (optional) if not None, then will only try methods with/without this specific ipex setting.

  • jit_strict – Whether to record your mutable container types. This parameter will be passed to torch.jit.trace. If accelerator != 'jit' or jit_method='script', it will be ignored. Default to True.

  • enable_onednn – Whether to use the PyTorch JIT graph fuser based on oneDNN Graph API, which provides a flexible API for aggressive fusion. Default to False, only valid when accelerator=’jit’, otherwise will be ignored. For more details, please refer to https://github.com/pytorch/pytorch/tree/master/torch/csrc/jit/codegen/onednn#pytorch---onednn-graph-api-bridge.

  • search_mode

    Here are three modes for optimization:

    1. default: This mode only traverses a subset of all combinations. This subset
    is a collection of methods that we select based on experience and consider to
    have a better acceleration effect in general. This mode allows you to quickly
    obtain a good acceleration method, but it is not necessarily the global optimum.
    Defaults to this mode if you don’t specify accelerator/precision/use_ipex.

    2. all: This mode will traverse all possible combinations, which ensures
    finding the global optimum, but it will take a long time.

    3. grid: If you have specified accelerator/precision/use_ipex, the default is
    grid mode. We will sort and combine according to the value you specified to
    get the search range.

  • dynamic_axes

    dict or boolean, default to True. By default the exported onnx model will have the first dim of each Tensor input as a dynamic batch_size. If dynamic_axes=False, the exported model will have the shapes of all input and output tensors set to exactly match those given in input_sample. To specify axes of tensors as dynamic (i.e. known only at run-time), set dynamic_axes to a dict with schema:

    KEY (str): an input or output name. Each name must also be provided
    in input_names or output_names.

    VALUE (dict or list): If a dict, keys are axis indices and values are
    axis names. If a list, each element is an axis index.

    If accelerator != ‘openvino’/’onnxruntime’, it will be ignored.

  • logging – whether to log detailed information of model conversion. Default: False.

  • output_tensors – boolean, defaults to True, meaning the outputs of the model will be Tensors. Only valid when accelerator=’onnxruntime’ or accelerator=’openvino’, otherwise will be ignored. If output_tensors=False, the output of the exported model will be ndarray.

  • latency_sample_num – (optional) an int that specifies the number of repetitions used to calculate the average latency. The default value is 100.

  • includes – (optional) a list of acceleration methods that will be included in the search. Default to None, meaning all available methods are included. The “original” method will automatically be added to includes.

  • excludes – (optional) a list of acceleration methods that will be excluded from the search. “original” will be ignored in the excludes.

  • output_filename – (optional) a string filename that specifies the file to which the optimization table will be written. The default is None, which means the table is not written to a file.

  • no_cache – if set to True, the average latency is calculated by iterating over samples from the provided dataloader until latency_sample_num is reached. Default to False, meaning a single sample is loaded from cache and used repeatedly to test latency.
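A hedged sketch of a typical optimize-then-pick workflow (the model, dataloaders and metric below are placeholders; the Accuracy arguments assume a recent torchmetrics version):
>>> import torchmetrics
>>> from bigdl.nano.pytorch import InferenceOptimizer
>>> opt = InferenceOptimizer()
>>> opt.optimize(model,
...              training_data=train_dataloader,
...              validation_data=val_dataloader,
...              metric=torchmetrics.Accuracy(task="multiclass", num_classes=10),
...              direction="max",
...              thread_num=1)
>>> opt.summary()                                    # print the optimization result table
>>> acc_model, option = opt.get_best_model(accuracy_criterion=0.05)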

static quantize(model: torch.nn.modules.module.Module, precision: str = 'int8', accelerator: Optional[str] = None, use_ipex: bool = False, calib_data: Optional[Union[torch.utils.data.dataloader.DataLoader, torch.Tensor, Tuple[torch.Tensor]]] = None, calib_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, eval_func: Optional[Callable] = None, metric: Optional[torchmetrics.metric.Metric] = None, accuracy_criterion: Optional[dict] = None, approach: str = 'static', method: Optional[str] = None, conf: Optional[str] = None, tuning_strategy: Optional[str] = None, timeout: Optional[int] = None, max_trials: Optional[int] = None, input_sample=None, channels_last: bool = False, thread_num: Optional[int] = None, device: Optional[str] = 'CPU', onnxruntime_session_options=None, openvino_config=None, simplification: bool = True, jit_strict: bool = True, jit_method: Optional[str] = None, dynamic_axes: Union[bool, dict] = True, sample_size: int = 100, logging: bool = True, inplace: bool = False, weights_prepack: Optional[bool] = None, enable_onednn: bool = False, q_config=None, output_tensors: bool = True, example_kwarg_inputs=None, **kwargs)[source]#

Calibrate a torch.nn.Module for post-training quantization.

Parameters
  • model – A model to be quantized. Model type should be an instance of torch.nn.Module.

  • precision – Global precision of quantized model, supported type: ‘int8’, ‘bf16’, ‘fp16’, defaults to ‘int8’.

  • accelerator – Use accelerator ‘None’, ‘onnxruntime’, ‘openvino’, ‘jit’, defaults to None. None means staying in pytorch.

  • use_ipex – Whether to use ipex as the accelerator for inference. If precision != bf16, it will be ignored. Default: False.

  • calib_data

    Calibration data is required for static quantization. It’s also used as the validation dataloader. calib_data supports the following formats:

    1. a torch.utils.data.dataloader.DataLoader object for training.

    2. a single torch.Tensor used for training; this case is used
    to accept a single sample input x.

    3. a tuple of torch.Tensor which is used for training; this case is
    used to accept a single sample input (x, y), (x1, x2), etc.

  • calib_dataloader

    A torch.utils.data.dataloader.DataLoader object for calibration.

    Required for static quantization. It’s also used as validation dataloader.

    Warning

    calib_dataloader will be deprecated in a future release.

    Please use calib_data instead.

  • eval_func – An evaluation function which only accepts the model as input and returns an evaluation value. This parameter provides a higher degree of freedom than using eval_loader and metric. Defaults to None, meaning no performance tuning, but it is better to provide an evaluation function to get better quantization performance.

  • metric – A torchmetrics.metric.Metric object for evaluation.

  • accuracy_criterion – Tolerable accuracy drop, defaults to None meaning no accuracy control. accuracy_criterion = {‘absolute’: 0.99, ‘higher_is_better’: False} means the accuracy loss must be smaller than 0.99; for example, if higher_is_better is True, this requires the original metric value minus the current metric value to be smaller than 0.99. For inc 1.x, this value must be in [0, 1); for inc 2.x, there is no limit. accuracy_criterion = {‘relative’: 0.1, ‘higher_is_better’: True} allows a relative accuracy loss of 10%.

  • approach – ‘static’ or ‘dynamic’. ‘static’: post_training_static_quant, ‘dynamic’: post_training_dynamic_quant. Default: ‘static’. OpenVINO supports static mode only.

  • method – Method to do quantization. When accelerator=None, supported methods: ‘fx’, ‘eager’, ‘ipex’, defaults to ‘fx’. If you don’t use ipex, we suggest using ‘fx’, which executes automatic optimizations like fusion. For more information, please refer to https://pytorch.org/docs/stable/quantization.html#eager-mode-quantization. When accelerator=’onnxruntime’, supported methods: ‘qlinear’, ‘integer’, defaults to ‘qlinear’. We suggest ‘qlinear’ for a lower accuracy drop if using static quantization. More details at https://onnxruntime.ai/docs/performance/quantization.html. This argument doesn’t take effect for OpenVINO; don’t change it when using OpenVINO.

  • conf – A path to conf yaml file for quantization. Default: None, using default config.

  • tuning_strategy – ‘bayesian’, ‘basic’, ‘mse’, ‘sigopt’. Default: ‘bayesian’.

  • timeout – Tuning timeout (seconds). Default: None, which means early stop. Combine with max_trials field to decide when to exit.

  • max_trials – Max tune times. Default: None, which means no tuning. Combine with the timeout field to decide when to exit. “timeout=0, max_trials=1” means it will try quantization only once and return the best satisfying model.

  • input_sample – An input example to convert pytorch model into ONNX/OpenVINO/JIT.

  • channels_last – Whether to use the channels-last memory format, i.e. NHWC (batch size, height, width, channels), as an alternative to the classic/contiguous NCHW order; only valid when precision=’bf16’, otherwise will be ignored. This setting only works for 4-dim Tensors. Default: False.

  • thread_num – (optional) An int that specifies how many threads (cores) are needed for inference. This parameter only controls the number of threads used in the later inference process of your obtained accelerated model. In other words, the process of model conversion won’t be restricted by this parameter.

  • device – (optional) A string represents the device of the inference. Default to ‘CPU’, only valid when accelerator=’openvino’, otherwise will be ignored. ‘CPU’, ‘GPU’ and ‘VPUX’ are supported for now.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • openvino_config – The config to be inputted in core.compile_model. Only valid when accelerator=’openvino’, otherwise will be ignored.

  • simplification – whether to use onnxsim to simplify the ONNX model, only valid when accelerator=’onnxruntime’, otherwise will be ignored. If this option is set to True, the new dependency ‘onnxsim’ needs to be installed.

  • jit_strict – Whether to record your mutable container types. This parameter will be passed to torch.jit.trace. If accelerator != 'jit' or jit_method='script', it will be ignored. Default to True.

  • jit_method – Whether to use jit.trace or jit.script to convert the model to TorchScript. Accepted values are 'trace', 'script', and None. Default to None, meaning try-except logic is used to choose between jit.trace and jit.script. If accelerator != 'jit', this parameter will be ignored.

  • dynamic_axes

    dict or boolean, default to True. By default the exported onnx model will have the first dim of each Tensor input as a dynamic batch_size. If dynamic_axes=False, the exported model will have the shapes of all input and output tensors set to exactly match those given in input_sample. To specify axes of tensors as dynamic (i.e. known only at run-time), set dynamic_axes to a dict with schema:

    KEY (str): an input or output name. Each name must also be provided
    in input_names or output_names.

    VALUE (dict or list): If a dict, keys are axis indices and values
    are axis names. If a list, each element is an axis index.

    If accelerator != ‘openvino’/’onnxruntime’, it will be ignored.

  • sample_size – (optional) an int that specifies how many samples will be used for the Post-training Optimization Tools (POT) from the OpenVINO toolkit, only valid for accelerator=’openvino’. Default to 100. The larger the value, the more accurate the conversion and the lower the performance degradation, but the longer the time.

  • logging – whether to log detailed information of model conversion, only valid when accelerator=’openvino’, otherwise will be ignored. Default: True.

  • inplace – whether to perform inplace optimization. Default: False.

  • weights_prepack – Whether to perform weight prepack for convolution and linear to avoid oneDNN weights reorder. The default value is None. Explicitly setting this knob overwrites the configuration set by level knob. Only valid when use_ipex=True, otherwise will be ignored. You can try to reduce the occupied memory size by setting this parameter to False.

  • enable_onednn – Whether to use the PyTorch JIT graph fuser based on oneDNN Graph API, which provides a flexible API for aggressive fusion. Default to False, only valid when accelerator=’jit’, otherwise will be ignored. For more details, please refer to https://github.com/pytorch/pytorch/tree/master/torch/csrc/jit/codegen/onednn#pytorch---onednn-graph-api-bridge.

  • q_config – QConfig (https://pytorch.org/docs/stable/generated/torch.quantization.qconfig.QConfig.html#qconfig) describes how to quantize a layer or a part of the network by providing settings (observer classes) for activations and weights respectively. Note that QConfig needs to contain observer classes (like MinMaxObserver) or a callable that returns instances on invocation, not the concrete observer instances themselves. The quantization preparation function will instantiate observers multiple times for each of the layers. This parameter only works for native ipex and jit quantization with int8 precision. When accelerator=’jit’, we also support and recommend passing a QConfigMapping instead of a single QConfig for customized quantization. QConfigMapping (https://pytorch.org/docs/stable/generated/torch.ao.quantization.qconfig_mapping.QConfigMapping.html#qconfigmapping) is a collection of quantization configurations; users can set the qconfig for each operator (torch op calls, functional calls, module calls) in the model through qconfig_mapping.

  • output_tensors – boolean, defaults to True, meaning the outputs of the model will be Tensors. Only valid when accelerator=’onnxruntime’ or accelerator=’openvino’, otherwise will be ignored. If output_tensors=False, the output of the exported model will be ndarray.

  • example_kwarg_inputs – a pack of keyword arguments of example inputs that will be passed to torch.jit.trace. Default: None. Either this argument or input_sample should be specified. The dict will be unpacked by the argument names of the traced function. Only valid when accelerator=’jit’ and torch>=2.0, otherwise will be ignored.

  • **kwargs

    Other extra advanced settings include: 1. those passed to the torch.onnx.export function, only valid when accelerator=’onnxruntime’/’openvino’, otherwise will be ignored. Possible arguments are: input_names, output_names, opset_version, etc. For more details, please refer to https://pytorch.org/docs/stable/onnx.html#torch.onnx.export. 2. those passed to the model optimizer function of openvino, only valid when accelerator=’openvino’, otherwise will be ignored. Possible arguments are: mean_values, layout, input, output, etc. For more details about the model optimizer, see mo --help. If you want to quantize with openvino on a VPUX device, you must specify mean_value for the model optimizer function. Here mean_value represents mean values to be used for the input image per channel. Values are provided in the (R,G,B) or [R,G,B] format and can be defined for the desired input of the model, for example: “--mean_values data[255,255,255],info[255,255,255]”. The exact meaning and order of channels depend on how the original model was trained.

Returns

An accelerated torch.nn.Module if quantization is successful.
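A hedged sketch of static int8 post-training quantization (model, calib_dataloader and x are placeholders):
>>> from bigdl.nano.pytorch import InferenceOptimizer
>>> q_model = InferenceOptimizer.quantize(model,
...                                       precision='int8',
...                                       accelerator=None,       # stay in pytorch
...                                       calib_data=calib_dataloader,
...                                       approach='static')
>>> with InferenceOptimizer.get_context(q_model):
...     y_hat = q_model(x)                           # x is a sample input tensor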

static trace(model: torch.nn.modules.module.Module, input_sample=None, accelerator: Optional[str] = None, use_ipex: bool = False, channels_last: bool = False, thread_num: Optional[int] = None, device: Optional[str] = 'CPU', onnxruntime_session_options=None, openvino_config=None, simplification: bool = True, jit_strict: bool = True, jit_method: Optional[str] = None, dynamic_axes: Union[bool, dict] = True, logging: bool = True, inplace: bool = False, weights_prepack: Optional[bool] = None, enable_onednn: bool = False, output_tensors: bool = True, strict_check: bool = True, example_kwarg_inputs=None, **kwargs)[source]#

Trace a torch.nn.Module and convert it into an accelerated module for inference.

For example, this function returns a PytorchOpenVINOModel when accelerator==’openvino’.

Parameters
  • model – A torch.nn.Module model, including pl.LightningModule.

  • input_sample – A set of inputs used for tracing, defaults to None. It can be omitted if you have traced the model before or the model is a LightningModule with a dataloader attached.

  • accelerator – The accelerator to use, defaults to None meaning staying in Pytorch backend. ‘openvino’, ‘onnxruntime’ and ‘jit’ are supported for now.

  • use_ipex – Whether to use ipex as the accelerator for inference. Only valid when accelerator=’jit’/None, otherwise will be ignored. Default: False.

  • channels_last – Whether to use the channels-last memory format, i.e. NHWC (batch size, height, width, channels), as an alternative to the classic/contiguous NCHW order. This setting only works for 4-dim Tensors. Default: False.

  • thread_num – (optional) An int that specifies how many threads (cores) are needed for inference. This parameter only controls the number of threads used in the later inference process of your obtained accelerated model. In other words, the process of model conversion won’t be restricted by this parameter.

  • device – (optional) A string that represents the device for inference. Default to ‘CPU’; valid choices are ‘CPU’/’GPU’. ‘GPU’ is only valid when accelerator=”openvino”/None. IPEX will be forcibly used if accelerator=None.

  • onnxruntime_session_options – The session option for onnxruntime, only valid when accelerator=’onnxruntime’, otherwise will be ignored.

  • openvino_config – The config to be inputted in core.compile_model. Only valid when accelerator=’openvino’, otherwise will be ignored.

  • simplification – Whether to use onnxsim to simplify the ONNX model, only valid when accelerator=’onnxruntime’, otherwise will be ignored. If this option is set to True, the new dependency ‘onnxsim’ needs to be installed.

  • jit_strict – Whether to record your mutable container types. This parameter will be passed to torch.jit.trace. If accelerator != 'jit' or jit_method='script', it will be ignored. Default to True.

  • jit_method – Whether to use jit.trace or jit.script to convert the model to TorchScript. Accepted values are 'trace', 'script', and None. Default to None, meaning try-except logic is used to choose between jit.trace and jit.script. If accelerator != 'jit', this parameter will be ignored.

  • dynamic_axes

    dict or boolean, default to True. By default the exported onnx model will have the first dim of each Tensor input as a dynamic batch_size. If dynamic_axes=False, the exported model will have the shapes of all input and output tensors set to exactly match those given in input_sample. To specify axes of tensors as dynamic (i.e. known only at run-time), set dynamic_axes to a dict with schema:

    KEY (str): an input or output name. Each name must also be provided
    in input_names or output_names.

    VALUE (dict or list): If a dict, keys are axis indices and values
    are axis names. If a list, each element is an axis index.

    If accelerator != ‘openvino’/’onnxruntime’, it will be ignored.

  • logging – Whether to log detailed information of model conversion, only valid when accelerator=’openvino’, otherwise will be ignored. Default: True.

  • inplace – whether to perform inplace optimization. Default: False.

  • weights_prepack – Whether to perform weight prepack for convolution and linear to avoid oneDNN weights reorder. The default value is None. Explicitly setting this knob overwrites the configuration set by level knob. Only valid when use_ipex=True, otherwise will be ignored. You can try to reduce the occupied memory size by setting this parameter to False.

  • enable_onednn – Whether to use the PyTorch JIT graph fuser based on oneDNN Graph API, which provides a flexible API for aggressive fusion. Default to False, only valid when accelerator=’jit’, otherwise will be ignored. For more details, please refer to https://github.com/pytorch/pytorch/tree/master/torch/csrc/jit/codegen/onednn#pytorch---onednn-graph-api-bridge.

  • output_tensors – boolean, defaults to True, meaning the outputs of the model will be Tensors. Only valid when accelerator=’onnxruntime’ or accelerator=’openvino’, otherwise will be ignored. If output_tensors=False, the output of the exported model will be ndarray.

  • strict_check – some checks in trace are non-trivial but not critical for the optimization (e.g., whether the model is an nn.Module or its subclass). This parameter helps skip those non-critical checks, which may enable more models to be optimized but may also produce some strange error messages. Default to True.

  • example_kwarg_inputs – a pack of keyword arguments of example inputs that will be passed to torch.jit.trace. Default: None. Either this argument or input_sample should be specified. The dict will be unpacked by the argument names of the traced function. Only valid when accelerator=’jit’ and torch>=2.0, otherwise will be ignored.

  • **kwargs

    Other extra advanced settings include: 1. those passed to the torch.onnx.export function, only valid when accelerator=’onnxruntime’/’openvino’, otherwise will be ignored. Possible arguments are: input_names, output_names, opset_version, etc. For more details, please refer to https://pytorch.org/docs/stable/onnx.html#torch.onnx.export. 2. those passed to the model optimizer function of openvino, only valid when accelerator=’openvino’, otherwise will be ignored. Possible arguments are: mean_values, layout, input, output, etc. For more details about the model optimizer, see mo --help.

Returns

Model with different acceleration.
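For instance, exporting to OpenVINO or ONNX Runtime might look like the following sketch (the model and input shape are illustrative):
>>> import torch
>>> from bigdl.nano.pytorch import InferenceOptimizer
>>> ov_model = InferenceOptimizer.trace(model, accelerator='openvino',
...                                     input_sample=torch.rand(1, 3, 224, 224))
>>> ort_model = InferenceOptimizer.trace(model, accelerator='onnxruntime',
...                                      input_sample=torch.rand(1, 3, 224, 224), thread_num=4)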

static get_context(model: torch.nn.modules.module.Module, *models)[source]#

Obtain the corresponding context manager from one or more models, defaults to BaseContextManager().

Parameters
  • model – Any model of torch.nn.Module, including all models accelerated by InferenceOptimizer.trace/InferenceOptimizer.quantize.

  • models – Any model of torch.nn.Module or list of torch.nn.Module, including all models accelerated by InferenceOptimizer.trace/InferenceOptimizer.quantize.

Returns

a context manager if there is no conflict between the context managers; otherwise a RuntimeError will be raised.
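Inference with an accelerated model is typically wrapped in the returned context manager, e.g. (model and x are placeholders):
>>> from bigdl.nano.pytorch import InferenceOptimizer
>>> acc_model = InferenceOptimizer.trace(model, accelerator='jit', input_sample=x)
>>> with InferenceOptimizer.get_context(acc_model):
...     output = acc_model(x)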

static save(model: torch.nn.modules.module.Module, path, compression='fp32')[source]#

Save the model to a local file.

Parameters
  • model – Any model of torch.nn.Module, including all models accelerated by InferenceOptimizer.trace/InferenceOptimizer.quantize.

  • path – Path to saved model. Path should be a directory.

  • compression – str. This parameter is only effective for jit, ipex or pure pytorch models with fp32 or bf16 precision. By default, all models are saved with dtype=fp32 for their parameters. If users set a lower precision, a smaller file will be saved with some accuracy loss. Users always need to use nano to load the compressed file if compression is set to something other than “fp32”. Currently, “bf16” and “fp32” (default) are supported.

static load(path, model: Optional[torch.nn.modules.module.Module] = None, input_sample=None, inplace=False, device=None, cache_dir=None, shapes=None)[source]#

Load a model from a local file.

Parameters
  • path – Path to model to be loaded. Path should be a directory.

  • model – Required FP32 model used to load the pytorch model. It is needed if: 1. you accelerated the model with accelerator=None by InferenceOptimizer.trace()/InferenceOptimizer.quantize(); 2. you accelerated the model with InferenceOptimizer.optimize() and get_model()/get_best_model(), and the best method or the method you specified doesn’t involve the accelerator ‘onnxruntime’/’openvino’/’jit’ (if you are not sure which optimization method was used, we recommend always passing in the original model in this case); 3. you want the loaded model to contain the attributes of the original model.

  • input_sample – Input sample for your model, could be a Tensor or a tuple. This parameter is needed if: 1. the saved model was accelerated by INC IPEX quantization; 2. the saved model was accelerated by JIT and you set compression=’bf16’ when saving.

  • inplace – whether to perform inplace optimization. Default: False.

  • device – A string represents the device of the inference. Default to None. Only valid for openvino model, otherwise will be ignored.

  • cache_dir – A directory for OpenVINO to cache the model. Default to None. Only valid for openvino model, otherwise will be ignored.

  • shapes – input shape, e.g. ‘input1[1,3,224,224],input2[1,4]’ or ‘[1,3,224,224]’. This parameter affects the model’s parameter shapes and can be dynamic. For dynamic dimensions use the symbol ?, -1 or the range low..up. Default to None, which means you don’t want to reshape the model inputs. Only valid for openvino models, otherwise will be ignored.

Returns

Model with different acceleration(None/OpenVINO/ONNX Runtime/JIT) or precision(FP32/FP16/BF16/INT8).
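A sketch of the save/load round trip; remember to pass the original FP32 model when loading a model accelerated with accelerator=None (names and paths are illustrative):
>>> from bigdl.nano.pytorch import InferenceOptimizer
>>> InferenceOptimizer.save(acc_model, "./nano_model")        # path is a directory
>>> loaded = InferenceOptimizer.load("./nano_model", model=original_fp32_model)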

static to_multi_instance(model: torch.nn.modules.module.Module, num_processes: int = 4, cores_per_process: Optional[int] = None, cpu_for_each_process: Optional[List[List[int]]] = None) bigdl.nano.pytorch.inference.multi_instance._MultiInstanceModel[source]#

Transform a model to multi-instance inference model.

Parameters
  • model – The model to transform.

  • num_processes – The number of processes to use, default to 4.

  • cores_per_process – Number of CPU cores used by each process, default to None, meaning the number is decided automatically.

  • cpu_for_each_process – Specify the CPU cores used by each process, default to None. If set, it will override num_processes and cores_per_process.

Returns

Model with multi-instance inference acceleration.
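A hedged sketch; it assumes the returned multi-instance model is called with a list of input batches, which are dispatched to the processes in parallel:
>>> from bigdl.nano.pytorch import InferenceOptimizer
>>> multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4)
>>> outputs = multi_model(input_batch_list)   # input_batch_list: a placeholder list of input batches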

get_best_model(accelerator: Optional[str] = None, precision: Optional[str] = None, use_ipex: Optional[bool] = None, accuracy_criterion: Optional[float] = None)#

According to results of optimize, obtain the model with minimum latency under specific restrictions or without restrictions.

Parameters
  • accelerator – (optional) Use accelerator ‘None’, ‘onnxruntime’, ‘openvino’, ‘jit’, defaults to None. If not None, then will only find the model with this specific accelerator.

  • precision – (optional) Supported type: ‘int8’, ‘bf16’, and ‘fp32’. Defaults to None which represents no precision limit. If not None, then will only find the model with this specific precision.

  • use_ipex – (optional) if not None, then will only find the model with this specific ipex setting. This is only effective for pytorch model.

  • accuracy_criterion – (optional) a float that represents the tolerable accuracy drop percentage, defaults to None meaning no accuracy control.

Returns

best model, corresponding acceleration option

get_model(method_name: str)#

According to results of optimize, obtain the model with method_name.

The available methods are “original”, “fp32_channels_last”, “fp32_ipex”, “fp32_ipex_channels_last”, “bf16”, “bf16_channels_last”, “bf16_ipex”, “bf16_ipex_channels_last”, “static_int8”, “static_int8_ipex”, “jit_fp32”, “jit_fp32_channels_last”, “jit_bf16”, “jit_bf16_channels_last”, “jit_fp32_ipex”, “jit_fp32_ipex_channels_last”, “jit_bf16_ipex”, “jit_bf16_ipex_channels_last”, “jit_int8”, “jit_int8_channels_last”, “openvino_fp32”, “openvino_int8”, “onnxruntime_fp32”, “onnxruntime_int8_qlinear” and “onnxruntime_int8_integer”.

Parameters

method_name – (optional) Obtain specific model according to method_name.

Returns

Model with different acceleration.

summary()#

Print a formatted string representation of the optimization results.

TorchNano API#

class bigdl.nano.pytorch.TorchNano(*args: Any, **kwargs: Any)[source]#

TorchNano for BigDL-Nano pytorch.

It can be used to accelerate custom pytorch training loops with very few code changes.

Create a TorchNano with nano acceleration.

Parameters
  • num_processes – number of processes in distributed training, defaults to 1

  • use_ipex – whether to use ipex acceleration, defaults to False

  • distributed_backend – which backend to use in distributed mode, defaults to 'subprocess'; currently available backends are 'spawn', 'subprocess' and 'ray'

  • process_group_backend – which process group backend to use in distributed mode, defaults to None, which means using 'gloo' on CPU and 'nccl' on GPU; currently available values are None and 'ccl'.

  • precision – Double precision (64), full precision (32), half precision (16) or bfloat16 precision ('bf16'), defaults to 32. Enables ipex bfloat16 weight prepack when use_ipex=True and precision='bf16'

  • cpu_for_each_process – specify the cpu cores which will be used by each process; if None, cpu cores will be distributed evenly among all processes. Only takes effect when num_processes > 1

  • channels_last – whether to convert inputs to the channels-last memory format, defaults to False.

  • auto_lr – whether to scale the learning rate linearly by num_processes times. Defaults to True. If num_processes=1 or another lr_scheduler is set, auto_lr will be ignored.

setup(model: torch.nn.modules.module.Module, optimizer: Union[torch.optim.optimizer.Optimizer, List[torch.optim.optimizer.Optimizer]], *dataloaders: torch.utils.data.dataloader.DataLoader, move_to_device: bool = True)[source]#

Setup model, optimizers and dataloaders for accelerated training.

Parameters
  • model – A model to setup

  • optimizer – The optimizer(s) to setup

  • *dataloaders

    The dataloader(s) to setup

  • move_to_device – If set True (default), moves the model to the correct device. Set this to False and alternatively use to_device() manually.

Returns

The tuple of the wrapped model, optimizer(s) and dataloaders, in the same order they were passed in.

abstract train(*args: Any, **kwargs: Any) Any[source]#

All the code inside this train method gets accelerated by TorchNano.

You can pass arbitrary arguments to this function when overriding it.
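A minimal hedged sketch of subclassing TorchNano (the model, optimizer and dataloader are placeholders; the self.backward call follows the LightningLite-style convention that TorchNano builds on and is an assumption here):
>>> import torch
>>> from bigdl.nano.pytorch import TorchNano
>>> class MyNano(TorchNano):
...     def train(self, model, optimizer, train_loader, num_epochs=1):
...         model, optimizer, train_loader = self.setup(model, optimizer, train_loader)
...         model.train()
...         for _ in range(num_epochs):
...             for x, y in train_loader:
...                 optimizer.zero_grad()
...                 loss = torch.nn.functional.cross_entropy(model(x), y)
...                 self.backward(loss)      # assumed replacement for loss.backward()
...                 optimizer.step()
>>> MyNano(num_processes=2, use_ipex=True).train(model, optimizer, train_loader)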

bigdl.nano.pytorch.nano(num_processes: Optional[int] = None, use_ipex: bool = False, distributed_backend: str = 'subprocess', precision: Union[str, int] = 32, cpu_for_each_process: Optional[List[List[int]]] = None, channels_last: bool = False, auto_lr: bool = True, *args, **kwargs)[source]#

Run TorchNano.train through a convenient decorator function.

Parameters
  • num_processes – number of processes in distributed training, defaults to 1

  • use_ipex – whether to use ipex acceleration, defaults to False

  • distributed_backend – which backend to use in distributed mode, defaults to 'subprocess'; currently available backends are 'subprocess' and 'ray'. The bigdl.nano.pytorch.nano decorator does not support 'spawn'.

  • precision – Double precision (64), full precision (32), half precision (16) or bfloat16 precision ('bf16'), defaults to 32. Enables ipex bfloat16 weight prepack when use_ipex=True and precision='bf16'

  • cpu_for_each_process – specify the cpu cores which will be used by each process; if None, cpu cores will be distributed evenly among all processes. Only takes effect when num_processes > 1

  • channels_last – whether to convert inputs to the channels-last memory format, defaults to False.

  • auto_lr – whether to scale the learning rate linearly by num_processes times. Defaults to True. If num_processes=1 or other lr_scheduler is set, auto_lr will be ignored.
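A hedged sketch of the decorator form; it assumes the decorated function is a plain training loop whose model, optimizer and dataloader arguments are wrapped with nano acceleration when the function is called:
>>> import torch
>>> from bigdl.nano.pytorch import nano
>>> @nano(num_processes=2, use_ipex=True)
... def train_loop(model, optimizer, train_loader, num_epochs=1):
...     for _ in range(num_epochs):
...         for x, y in train_loader:
...             optimizer.zero_grad()
...             loss = torch.nn.functional.cross_entropy(model(x), y)
...             loss.backward()
...             optimizer.step()
>>> train_loop(model, optimizer, train_loader)   # model/optimizer/train_loader are user objects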

Patch API#

bigdl.nano.pytorch.patch_torch(cuda_to_cpu: bool = True)[source]#

patch_torch is used to patch optimized torch classes to replace original ones.

Optimized classes include:

1. pytorch_lightning.Trainer -> bigdl.nano.pytorch.Trainer
2. torchvision.transforms -> bigdl.nano.pytorch.vision.transforms
3. torchvision.datasets -> bigdl.nano.pytorch.vision.datasets
Parameters

cuda_to_cpu – bool, makes code written for CUDA runnable on CPU if set to True. This feature is still experimental and only valid for Python-level code. Default to True.
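For example (a minimal sketch):
>>> from bigdl.nano.pytorch import patch_torch
>>> patch_torch()                              # patch at the beginning of the application
>>> from pytorch_lightning import Trainer
>>> trainer = Trainer(max_epochs=1)            # now resolves to bigdl.nano.pytorch.Trainer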

bigdl.nano.pytorch.unpatch_torch()[source]#

unpatch_torch is used to restore the patched torch classes to the original ones.

bigdl.nano.pytorch.patching.patch_cuda(disable_jit: bool = True)[source]#

patch_cuda is used to make a user application written only for CUDA runnable on a CPU device with one-line patching.

e.g.
>>> from bigdl.nano.pytorch.patching import patch_cuda
>>> patch_cuda()  # be sure it is used at the header of the application
>>> # all other cuda only codes will be available for cpu
Parameters

disable_jit – bool, whether to disable JIT compilation. This is a known issue of the patch_cuda function: JIT compilation has not been supported for some of the patching. Users may change it to False to check whether their application is affected by this issue.

bigdl.nano.pytorch.patching.unpatch_cuda()[source]#

unpatch_cuda is the reverse of patch_cuda. It will change the application back to being available on cuda.

e.g.
>>> from bigdl.nano.pytorch.patching import unpatch_cuda
>>> unpatch_cuda()  # be sure it is used after patch_cuda
>>> # all other codes will be available for cuda

bigdl.nano.pytorch.patching.patch_dtype(from_dtype: Union[str, torch.dtype] = 'fp64', to_dtype: Union[str, torch.dtype] = 'fp32')[source]#

patch_dtype is used to change the tensor’s dtype in users’ application from from_dtype to to_dtype.

e.g.
>>> from bigdl.nano.pytorch.patching import patch_dtype
>>> patch_dtype(from_dtype="fp64", to_dtype="fp32")
>>> # will replace all tensors that has fp64 precision to fp32.
Parameters
  • from_dtype – the tensors’ dtype to be replaced, defaults to “fp64”

  • to_dtype – the tensors’ dtype to use, defaults to “fp32”

bigdl.nano.pytorch.patching.patch_encryption()[source]#

patch_encryption is used to patch the torch.save and torch.load methods to replace the original ones.

Patched details include:

1. torch.save is now located at bigdl.nano.pytorch.encryption.save
2. torch.load is now located at bigdl.nano.pytorch.encryption.load

A key argument is added to torch.save and torch.load, which is used to encrypt/decrypt the content before saving it to or loading it from disk.

Note

Please note that the key is only secured in Intel SGX mode.
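A hedged sketch of the encrypted save/load round trip (the key string, path and model are illustrative; only the documented key argument is assumed):
>>> import torch
>>> from bigdl.nano.pytorch.patching import patch_encryption
>>> patch_encryption()
>>> torch.save(model.state_dict(), "model.pt", key="my-secret-key")
>>> state_dict = torch.load("model.pt", key="my-secret-key")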