AutoTS

AutoTSEstimator

Automated TimeSeries Estimator for time series forecasting tasks. AutoTSEstimator will replace AutoTSTrainer in a later version.

class bigdl.chronos.autots.autotsestimator.AutoTSEstimator(model='lstm', search_space={}, metric='mse', metric_mode=None, loss=None, optimizer='Adam', past_seq_len='auto', future_seq_len=1, input_feature_num=None, output_target_num=None, selected_features='auto', backend='torch', logs_dir='/tmp/autots_estimator', cpus_per_trial=1, name='autots_estimator', remote_dir=None)

Bases: object

Automated TimeSeries Estimator for time series forecasting tasks. It supports TSDataset and customized data creators as data input, with built-in models (only “lstm”, “tcn”, “seq2seq” for now) or 3rd party models.

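>>> # Here is a use case example:
>>> from bigdl.chronos.autots.autotsestimator import AutoTSEstimator
>>> # prepare train/valid/test tsdataset (tsdata_train, tsdata_valid, tsdata_test)
>>> # and a search_space (a str such as "normal", or a dict)
>>> autoest = AutoTSEstimator(model="lstm",
...                           search_space=search_space,
...                           past_seq_len=6,
...                           future_seq_len=1)
>>> tsppl = autoest.fit(data=tsdata_train,
...                     validation_data=tsdata_valid)
>>> tsppl.predict(tsdata_test)
>>> tsppl.save("my_tsppl")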

AutoTSEstimator trains a model for time series forecasting. Users can choose one of the built-in models, or pass in a customized pytorch or keras model for tuning using AutoML.

Parameters
  • model – a string or a model creation function. A string indicates a built-in model; currently “lstm”, “tcn” and “seq2seq” are supported. A model creation function indicates a 3rd party model; the function should take a config param and return a torch.nn.Module (backend="torch") or a tf model (backend="keras"). If you use chronos.data.TSDataset as data input, the 3rd party model should have a 3-dim input (num_sample, past_seq_len, input_feature_num) and a 3-dim output (num_sample, future_seq_len, output_feature_num), and use the same key in the model creation function. If you use a customized data creator, the output of the data creator should fit the input of the model creation function.

  • search_space – str or dict. Hyperparameter configurations. For str, you can choose from “minimal”, “normal”, or “large”; each represents a default search_space for our built-in models with a different computing requirement. For dict, read the API docs for each auto model. Some common hyperparameters can be set explicitly as named parameters. search_space should contain, as its keys, the parameters other than the keyword arguments of this constructor. If a 3rd party model is used, you must set search_space to a dict.

  • metric – String or customized evaluation metric function. If a string, metric is the name of the evaluation metric to optimize, e.g. “mse”. If a callable function, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays. The function should return a float value as the evaluation result.

  • metric_mode – One of [“min”, “max”]. “max” means greater metric value is better. You have to specify metric_mode if you use a customized metric function. You don’t have to specify metric_mode if you use the built-in metric in bigdl.orca.automl.metrics.Evaluator.

  • loss – String, pytorch loss instance, or pytorch loss creator function. The default loss function for the pytorch backend is nn.MSELoss(). If you use backend="keras" with a 3rd party model, this parameter is ignored.

  • optimizer – String, pytorch optimizer creator function, or tf.keras optimizer instance. If you use backend="keras" with a 3rd party model, this parameter is ignored.

  • past_seq_len – Int or hp sampling function. The number of historical steps (i.e. lookback) used for forecasting. For hp sampling, see bigdl.orca.automl.hp for more details. The value defaults to ‘auto’, which automatically infers the cycle length of each time series and takes the mode of them. The search space is then automatically set to hp.randint(0.5*cycle_length, 2*cycle_length).

  • future_seq_len – Int or List. The number of future steps to forecast. The value defaults to 1, which means the timestamp just after the observed data. If future_seq_len is a list, we will sample discretely according to the input list.

  • input_feature_num – Int. The number of features in the input. The value is ignored if you use chronos.data.TSDataset as input data type.

  • output_target_num – Int. The number of targets in the output. The value is ignored if you use chronos.data.TSDataset as input data type.

  • selected_features – String. “all” and “auto” are supported for now. For “all”, all features that are generated are used for each trial. For “auto”, a subset is sampled randomly from all features for each trial. The parameter is ignored if not using chronos.data.TSDataset as input data type. The value defaults to “auto”.

  • backend – The backend of the auto model. We only support backend as “torch” or “keras” for now.

  • logs_dir – Local directory to save logs and results. It defaults to “/tmp/autots_estimator”.

  • cpus_per_trial – Int. Number of cpus for each trial. It defaults to 1.

  • name – name of the autots estimator. It defaults to “autots_estimator”.

  • remote_dir – String. Remote directory to sync training results and checkpoints. It defaults to None and takes no effect when running locally. When running on a cluster, it defaults to “hdfs:///tmp/{name}”.
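
The rule described for past_seq_len='auto' can be sketched in plain Python. This only illustrates the documented rule (take the mode of the per-series cycle lengths, then search between 0.5x and 2x that value). The helper name auto_past_seq_len_bounds and the sample cycle lengths are hypothetical, and how the library detects cycle lengths or rounds the bounds is not covered here.

```python
from statistics import mode


def auto_past_seq_len_bounds(cycle_lengths):
    """Return (lower, upper) search bounds from per-series cycle lengths.

    Hypothetical helper: it mirrors the documented rule only, not the
    library's internal implementation.
    """
    cycle_length = mode(cycle_lengths)  # most common cycle length
    lower = int(0.5 * cycle_length)
    upper = int(2 * cycle_length)
    return lower, upper


# Three series: two with a 24-step cycle, one with a 12-step cycle.
bounds = auto_past_seq_len_bounds([24, 24, 12])
print(bounds)
```

With these inputs the mode is 24, so the lookback would be searched roughly between 12 and 48 steps.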

fit(data, epochs=1, batch_size=32, validation_data=None, metric_threshold=None, n_sampling=1, search_alg=None, search_alg_params=None, scheduler=None, scheduler_params=None)

Fit using AutoEstimator.

Parameters
  • data

    Train data. For the “torch” backend, data can be a TSDataset or a function that takes a config dictionary as parameter and returns a PyTorch DataLoader.

    For the “keras” backend, data can be a TSDataset or a function that takes a config dictionary as parameter and returns a TensorFlow Dataset.

    Please note that you should stick to the same data type when you predict/evaluate/fit on the TSPipeline you get from AutoTSEstimator.fit.

  • epochs – Max number of epochs to train in each trial. Defaults to 1. If you have also set metric_threshold, a trial will stop if either it has been optimized to the metric_threshold or it has been trained for {epochs} epochs.

  • batch_size – Int or hp sampling function from an integer space. Training batch size. It defaults to 32.

  • validation_data – Validation data. Validation data type should be the same as data.

  • metric_threshold – a trial will be terminated when metric threshold is met.

  • n_sampling – Number of trials to evaluate in total. Defaults to 1. If hp.grid_search is in search_space, the grid will be run for n_sampling trials, and n_sampling will be rounded up according to hp.grid_search. If this is -1, (virtually) infinite samples are generated until a stopping condition is met.

  • search_alg – str, any searcher supported by ray tune (i.e. “variant_generator”, “random”, “ax”, “dragonfly”, “skopt”, “hyperopt”, “bayesopt”, “bohb”, “nevergrad”, “optuna”, “zoopt” and “sigopt”).

  • search_alg_params – extra parameters for searcher algorithm besides search_space, metric and searcher mode

  • scheduler – str, all supported scheduler provided by ray tune

  • scheduler_params – parameters for scheduler

Returns

a TSPipeline with the best model.
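
The interaction between epochs and metric_threshold can be pictured with a small sketch. It assumes that "met" means at or below the threshold when the metric is minimized and at or above it when maximized; that direction, and the helper name trial_should_stop, are assumptions for illustration and not part of the library.

```python
def trial_should_stop(epochs_done, metric_value, epochs,
                      metric_threshold=None, metric_mode="min"):
    """Return True when a trial would stop under the documented rule.

    Hypothetical helper: a trial stops once it has trained for `epochs`
    epochs, or once the metric reaches `metric_threshold` (if one is set).
    """
    if epochs_done >= epochs:
        return True
    if metric_threshold is None:
        return False
    if metric_mode == "min":
        # Assumption: lower is better, so the threshold is met from above.
        return metric_value <= metric_threshold
    # Assumption: higher is better, so the threshold is met from below.
    return metric_value >= metric_threshold


# Epoch budget of 5, stop early once mse falls to 0.1 or below.
stops_early = trial_should_stop(2, 0.08, epochs=5, metric_threshold=0.1)
keeps_going = trial_should_stop(2, 0.30, epochs=5, metric_threshold=0.1)
out_of_epochs = trial_should_stop(5, 0.30, epochs=5, metric_threshold=0.1)
print(stops_early, keeps_going, out_of_epochs)
```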

get_best_config()

Get the best configuration

Returns

A dictionary of best hyper parameters

TSPipeline

TSPipeline is an E2E solution for time series forecasting tasks. The TSPipeline returned by AutoTSEstimator will replace the original TSPipeline returned by AutoTSTrainer in a later version.

class bigdl.chronos.autots.tspipeline.TSPipeline(model, loss, optimizer, model_creator, loss_creator, optimizer_creator, best_config, **kwargs)

Bases: object

TSPipeline is an E2E solution for time series analysis (only forecasting tasks for now). You can use TSPipeline to:

  1. Further develop the prototype. (predict, evaluate, incremental fit)

  2. Deploy the model to your scenario. (save, load)

evaluate(data, metrics=['mse'], multioutput='uniform_average', batch_size=32, quantize=False)

Evaluate the time series pipeline.

Parameters
  • data – data can be a TSDataset or data creator. The TSDataset should follow the same operations as the training TSDataset used in AutoTSEstimator.fit.

  • metrics – list of strings or callables, e.g. [‘mse’] or [customized_metrics]. If a callable function, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays. The function should return a float value as the evaluation result.

  • multioutput – Defines aggregating of multiple output values. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘uniform_average’.

  • batch_size – predict batch_size. A smaller batch_size takes more time but uses less memory. The param is only effective when data is a TSDataset. The value defaults to 32.

  • quantize – whether to use the quantized model to predict.
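
The two multioutput options can be illustrated with a minimal pure-Python sketch. It assumes scikit-learn-style semantics (one value per output for ‘raw_values’, their plain mean for ‘uniform_average’) and uses nested lists of shape (num_samples, num_outputs) as a simplified stand-in for the numpy ndarrays the pipeline actually passes. The function mse_sketch is hypothetical and is not the library's metric implementation.

```python
def mse_sketch(y_true, y_pred, multioutput="uniform_average"):
    """Mean squared error over (num_samples, num_outputs) nested lists.

    Hypothetical stand-in for illustration only.
    """
    n_samples = len(y_true)
    n_outputs = len(y_true[0])
    # One error value per output column, averaged over samples.
    per_output = [
        sum((t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred)) / n_samples
        for j in range(n_outputs)
    ]
    if multioutput == "raw_values":
        return per_output
    if multioutput == "uniform_average":
        return sum(per_output) / n_outputs
    raise ValueError("multioutput must be 'raw_values' or 'uniform_average'")


y_true = [[1.0, 2.0], [3.0, 4.0]]
y_pred = [[1.0, 4.0], [3.0, 0.0]]
raw = mse_sketch(y_true, y_pred, multioutput="raw_values")
avg = mse_sketch(y_true, y_pred)
print(raw, avg)
```

Here the first output is predicted exactly and the second is not, which ‘raw_values’ keeps visible while ‘uniform_average’ folds both into a single number.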

evaluate_with_onnx(data, metrics=['mse'], multioutput='uniform_average', batch_size=32, quantize=False)

Evaluate the time series pipeline with onnx.

Parameters
  • data – data can be a TSDataset or data creator. The TSDataset should follow the same operations as the training TSDataset used in AutoTSEstimator.fit.

  • metrics – list of strings or callables, e.g. [‘mse’] or [customized_metrics]. If a callable function, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays. The function should return a float value as the evaluation result.

  • multioutput – Defines aggregating of multiple output values. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘uniform_average’.

  • batch_size – predict batch_size. A smaller batch_size takes more time but uses less memory. The param is only effective when data is a TSDataset. The value defaults to 32.

  • quantize – whether to use the quantized model to predict.

predict(data, batch_size=32, quantize=False)

Rolling predict with time series pipeline.

Parameters
  • data – data can be a TSDataset or data creator. The TSDataset should follow the same operations as the training TSDataset used in AutoTSEstimator.fit.

  • batch_size – predict batch_size. A smaller batch_size takes more time but uses less memory. The param is only effective when data is a TSDataset. The value defaults to 32.

  • quantize – whether to use the quantized model to predict.
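
The batch_size trade-off (smaller batches hold less data in memory at once but need more passes) can be seen in a plain-Python sketch. iter_batches is a hypothetical helper and not part of TSPipeline; it only shows how the number of passes grows as batch_size shrinks.

```python
def iter_batches(samples, batch_size=32):
    """Yield consecutive slices of at most `batch_size` samples.

    Hypothetical helper for illustration only.
    """
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(100))  # stand-in for 100 input windows

# Small batches: many passes, few samples held at once.
num_small = sum(1 for _ in iter_batches(samples, batch_size=8))
# Large batches: few passes, more samples held at once.
num_large = sum(1 for _ in iter_batches(samples, batch_size=64))
print(num_small, num_large)
```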

predict_with_onnx(data, batch_size=32, quantize=False)

Rolling predict with the time series pipeline using onnx.

Parameters
  • data – data can be a TSDataset or data creator. The TSDataset should follow the same operations as the training TSDataset used in AutoTSEstimator.fit.

  • batch_size – predict batch_size. A smaller batch_size takes more time but uses less memory. The param is only effective when data is a TSDataset. The value defaults to 32.

  • quantize – whether to use the quantized model to predict.

fit(data, validation_data=None, epochs=1, batch_size=None, **kwargs)

Incremental fitting

Parameters
  • data

    The data supports the following formats:

    1. data creator:
    a function that takes a config dictionary as parameter and
    returns a PyTorch DataLoader.

    2. a bigdl.chronos.data.TSDataset:
    the TSDataset should follow the same operations as the training
    TSDataset used in AutoTSEstimator.fit.

  • validation_data – validation data, same format as data.

  • epochs – incremental fitting epoch. The value defaults to 1.

  • metric – evaluate metric.

  • batch_size – batch size, defaults to None, which takes the searched best batch_size.

  • **kwargs

    args to be passed to bigdl-nano trainer.

save(file_path)

Save the TSPipeline to a folder.

Parameters

file_path – the folder location to save the pipeline

static load(file_path)

Load the TSPipeline from a folder.

Parameters

file_path – the folder location to load the pipeline from

quantize(calib_data, metric=None, conf=None, framework='pytorch_fx', approach='static', tuning_strategy='bayesian', relative_drop=None, absolute_drop=None, timeout=0, max_trials=1)

Quantize the TSPipeline.

Parameters
  • calib_data

    Required for static quantization or evaluation.

    1. data creator:
    a function that takes a config dictionary as parameter and
    returns a PyTorch DataLoader.

    2. a bigdl.chronos.data.TSDataset:
    the TSDataset should follow the same operations as the training
    TSDataset used in AutoTSEstimator.fit.

    3. A torch.utils.data.dataloader.DataLoader object for calibration.
    Users should set the configs correctly (e.g. past_seq_len, …).
    They can be found in TSPipeline._best_config.

    4. A numpy ndarray tuple (x, y).
    x’s shape is (num_samples, past_seq_len, input_feature_dim).
    y’s shape is (num_samples, future_seq_len, output_feature_dim).
    They can be found in TSPipeline._best_config.

  • metric – A str representing the metric used to tune the quality of quantization. You may choose from “mse”, “mae”, “rmse”, “r2”, “mape”, “smape”.

  • conf – A path to conf yaml file for quantization. Default to None, using default config.

  • framework – string or list, [{‘pytorch’|’pytorch_fx’|’pytorch_ipex’}, {‘onnxrt_integerops’|’onnxrt_qlinearops’}]. Default: ‘pytorch_fx’. Consistent with Intel Neural Compressor.

  • approach – str, ‘static’ or ‘dynamic’. Default to ‘static’.

  • tuning_strategy – str, ‘bayesian’, ‘basic’, ‘mse’ or ‘sigopt’. Default to ‘bayesian’.

  • relative_drop – Float, tolerable relative accuracy drop. Defaults to None. For example, setting it to 0.1 means that we accept a 10% increase in the metric's error.

  • absolute_drop – Float, tolerable absolute accuracy drop. Defaults to None. For example, setting it to 5 means that we only accept metrics smaller than 5.

  • timeout – Tuning timeout (seconds). Defaults to 0, which means early stop. Combined with the max_trials field to decide when to exit.

  • max_trials – Max tune times. Defaults to 1. Combined with the timeout field to decide when to exit. “timeout=0, max_trials=1” means it will try quantization only once and return the best satisfying model.
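
The relative_drop and absolute_drop descriptions can be read as an acceptance test on the quantized model's metric. The sketch below follows that reading for an error-style metric where lower is better. The helper is_acceptable and the rule that a result is accepted when neither limit is set are assumptions for illustration; the actual tuning logic lives in Intel Neural Compressor and is not reproduced here.

```python
def is_acceptable(baseline_error, quantized_error,
                  relative_drop=None, absolute_drop=None):
    """Return True if the quantized model's error is within tolerance.

    Hypothetical helper based on the parameter descriptions above.
    """
    if relative_drop is not None:
        # e.g. 0.1 accepts up to a 10% increase over the baseline error.
        return quantized_error <= baseline_error * (1 + relative_drop)
    if absolute_drop is not None:
        # e.g. 5 accepts only metric values smaller than 5.
        return quantized_error < absolute_drop
    # Assumption: with no tolerance configured, any result is accepted.
    return True


within_relative = is_acceptable(1.0, 1.05, relative_drop=0.1)
beyond_relative = is_acceptable(1.0, 1.2, relative_drop=0.1)
within_absolute = is_acceptable(1.0, 4.0, absolute_drop=5)
beyond_absolute = is_acceptable(1.0, 6.0, absolute_drop=5)
print(within_relative, beyond_relative, within_absolute, beyond_absolute)
```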