Auto Models
AutoTCN
AutoTCN is a TCN (Temporal Convolutional Network) forecasting model with automatic hyperparameter tuning.
- class bigdl.chronos.autots.model.auto_tcn.AutoTCN(input_feature_num, output_target_num, past_seq_len, future_seq_len, optimizer, loss, metric, metric_mode=None, hidden_units=None, levels=None, num_channels=None, kernel_size=7, lr=0.001, dropout=0.2, backend='torch', logs_dir='/tmp/auto_tcn', cpus_per_trial=1, name='auto_tcn', remote_dir=None)
Bases:
bigdl.chronos.autots.model.base_automodel.BaseAutomodel
Create an AutoTCN.
- Parameters
input_feature_num – Int. The number of features in the input.
output_target_num – Int. The number of targets in the output.
past_seq_len – Int. The number of historical steps used for forecasting.
future_seq_len – Int. The number of future steps to forecast.
optimizer – String, PyTorch optimizer creator function, or tf.keras optimizer instance.
loss – String, PyTorch/tf.keras loss instance, or PyTorch loss creator function.
metric – String or customized evaluation metric function. If a string, metric is the name of the evaluation metric to optimize, e.g. “mse”. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays, and it should return a float value as the evaluation result.
metric_mode – One of [“min”, “max”]. “max” means a greater metric value is better. You must specify metric_mode if you use a customized metric function; it is not needed for the built-in metrics in bigdl.orca.automl.metrics.Evaluator.
hidden_units – Int or hp sampling function from an integer space. The number of hidden units (filters) in each convolutional layer, analogous to the number of units in an LSTM. It defaults to 30. hidden_units is ignored if num_channels is specified. For hp sampling, see bigdl.orca.automl.hp for more details, e.g. hp.grid_search([32, 64]).
levels – Int or hp sampling function from an integer space. The number of levels of TemporalBlocks to use. It defaults to 8. levels is ignored if num_channels is specified.
num_channels – List of integers. The hidden_units of each level. Specify num_channels if you want different hidden_units for different levels. By default, num_channels equals [hidden_units] * (levels - 1) + [output_target_num].
kernel_size – Int or hp sampling function from an integer space. The size of the kernel used in each convolutional layer. It defaults to 7.
lr – float or hp sampling function from a float space. Learning rate, e.g. hp.choice([0.001, 0.003, 0.01]). It defaults to 0.001.
dropout – float or hp sampling function from a float space. Dropout rate, e.g. hp.uniform(0.1, 0.3). It defaults to 0.2.
backend – The backend of the TCN model. Supports “keras” and “torch”. It defaults to “torch”.
logs_dir – Local directory to save logs and results. It defaults to “/tmp/auto_tcn”.
cpus_per_trial – Int. Number of CPUs for each trial. It defaults to 1.
name – Name of the AutoTCN. It defaults to “auto_tcn”.
remote_dir – String. Remote directory to sync training results and checkpoints. It defaults to None and has no effect when running locally. When running on a cluster, it defaults to “hdfs:///tmp/{name}”.
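The default channel layout described for num_channels can be written out as a small helper. This is a hypothetical sketch that only mirrors the formula stated in the parameter list; `default_num_channels` is not part of the BigDL API.

```python
def default_num_channels(hidden_units=30, levels=8, output_target_num=1):
    """Hypothetical helper reproducing the documented default:
    [hidden_units] * (levels - 1) + [output_target_num].
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    # All but the last TemporalBlock use hidden_units channels;
    # the last level outputs one channel per target.
    return [hidden_units] * (levels - 1) + [output_target_num]


# With the documented defaults (hidden_units=30, levels=8) and 2 targets:
print(default_num_channels(30, 8, 2))
# -> [30, 30, 30, 30, 30, 30, 30, 2]
```

Passing an explicit list such as [32, 64, 64, 2] instead lets each level use a different width; in that case hidden_units and levels are ignored.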
- build_onnx(thread_num=1, sess_options=None)
Build the ONNX model to speed up inference and reduce latency. Calling this method is not required before predict_with_onnx, evaluate_with_onnx or export_onnx_file. It is recommended when you want to:
1. Strictly control the number of threads used during inference.
2. Alleviate the cold-start latency when you call predict_with_onnx for the first time.
- Parameters
thread_num – int, the maximum number of threads. It defaults to 1 (a single thread). It is also suggested to set the environment variable OMP_NUM_THREADS to the same value as thread_num.
sess_options – an onnxruntime.SessionOptions instance. If it is not None, a new onnxruntime session will be built with these options, and other settings you assigned (e.g. thread_num) will be ignored.
Example
>>> # to pre-build the onnx session
>>> automodel.build_onnx(thread_num=2)  # build an onnxruntime session for two threads
>>> pred = automodel.predict_with_onnx(data)
>>> # ------------------------------------------------------
>>> # directly calling an onnx-related method is also supported
>>> # by default, an onnxruntime session for a single thread is built
>>> pred = automodel.predict_with_onnx(data)
- evaluate(data, batch_size=32, metrics=['mse'], multioutput='raw_values')
Evaluate using the trained model after HPO (Hyper Parameter Optimization).
Please note that the evaluation result is calculated on the scaled y and yhat. If you scaled your data (e.g. with .scale() on the TSDataset) and need to evaluate on unscaled data, follow the code snippet below.
>>> from bigdl.orca.automl.metrics import Evaluator
>>> y_hat = automodel.predict(x)
>>> y_hat_unscaled = tsdata.unscale_numpy(y_hat)  # or other customized unscale methods
>>> y_unscaled = tsdata.unscale_numpy(y)  # or other customized unscale methods
>>> metric = ...  # the metric name, e.g. "mse"
>>> Evaluator.evaluate(metric, y_unscaled, y_hat_unscaled, multioutput=...)
- Parameters
data – a tuple of numpy ndarrays (x, y). x’s shape is (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num. y’s shape is (num_samples, horizon, target_dim), where horizon and target_dim should be the same as future_seq_len and output_target_num.
batch_size – evaluation batch size. The value does not affect the evaluation result but affects resource cost (e.g. memory and time). It defaults to 32.
metrics – list of strings or callables, e.g. [‘mse’] or [customized_metrics]. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays, and it should return a float value as the evaluation result.
multioutput – Defines how multiple output values are aggregated. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘raw_values’.
- Returns
A list of evaluation results. Each item represents a metric.
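Why the scaled and unscaled results differ: with a standard (z-score) scaler, x_scaled = (x - mean) / std, every error on the scaled data is divided by std, so the MSE shrinks by a factor of std². The sketch below assumes such a scaler fitted on the target itself; if a different scaler is used, the relationship between the two numbers is different.

```python
import statistics


def mse(y_true, y_pred):
    """Mean squared error over two equal-length flat sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy target series in original units and an imperfect forecast of it.
y = [10.0, 12.0, 14.0, 16.0]
y_hat = [11.0, 12.0, 13.0, 17.0]

# Standard scaling fitted on y (population std for simplicity).
mean = statistics.mean(y)
std = statistics.pstdev(y)


def scale(values):
    """Apply the z-score transform fitted above."""
    return [(v - mean) / std for v in values]


scaled_mse = mse(scale(y), scale(y_hat))
unscaled_mse = mse(y, y_hat)

# The unscaled MSE equals the scaled MSE multiplied by std ** 2.
print(unscaled_mse, scaled_mse * std ** 2)
```

This is why the result of evaluate() on scaled data should not be compared directly against errors measured in the original units.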
- evaluate_with_onnx(data, batch_size=32, metrics=['mse'], dirname=None, multioutput='raw_values')
Evaluate using the trained model after HPO (Hyper Parameter Optimization).
Be sure to install onnx and onnxruntime to enable this function. The method gives the same result as .evaluate() but with higher throughput and lower latency. The keras backend does not support ONNX yet.
Please note that the evaluation result is calculated on the scaled y and yhat. If you scaled your data (e.g. with .scale() on the TSDataset) and need to evaluate on unscaled data, follow the code snippet below.
>>> from bigdl.orca.automl.metrics import Evaluator
>>> y_hat = automodel.predict_with_onnx(x)
>>> y_hat_unscaled = tsdata.unscale_numpy(y_hat)  # or other customized unscale methods
>>> y_unscaled = tsdata.unscale_numpy(y)  # or other customized unscale methods
>>> metric = ...  # the metric name, e.g. "mse"
>>> Evaluator.evaluate(metric, y_unscaled, y_hat_unscaled, multioutput=...)
- Parameters
data – a tuple of numpy ndarrays (x, y). x’s shape is (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num. y’s shape is (num_samples, horizon, target_dim), where horizon and target_dim should be the same as future_seq_len and output_target_num.
batch_size – evaluation batch size. The value does not affect the evaluation result but affects resource cost (e.g. memory and time). It defaults to 32.
metrics – list of strings or callables, e.g. [‘mse’] or [customized_metrics]. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays, and it should return a float value as the evaluation result.
dirname – The directory in which to save the ONNX model file. It defaults to None, in which case no file is saved.
multioutput – Defines how multiple output values are aggregated. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘raw_values’.
- Returns
A list of evaluation results. Each item represents a metric.
- export_onnx_file(dirname)
Save the ONNX model file to disk.
- Parameters
dirname – The directory in which to save the ONNX file.
- fit(data, epochs=1, batch_size=32, validation_data=None, metric_threshold=None, n_sampling=1, search_alg=None, search_alg_params=None, scheduler=None, scheduler_params=None)
Automatically fit the model and search for the best hyperparameters.
- Parameters
data – training data. It can be a tuple of ndarrays, a PyTorch DataLoader, or a function that takes a config dictionary as a parameter and returns a PyTorch DataLoader.
epochs – Max number of epochs to train in each trial. Defaults to 1. If you have also set metric_threshold, a trial will stop if either it has been optimized to the metric_threshold or it has been trained for {epochs} epochs.
batch_size – Int or hp sampling function from an integer space. Training batch size. It defaults to 32.
validation_data – Validation data. Validation data type should be the same as data.
metric_threshold – A trial will be terminated when the metric threshold is met.
n_sampling – Number of trials to evaluate in total (without grid search). Defaults to 1. If hp.grid_search is in the search space, the whole grid is repeated n_sampling times, so the total number of trials is n_sampling multiplied by the number of grid combinations. If this is -1, (virtually) infinite samples are generated until a stopping condition is met.
search_alg – str, any searcher supported by Ray Tune (i.e. “variant_generator”, “random”, “ax”, “dragonfly”, “skopt”, “hyperopt”, “bayesopt”, “bohb”, “nevergrad”, “optuna”, “zoopt” and “sigopt”).
search_alg_params – Extra parameters for the search algorithm besides search_space, metric and searcher mode.
scheduler – str, any scheduler supported by Ray Tune.
scheduler_params – Parameters for the scheduler.
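The interaction between n_sampling and hp.grid_search can be checked by counting trials. This sketch assumes n_sampling behaves like Ray Tune's num_samples (every grid combination is run once per sample); it does not call BigDL or Ray, and `count_trials` is a hypothetical name.

```python
from itertools import product


def count_trials(grid_axes, n_sampling):
    """Count trials when each grid-search axis is exhaustively expanded
    and the whole grid is repeated n_sampling times.

    grid_axes maps a hyperparameter name to the list passed to hp.grid_search.
    Randomly sampled (non-grid) hyperparameters are drawn once per trial and
    do not change the count.
    """
    if n_sampling < 1:
        raise ValueError("this sketch only handles a finite n_sampling >= 1")
    # product() of zero axes yields a single empty combination, so an empty
    # grid correctly counts as one configuration.
    grid_size = len(list(product(*grid_axes.values())))
    return grid_size * n_sampling


# e.g. hidden_units=hp.grid_search([32, 64]) and levels=hp.grid_search([6, 8])
print(count_trials({"hidden_units": [32, 64], "levels": [6, 8]}, n_sampling=3))
# -> 12
```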
- get_best_config()
Get the best configuration.
- Returns
A dictionary of the best hyperparameters.
- get_best_model()
Get the best PyTorch model.
- load(checkpoint_path)
Restore the best model.
- Parameters
checkpoint_path – The checkpoint location from which to load the best model.
- predict(data, batch_size=32)
Predict using the trained model after HPO (Hyper Parameter Optimization).
- Parameters
data – a numpy ndarray x of shape (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num.
batch_size – prediction batch size. The value does not affect the prediction result but affects resource cost (e.g. memory and time). It defaults to 32.
- Returns
A numpy array with shape (num_samples, horizon, target_dim).
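Before calling predict, it can help to check that the input matches the shapes fixed at construction time. A minimal sketch, assuming the input is a regular nested list (for a numpy ndarray, x.shape gives the same tuple); `nested_shape` and `expected_io_shapes` are hypothetical helpers, not part of BigDL.

```python
def nested_shape(data):
    """Return the shape of a regular nested list, e.g. (num_samples, lookback, feature_dim)."""
    shape = []
    while isinstance(data, (list, tuple)):
        shape.append(len(data))
        if not data:
            break
        data = data[0]
    return tuple(shape)


def expected_io_shapes(x, past_seq_len, input_feature_num,
                       future_seq_len, output_target_num):
    """Validate x against the model configuration and return the
    (input_shape, output_shape) pair documented for predict()."""
    num_samples, lookback, feature_dim = nested_shape(x)
    if lookback != past_seq_len:
        raise ValueError(f"lookback {lookback} != past_seq_len {past_seq_len}")
    if feature_dim != input_feature_num:
        raise ValueError(
            f"feature_dim {feature_dim} != input_feature_num {input_feature_num}")
    return ((num_samples, lookback, feature_dim),
            (num_samples, future_seq_len, output_target_num))


# 4 samples, each a window of 24 steps with 3 features.
x = [[[0.0] * 3 for _ in range(24)] for _ in range(4)]
print(expected_io_shapes(x, past_seq_len=24, input_feature_num=3,
                         future_seq_len=5, output_target_num=1))
# -> ((4, 24, 3), (4, 5, 1))
```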
- predict_with_onnx(data, batch_size=32, dirname=None)
Predict using the trained model after HPO (Hyper Parameter Optimization).
Be sure to install onnx and onnxruntime to enable this function. The method gives the same result as .predict() but with higher throughput and lower latency. The keras backend does not support ONNX yet.
- Parameters
data – a numpy ndarray x of shape (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num.
batch_size – prediction batch size. The value does not affect the prediction result but affects resource cost (e.g. memory and time). It defaults to 32.
dirname – The directory in which to save the ONNX model file. It defaults to None, in which case no file is saved.
- Returns
A numpy array with shape (num_samples, horizon, target_dim).
- save(checkpoint_path)
Save the best model.
Please note that if you only want the PyTorch model or the ONNX model file, you can call .get_best_model() or .export_onnx_file(). The checkpoint file generated by .save() can only be loaded by .load() of an automodel. If you specify “keras” as the backend, the file names will be best_keras_config.json and best_keras_model.ckpt.
- Parameters
checkpoint_path – The location at which to save the best model.
AutoLSTM
AutoLSTM is an LSTM forecasting model with automatic hyperparameter tuning.
- class bigdl.chronos.autots.model.auto_lstm.AutoLSTM(input_feature_num, output_target_num, past_seq_len, optimizer, loss, metric, metric_mode=None, hidden_dim=32, layer_num=1, lr=0.001, dropout=0.2, backend='torch', logs_dir='/tmp/auto_lstm', cpus_per_trial=1, name='auto_lstm', remote_dir=None)
Bases:
bigdl.chronos.autots.model.base_automodel.BaseAutomodel
Create an AutoLSTM.
- Parameters
input_feature_num – Int. The number of features in the input.
output_target_num – Int. The number of targets in the output.
past_seq_len – Int or hp sampling function. The number of historical steps used for forecasting.
optimizer – String, PyTorch optimizer creator function, or tf.keras optimizer instance.
loss – String, PyTorch/tf.keras loss instance, or PyTorch loss creator function.
metric – String or customized evaluation metric function. If a string, metric is the name of the evaluation metric to optimize, e.g. “mse”. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays, and it should return a float value as the evaluation result.
metric_mode – One of [“min”, “max”]. “max” means a greater metric value is better. You must specify metric_mode if you use a customized metric function; it is not needed for the built-in metrics in bigdl.orca.automl.metrics.Evaluator.
hidden_dim – Int or hp sampling function from an integer space. The number of features in the hidden state h. It defaults to 32. For hp sampling, see bigdl.orca.automl.hp for more details, e.g. hp.grid_search([32, 64]).
layer_num – Int or hp sampling function from an integer space. Number of recurrent layers, e.g. hp.randint(1, 3). It defaults to 1.
lr – float or hp sampling function from a float space. Learning rate, e.g. hp.choice([0.001, 0.003, 0.01]). It defaults to 0.001.
dropout – float or hp sampling function from a float space. Dropout rate, e.g. hp.uniform(0.1, 0.3). It defaults to 0.2.
backend – The backend of the LSTM model. Supports “keras” and “torch”. It defaults to “torch”.
logs_dir – Local directory to save logs and results. It defaults to “/tmp/auto_lstm”.
cpus_per_trial – Int. Number of CPUs for each trial. It defaults to 1.
name – Name of the AutoLSTM. It defaults to “auto_lstm”.
remote_dir – String. Remote directory to sync training results and checkpoints. It defaults to None and has no effect when running locally. When running on a cluster, it defaults to “hdfs:///tmp/{name}”.
- build_onnx(thread_num=1, sess_options=None)
Build the ONNX model to speed up inference and reduce latency. Calling this method is not required before predict_with_onnx, evaluate_with_onnx or export_onnx_file. It is recommended when you want to:
1. Strictly control the number of threads used during inference.
2. Alleviate the cold-start latency when you call predict_with_onnx for the first time.
- Parameters
thread_num – int, the maximum number of threads. It defaults to 1 (a single thread). It is also suggested to set the environment variable OMP_NUM_THREADS to the same value as thread_num.
sess_options – an onnxruntime.SessionOptions instance. If it is not None, a new onnxruntime session will be built with these options, and other settings you assigned (e.g. thread_num) will be ignored.
Example
>>> # to pre-build the onnx session
>>> automodel.build_onnx(thread_num=2)  # build an onnxruntime session for two threads
>>> pred = automodel.predict_with_onnx(data)
>>> # ------------------------------------------------------
>>> # directly calling an onnx-related method is also supported
>>> # by default, an onnxruntime session for a single thread is built
>>> pred = automodel.predict_with_onnx(data)
- evaluate(data, batch_size=32, metrics=['mse'], multioutput='raw_values')
Evaluate using the trained model after HPO (Hyper Parameter Optimization).
Please note that the evaluation result is calculated on the scaled y and yhat. If you scaled your data (e.g. with .scale() on the TSDataset) and need to evaluate on unscaled data, follow the code snippet below.
>>> from bigdl.orca.automl.metrics import Evaluator
>>> y_hat = automodel.predict(x)
>>> y_hat_unscaled = tsdata.unscale_numpy(y_hat)  # or other customized unscale methods
>>> y_unscaled = tsdata.unscale_numpy(y)  # or other customized unscale methods
>>> metric = ...  # the metric name, e.g. "mse"
>>> Evaluator.evaluate(metric, y_unscaled, y_hat_unscaled, multioutput=...)
- Parameters
data – a tuple of numpy ndarrays (x, y). x’s shape is (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num. y’s shape is (num_samples, horizon, target_dim), where horizon and target_dim should be the same as future_seq_len and output_target_num.
batch_size – evaluation batch size. The value does not affect the evaluation result but affects resource cost (e.g. memory and time). It defaults to 32.
metrics – list of strings or callables, e.g. [‘mse’] or [customized_metrics]. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays, and it should return a float value as the evaluation result.
multioutput – Defines how multiple output values are aggregated. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘raw_values’.
- Returns
A list of evaluation results. Each item represents a metric.
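The two multioutput options can be illustrated for MSE. This sketch assumes Evaluator follows scikit-learn's multioutput convention and treats each (horizon step, target) position as one output; the exact layout BigDL returns for ‘raw_values’ on 3-D arrays is an assumption here, and both helper names are hypothetical.

```python
def mse_per_output(y_true, y_pred):
    """MSE for each (horizon step, target) position, averaged over samples.

    y_true / y_pred are nested lists shaped (num_samples, horizon, target_dim).
    Returns a nested list shaped (horizon, target_dim), i.e. "raw_values".
    """
    num_samples = len(y_true)
    horizon = len(y_true[0])
    target_dim = len(y_true[0][0])
    return [
        [
            sum((y_true[n][h][t] - y_pred[n][h][t]) ** 2
                for n in range(num_samples)) / num_samples
            for t in range(target_dim)
        ]
        for h in range(horizon)
    ]


def uniform_average(raw):
    """Average the per-output scores with equal weights ("uniform_average")."""
    values = [v for row in raw for v in row]
    return sum(values) / len(values)


# 2 samples, horizon 2, 1 target.
y_true = [[[1.0], [2.0]], [[3.0], [4.0]]]
y_pred = [[[1.0], [3.0]], [[3.0], [2.0]]]
raw = mse_per_output(y_true, y_pred)
print(raw)                   # -> [[0.0], [2.5]]
print(uniform_average(raw))  # -> 1.25
```

Keeping ‘raw_values’ (the default) makes it easy to see that, as in this toy case, errors usually grow with the forecast step.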
- evaluate_with_onnx(data, batch_size=32, metrics=['mse'], dirname=None, multioutput='raw_values')
Evaluate using the trained model after HPO (Hyper Parameter Optimization).
Be sure to install onnx and onnxruntime to enable this function. The method gives the same result as .evaluate() but with higher throughput and lower latency. The keras backend does not support ONNX yet.
Please note that the evaluation result is calculated on the scaled y and yhat. If you scaled your data (e.g. with .scale() on the TSDataset) and need to evaluate on unscaled data, follow the code snippet below.
>>> from bigdl.orca.automl.metrics import Evaluator
>>> y_hat = automodel.predict_with_onnx(x)
>>> y_hat_unscaled = tsdata.unscale_numpy(y_hat)  # or other customized unscale methods
>>> y_unscaled = tsdata.unscale_numpy(y)  # or other customized unscale methods
>>> metric = ...  # the metric name, e.g. "mse"
>>> Evaluator.evaluate(metric, y_unscaled, y_hat_unscaled, multioutput=...)
- Parameters
data – a tuple of numpy ndarrays (x, y). x’s shape is (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num. y’s shape is (num_samples, horizon, target_dim), where horizon and target_dim should be the same as future_seq_len and output_target_num.
batch_size – evaluation batch size. The value does not affect the evaluation result but affects resource cost (e.g. memory and time). It defaults to 32.
metrics – list of strings or callables, e.g. [‘mse’] or [customized_metrics]. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays, and it should return a float value as the evaluation result.
dirname – The directory in which to save the ONNX model file. It defaults to None, in which case no file is saved.
multioutput – Defines how multiple output values are aggregated. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘raw_values’.
- Returns
A list of evaluation results. Each item represents a metric.
- export_onnx_file(dirname)
Save the ONNX model file to disk.
- Parameters
dirname – The directory in which to save the ONNX file.
- fit(data, epochs=1, batch_size=32, validation_data=None, metric_threshold=None, n_sampling=1, search_alg=None, search_alg_params=None, scheduler=None, scheduler_params=None)
Automatically fit the model and search for the best hyperparameters.
- Parameters
data – training data. It can be a tuple of ndarrays, a PyTorch DataLoader, or a function that takes a config dictionary as a parameter and returns a PyTorch DataLoader.
epochs – Max number of epochs to train in each trial. Defaults to 1. If you have also set metric_threshold, a trial will stop if either it has been optimized to the metric_threshold or it has been trained for {epochs} epochs.
batch_size – Int or hp sampling function from an integer space. Training batch size. It defaults to 32.
validation_data – Validation data. Validation data type should be the same as data.
metric_threshold – A trial will be terminated when the metric threshold is met.
n_sampling – Number of trials to evaluate in total (without grid search). Defaults to 1. If hp.grid_search is in the search space, the whole grid is repeated n_sampling times, so the total number of trials is n_sampling multiplied by the number of grid combinations. If this is -1, (virtually) infinite samples are generated until a stopping condition is met.
search_alg – str, any searcher supported by Ray Tune (i.e. “variant_generator”, “random”, “ax”, “dragonfly”, “skopt”, “hyperopt”, “bayesopt”, “bohb”, “nevergrad”, “optuna”, “zoopt” and “sigopt”).
search_alg_params – Extra parameters for the search algorithm besides search_space, metric and searcher mode.
scheduler – str, any scheduler supported by Ray Tune.
scheduler_params – Parameters for the scheduler.
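The stopping rule described for epochs and metric_threshold can be sketched as a plain loop. This illustrates the documented behavior only, assuming metric_mode decides which side of the threshold counts as "met"; it is not BigDL's implementation, and `run_trial` and `fake_mse` are hypothetical.

```python
def run_trial(train_one_epoch, epochs, metric_threshold=None, metric_mode="min"):
    """Train for at most `epochs` epochs, stopping early once the validation
    metric reaches metric_threshold. Returns (epochs_run, last_metric)."""
    last_metric = None
    for epoch in range(1, epochs + 1):
        last_metric = train_one_epoch(epoch)
        if metric_threshold is not None:
            reached = (last_metric <= metric_threshold if metric_mode == "min"
                       else last_metric >= metric_threshold)
            if reached:
                return epoch, last_metric
    return epochs, last_metric


def fake_mse(epoch):
    """A fake training curve: the validation MSE halves every epoch."""
    return 1.0 / 2 ** epoch


print(run_trial(fake_mse, epochs=10, metric_threshold=0.1))
# -> (4, 0.0625)
print(run_trial(fake_mse, epochs=3, metric_threshold=0.01))
# -> (3, 0.125)
```

Whichever condition comes first ends the trial: the threshold in the first call, the epoch budget in the second.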
- get_best_config()
Get the best configuration.
- Returns
A dictionary of the best hyperparameters.
- get_best_model()
Get the best PyTorch model.
- load(checkpoint_path)
Restore the best model.
- Parameters
checkpoint_path – The checkpoint location from which to load the best model.
- predict(data, batch_size=32)
Predict using the trained model after HPO (Hyper Parameter Optimization).
- Parameters
data – a numpy ndarray x of shape (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num.
batch_size – prediction batch size. The value does not affect the prediction result but affects resource cost (e.g. memory and time). It defaults to 32.
- Returns
A numpy array with shape (num_samples, horizon, target_dim).
- predict_with_onnx(data, batch_size=32, dirname=None)
Predict using the trained model after HPO (Hyper Parameter Optimization).
Be sure to install onnx and onnxruntime to enable this function. The method gives the same result as .predict() but with higher throughput and lower latency. The keras backend does not support ONNX yet.
- Parameters
data – a numpy ndarray x of shape (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num.
batch_size – prediction batch size. The value does not affect the prediction result but affects resource cost (e.g. memory and time). It defaults to 32.
dirname – The directory in which to save the ONNX model file. It defaults to None, in which case no file is saved.
- Returns
A numpy array with shape (num_samples, horizon, target_dim).
- save(checkpoint_path)
Save the best model.
Please note that if you only want the PyTorch model or the ONNX model file, you can call .get_best_model() or .export_onnx_file(). The checkpoint file generated by .save() can only be loaded by .load() of an automodel. If you specify “keras” as the backend, the file names will be best_keras_config.json and best_keras_model.ckpt.
- Parameters
checkpoint_path – The location at which to save the best model.
AutoSeq2Seq
AutoSeq2Seq is a Seq2Seq forecasting model with automatic hyperparameter tuning.
- class bigdl.chronos.autots.model.auto_seq2seq.AutoSeq2Seq(input_feature_num, output_target_num, past_seq_len, future_seq_len, optimizer, loss, metric, metric_mode=None, lr=0.001, lstm_hidden_dim=128, lstm_layer_num=2, dropout=0.25, teacher_forcing=False, backend='torch', logs_dir='/tmp/auto_seq2seq', cpus_per_trial=1, name='auto_seq2seq', remote_dir=None)
Bases:
bigdl.chronos.autots.model.base_automodel.BaseAutomodel
Create an AutoSeq2Seq.
- Parameters
input_feature_num – Int. The number of features in the input.
output_target_num – Int. The number of targets in the output.
past_seq_len – Int. The number of historical steps used for forecasting.
future_seq_len – Int. The number of future steps to forecast.
optimizer – String, PyTorch optimizer creator function, or tf.keras optimizer instance.
loss – String, PyTorch/tf.keras loss instance, or PyTorch loss creator function.
metric – String or customized evaluation metric function. If a string, metric is the name of the evaluation metric to optimize, e.g. “mse”. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays, and it should return a float value as the evaluation result.
metric_mode – One of [“min”, “max”]. “max” means a greater metric value is better. You must specify metric_mode if you use a customized metric function; it is not needed for the built-in metrics in bigdl.orca.automl.metrics.Evaluator.
lr – float or hp sampling function from a float space. Learning rate, e.g. hp.choice([0.001, 0.003, 0.01]). It defaults to 0.001.
lstm_hidden_dim – LSTM hidden dimension for the encoder and decoder, e.g. hp.grid_search([32, 64, 128]). It defaults to 128.
lstm_layer_num – Number of LSTM layers for the encoder and decoder, e.g. hp.randint(1, 4). It defaults to 2.
dropout – float or hp sampling function from a float space. Dropout rate, e.g. hp.uniform(0.1, 0.3). It defaults to 0.25.
teacher_forcing – Whether to use teacher forcing in training, e.g. hp.choice([True, False]). It defaults to False.
backend – The backend of the Seq2Seq model. Supports “keras” and “torch”. It defaults to “torch”.
logs_dir – Local directory to save logs and results. It defaults to “/tmp/auto_seq2seq”.
cpus_per_trial – Int. Number of CPUs for each trial. It defaults to 1.
name – Name of the AutoSeq2Seq. It defaults to “auto_seq2seq”.
remote_dir – String. Remote directory to sync training results and checkpoints. It defaults to None and has no effect when running locally. When running on a cluster, it defaults to “hdfs:///tmp/{name}”.
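AutoSeq2Seq consumes windows of past_seq_len input steps paired with the next future_seq_len target steps. In Chronos these arrays are usually produced with TSDataset.roll(); the stdlib sketch below (a hypothetical `make_windows`) shows the same idea for a single univariate series, so input_feature_num and output_target_num are both 1.

```python
def make_windows(series, past_seq_len, future_seq_len):
    """Slice a univariate series into (x, y) windows.

    x has shape (num_samples, past_seq_len, 1) and
    y has shape (num_samples, future_seq_len, 1).
    """
    num_samples = len(series) - past_seq_len - future_seq_len + 1
    if num_samples < 1:
        raise ValueError("series is too short for the requested window sizes")
    x, y = [], []
    for start in range(num_samples):
        split = start + past_seq_len
        x.append([[v] for v in series[start:split]])
        y.append([[v] for v in series[split:split + future_seq_len]])
    return x, y


x, y = make_windows([1, 2, 3, 4, 5, 6], past_seq_len=3, future_seq_len=2)
print(len(x), x[0], y[0])
# -> 2 [[1], [2], [3]] [[4], [5]]
```

Each extra step of past_seq_len or future_seq_len removes one training sample from a series of fixed length.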
- build_onnx(thread_num=1, sess_options=None)
Build the ONNX model to speed up inference and reduce latency. Calling this method is not required before predict_with_onnx, evaluate_with_onnx or export_onnx_file. It is recommended when you want to:
1. Strictly control the number of threads used during inference.
2. Alleviate the cold-start latency when you call predict_with_onnx for the first time.
- Parameters
thread_num – int, the maximum number of threads. It defaults to 1 (a single thread). It is also suggested to set the environment variable OMP_NUM_THREADS to the same value as thread_num.
sess_options – an onnxruntime.SessionOptions instance. If it is not None, a new onnxruntime session will be built with these options, and other settings you assigned (e.g. thread_num) will be ignored.
Example
>>> # to pre-build the onnx session
>>> automodel.build_onnx(thread_num=2)  # build an onnxruntime session for two threads
>>> pred = automodel.predict_with_onnx(data)
>>> # ------------------------------------------------------
>>> # directly calling an onnx-related method is also supported
>>> # by default, an onnxruntime session for a single thread is built
>>> pred = automodel.predict_with_onnx(data)
- evaluate(data, batch_size=32, metrics=['mse'], multioutput='raw_values')
Evaluate using the trained model after HPO (Hyper Parameter Optimization).
Please note that the evaluation result is calculated on the scaled y and yhat. If you scaled your data (e.g. with .scale() on the TSDataset) and need to evaluate on unscaled data, follow the code snippet below.
>>> from bigdl.orca.automl.metrics import Evaluator
>>> y_hat = automodel.predict(x)
>>> y_hat_unscaled = tsdata.unscale_numpy(y_hat)  # or other customized unscale methods
>>> y_unscaled = tsdata.unscale_numpy(y)  # or other customized unscale methods
>>> metric = ...  # the metric name, e.g. "mse"
>>> Evaluator.evaluate(metric, y_unscaled, y_hat_unscaled, multioutput=...)
- Parameters
data – a tuple of numpy ndarrays (x, y). x’s shape is (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num. y’s shape is (num_samples, horizon, target_dim), where horizon and target_dim should be the same as future_seq_len and output_target_num.
batch_size – evaluation batch size. The value does not affect the evaluation result but affects resource cost (e.g. memory and time). It defaults to 32.
metrics – list of strings or callables, e.g. [‘mse’] or [customized_metrics]. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays, and it should return a float value as the evaluation result.
multioutput – Defines how multiple output values are aggregated. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘raw_values’.
- Returns
A list of evaluation results. Each item represents a metric.
- evaluate_with_onnx(data, batch_size=32, metrics=['mse'], dirname=None, multioutput='raw_values')
Evaluate using the trained model after HPO (Hyper Parameter Optimization).
Be sure to install onnx and onnxruntime to enable this function. The method gives the same result as .evaluate() but with higher throughput and lower latency. The keras backend does not support ONNX yet.
Please note that the evaluation result is calculated on the scaled y and yhat. If you scaled your data (e.g. with .scale() on the TSDataset) and need to evaluate on unscaled data, follow the code snippet below.
>>> from bigdl.orca.automl.metrics import Evaluator
>>> y_hat = automodel.predict_with_onnx(x)
>>> y_hat_unscaled = tsdata.unscale_numpy(y_hat)  # or other customized unscale methods
>>> y_unscaled = tsdata.unscale_numpy(y)  # or other customized unscale methods
>>> metric = ...  # the metric name, e.g. "mse"
>>> Evaluator.evaluate(metric, y_unscaled, y_hat_unscaled, multioutput=...)
- Parameters
data – a tuple of numpy ndarrays (x, y). x’s shape is (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num. y’s shape is (num_samples, horizon, target_dim), where horizon and target_dim should be the same as future_seq_len and output_target_num.
batch_size – evaluation batch size. The value does not affect the evaluation result but affects resource cost (e.g. memory and time). It defaults to 32.
metrics – list of strings or callables, e.g. [‘mse’] or [customized_metrics]. If a callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays, and it should return a float value as the evaluation result.
dirname – The directory in which to save the ONNX model file. It defaults to None, in which case no file is saved.
multioutput – Defines how multiple output values are aggregated. String in [‘raw_values’, ‘uniform_average’]. The value defaults to ‘raw_values’.
- Returns
A list of evaluation results. Each item represents a metric.
- export_onnx_file(dirname)
Save the ONNX model file to disk.
- Parameters
dirname – The directory in which to save the ONNX file.
- fit(data, epochs=1, batch_size=32, validation_data=None, metric_threshold=None, n_sampling=1, search_alg=None, search_alg_params=None, scheduler=None, scheduler_params=None)
Automatically fit the model and search for the best hyperparameters.
- Parameters
data – training data. It can be a tuple of ndarrays, a PyTorch DataLoader, or a function that takes a config dictionary as a parameter and returns a PyTorch DataLoader.
epochs – Max number of epochs to train in each trial. Defaults to 1. If you have also set metric_threshold, a trial will stop if either it has been optimized to the metric_threshold or it has been trained for {epochs} epochs.
batch_size – Int or hp sampling function from an integer space. Training batch size. It defaults to 32.
validation_data – Validation data. Validation data type should be the same as data.
metric_threshold – A trial will be terminated when the metric threshold is met.
n_sampling – Number of trials to evaluate in total (without grid search). Defaults to 1. If hp.grid_search is in the search space, the whole grid is repeated n_sampling times, so the total number of trials is n_sampling multiplied by the number of grid combinations. If this is -1, (virtually) infinite samples are generated until a stopping condition is met.
search_alg – str, any searcher supported by Ray Tune (i.e. “variant_generator”, “random”, “ax”, “dragonfly”, “skopt”, “hyperopt”, “bayesopt”, “bohb”, “nevergrad”, “optuna”, “zoopt” and “sigopt”).
search_alg_params – Extra parameters for the search algorithm besides search_space, metric and searcher mode.
scheduler – str, any scheduler supported by Ray Tune.
scheduler_params – Parameters for the scheduler.
- get_best_config()
Get the best configuration.
- Returns
A dictionary of the best hyperparameters.
- get_best_model()#
Get the best PyTorch model.
- load(checkpoint_path)#
Restore the best model.
- Parameters
checkpoint_path – The checkpoint location from which to load the best model.
- predict(data, batch_size=32)#
Predict using the trained model after HPO (Hyperparameter Optimization).
- Parameters
data – A numpy ndarray x of shape (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num, respectively.
batch_size – Batch size for prediction. The value does not affect the prediction result but does affect resource cost (e.g. memory and time). It defaults to 32.
- Returns
A numpy array with shape (num_samples, horizon, target_dim).
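The batch_size note reflects how batched inference works. When each sample is forecast independently, splitting the input into chunks and concatenating the outputs yields the same array as a single pass. Only peak memory and speed change. The sketch below demonstrates this with a toy NumPy stand-in for a trained model. toy_predict and predict_in_batches are hypothetical illustrations, not BigDL internals.

```python
import numpy as np

def predict_in_batches(predict_fn, x, batch_size=32):
    """Run predict_fn over x in chunks of batch_size and concatenate the outputs."""
    outputs = [predict_fn(x[i:i + batch_size]) for i in range(0, len(x), batch_size)]
    return np.concatenate(outputs, axis=0)

rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 1))

def toy_predict(batch):
    """Hypothetical model: (n, lookback=10, feature_dim=3) -> (n, horizon=2, target_dim=1).

    Each output row depends only on its own input window, so batching cannot change it.
    """
    per_sample = batch.mean(axis=1) @ weights  # (n, 1)
    return np.repeat(per_sample[:, None, :], 2, axis=1)

x = rng.normal(size=(100, 10, 3))
full = toy_predict(x)
batched = predict_in_batches(toy_predict, x, batch_size=32)
print(full.shape)                  # (100, 2, 1)
print(np.allclose(full, batched))  # True
```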
- predict_with_onnx(data, batch_size=32, dirname=None)#
Predict using the trained model after HPO (Hyperparameter Optimization), running inference with ONNX Runtime.
onnx and onnxruntime must be installed to enable this function. This method gives the same result as .predict() but with higher throughput and lower latency. ONNX support for the keras backend is planned but not yet available.
- Parameters
data – A numpy ndarray x of shape (num_samples, lookback, feature_dim), where lookback and feature_dim should be the same as past_seq_len and input_feature_num, respectively.
batch_size – Batch size for prediction. The value does not affect the prediction result but does affect resource cost (e.g. memory and time). It defaults to 32.
dirname – The directory in which to save the ONNX model file. It defaults to None, in which case no file is saved.
- Returns
A numpy array with shape (num_samples, horizon, target_dim).
- save(checkpoint_path)#
Save the best model.
Please note that if you only want the PyTorch model or the ONNX model file, you can call .get_best_model() or .export_onnx_file(). The checkpoint file generated by .save() can only be loaded by the automodel's .load() method. If you specify “keras” as the backend, the file names will be best_keras_config.json and best_keras_model.ckpt.
- Parameters
checkpoint_path – The location in which to save the best model.
AutoARIMA#
AutoARIMA is an ARIMA forecasting model with Auto tuning.
- class bigdl.chronos.autots.model.auto_arima.AutoARIMA(p=2, q=2, seasonal=True, P=1, Q=1, m=7, metric='mse', metric_mode=None, logs_dir='/tmp/auto_arima_logs', cpus_per_trial=1, name='auto_arima', remote_dir=None, load_dir=None, **arima_config)[source]#
Bases:
object
Create an automated ARIMA model. Users need to specify either the exact value or the search space of the ARIMA model hyperparameters. For details on the ARIMA model hyperparameters, refer to https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.ARIMA.html#pmdarima.arima.ARIMA.
- Parameters
p – Int or hp sampling function from an integer space for hyperparameter p of the ARIMA model. For hp sampling, see bigdl.orca.automl.hp for more details. e.g. hp.randint(0, 3).
q – Int or hp sampling function from an integer space for hyperparameter q of the ARIMA model. e.g. hp.randint(0, 3).
seasonal – Bool or hp sampling function for whether to add seasonal components to the ARIMA model. e.g. hp.choice([True, False]).
P – Int or hp sampling function from an integer space for hyperparameter P of the ARIMA model. For hp sampling, see bigdl.orca.automl.hp for more details. e.g. hp.randint(0, 3).
Q – Int or hp sampling function from an integer space for hyperparameter Q of the ARIMA model. e.g. hp.randint(0, 3).
m – Int or hp sampling function from an integer space for hyperparameter m (the number of periods in each season) of the ARIMA model. e.g. hp.choice([4, 7, 12, 24, 365]).
metric – String or customized evaluation metric function. If string, metric is the name of the evaluation metric to optimize, e.g. “mse”. If it is a callable function, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays. The function should return a float value as the evaluation result.
metric_mode – One of [“min”, “max”]. “max” means a greater metric value is better. You have to specify metric_mode if you use a customized metric function. You don’t have to specify metric_mode if you use a built-in metric in bigdl.orca.automl.metrics.Evaluator.
logs_dir – Local directory to save logs and results. It defaults to “/tmp/auto_arima_logs”.
cpus_per_trial – Int. Number of CPUs for each trial. It defaults to 1.
name – Name of the AutoARIMA. It defaults to “auto_arima”.
remote_dir – String. Remote directory to sync training results and checkpoints. It defaults to None and has no effect when running locally. When running on a cluster, it defaults to “hdfs:///tmp/{name}”.
arima_config – Other ARIMA hyperparameters.
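A customized metric only needs to follow the func(y_true, y_pred) contract described above and return a float. The sketch below defines a hypothetical symmetric MAPE function. Because lower values are better, it would be paired with metric_mode=“min”. The name my_smape is illustrative, and BigDL's built-in metrics may already cover this case.

```python
import numpy as np

def my_smape(y_true, y_pred):
    """Hypothetical custom metric: symmetric MAPE in percent (lower is better).

    Follows the documented contract: two numpy ndarrays in, one float out.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    # Where both values are 0, the term is set to 0 to avoid division by zero.
    ratio = np.divide(2.0 * np.abs(y_pred - y_true), denom,
                      out=np.zeros_like(denom), where=denom != 0)
    return float(100.0 * np.mean(ratio))

score = my_smape(np.array([100.0, 200.0]), np.array([110.0, 180.0]))
print(round(score, 3))  # 10.025

# Usage sketch: AutoARIMA(p=2, q=2, metric=my_smape, metric_mode="min")
```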
- fit(data, epochs=1, validation_data=None, metric_threshold=None, n_sampling=1, search_alg=None, search_alg_params=None, scheduler=None, scheduler_params=None)[source]#
Automatically fit the model and search for the best hyperparameters.
- Parameters
data – Training data. A 1-D numpy array.
epochs – Max number of epochs to train in each trial. Defaults to 1. If you have also set metric_threshold, a trial will stop if either it has been optimized to the metric_threshold or it has been trained for {epochs} epochs.
validation_data – Validation data. A 1-D numpy array.
metric_threshold – A trial will be terminated when the metric threshold is met.
n_sampling – Number of trials to evaluate in total. Defaults to 1. If hp.grid_search is in search_space, the grid will be run n_sampling times, with n_sampling rounded up according to hp.grid_search. If this is -1, (virtually) infinite samples are generated until a stopping condition is met.
search_alg – str. Any searcher supported by Ray Tune (i.e. “variant_generator”, “random”, “ax”, “dragonfly”, “skopt”, “hyperopt”, “bayesopt”, “bohb”, “nevergrad”, “optuna”, “zoopt” and “sigopt”).
search_alg_params – Extra parameters for the search algorithm besides search_space, metric and searcher mode.
scheduler – str. Any scheduler supported by Ray Tune.
scheduler_params – Parameters for the scheduler.
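AutoARIMA.fit takes plain 1-D arrays for both data and validation_data. Below is a minimal sketch of preparing them. It assumes a synthetic daily series with a weekly cycle, matching the default m=7. The split keeps chronological order so that the validation values come strictly after the training values.

```python
import numpy as np

# Assumption: 200 synthetic daily observations with a weekly (m=7) cycle and a trend.
t = np.arange(200)
series = 10 + np.sin(2 * np.pi * t / 7) + 0.1 * t

# Chronological split (no shuffling): validation values follow the training values.
split = int(len(series) * 0.9)
train_data, val_data = series[:split], series[split:]
print(train_data.shape, val_data.shape)  # (180,) (20,)

# These arrays would then be passed as, e.g.,
# auto_arima.fit(data=train_data, validation_data=val_data, n_sampling=4)
```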
AutoProphet#
AutoProphet is a Prophet forecasting model with Auto tuning.
- class bigdl.chronos.autots.model.auto_prophet.AutoProphet(changepoint_prior_scale=None, seasonality_prior_scale=None, holidays_prior_scale=None, seasonality_mode=None, changepoint_range=None, metric='mse', metric_mode=None, logs_dir='/tmp/auto_prophet_logs', cpus_per_trial=1, name='auto_prophet', remote_dir=None, load_dir=None, **prophet_config)[source]#
Bases:
object
Create an automated Prophet model. Users need to specify either the exact value or the search space of the Prophet model hyperparameters. For details on the Prophet model hyperparameters, refer to https://facebook.github.io/prophet/docs/diagnostics.html#hyperparameter-tuning.
- Parameters
changepoint_prior_scale – Float or hp sampling function from a float space for hyperparameter changepoint_prior_scale of the Prophet model. For hp sampling, see bigdl.orca.automl.hp for more details. e.g. hp.loguniform(0.001, 0.5).
seasonality_prior_scale – Hyperparameter seasonality_prior_scale of the Prophet model. e.g. hp.loguniform(0.01, 10).
holidays_prior_scale – Hyperparameter holidays_prior_scale of the Prophet model. e.g. hp.loguniform(0.01, 10).
seasonality_mode – Hyperparameter seasonality_mode of the Prophet model. e.g. hp.choice([‘additive’, ‘multiplicative’]).
changepoint_range – Hyperparameter changepoint_range of the Prophet model. e.g. hp.uniform(0.8, 0.95).
metric – String or customized evaluation metric function. If string, metric is the name of the evaluation metric to optimize, e.g. “mse”. If it is a callable function, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays. The function should return a float value as the evaluation result.
metric_mode – One of [“min”, “max”]. “max” means a greater metric value is better. You have to specify metric_mode if you use a customized metric function. You don’t have to specify metric_mode if you use a built-in metric in bigdl.orca.automl.metrics.Evaluator.
logs_dir – Local directory to save logs and results. It defaults to “/tmp/auto_prophet_logs”.
cpus_per_trial – Int. Number of CPUs for each trial. It defaults to 1.
name – Name of the AutoProphet. It defaults to “auto_prophet”.
remote_dir – String. Remote directory to sync training results and checkpoints. It defaults to None and has no effect when running locally. When running on a cluster, it defaults to “hdfs:///tmp/{name}”.
load_dir – Directory from which to load the checkpoint. It defaults to None.
prophet_config – Other Prophet hyperparameters.
- fit(data, cross_validation=True, expect_horizon=None, freq=None, metric_threshold=None, n_sampling=16, search_alg=None, search_alg_params=None, scheduler=None, scheduler_params=None)[source]#
Automatically fit the model and search for the best hyperparameters.
- Parameters
data – Training data, a pandas dataframe with Td rows and 2 columns: column ‘ds’ indicating the date and column ‘y’ indicating the value, where Td is the time dimension.
cross_validation – bool, whether the evaluation result comes from cross-validation. It defaults to True. Set this option to False to speed up the process.
expect_horizon – int. Validation data will be automatically split from the training data, and expect_horizon is the horizon you expect to use once the model is fitted. It defaults to None, in which case 10% of the training data is taken as the validation data.
freq – The frequency of the training dataframe. It can be any of the pandas frequency strings listed at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases. It defaults to None, in which case the frequency is inferred implicitly, which may be unreliable.
metric_threshold – A trial will be terminated when the metric threshold is met.
n_sampling – Number of trials to evaluate in total. Defaults to 16. If hp.grid_search is in search_space, the grid will be run n_sampling times, with n_sampling rounded up according to hp.grid_search. If this is -1, (virtually) infinite samples are generated until a stopping condition is met.
search_alg – str. Any searcher supported by Ray Tune (i.e. “variant_generator”, “random”, “ax”, “dragonfly”, “skopt”, “hyperopt”, “bayesopt”, “bohb”, “nevergrad”, “optuna”, “zoopt” and “sigopt”).
search_alg_params – Extra parameters for the search algorithm besides search_space, metric and searcher mode.
scheduler – str. Any scheduler supported by Ray Tune.
scheduler_params – Parameters for the scheduler.
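Below is a minimal sketch of building data in the expected ‘ds’/‘y’ layout with pandas, using a synthetic daily series as an assumption for illustration. The freq passed to fit would be the matching pandas alias, here “D”. pandas.infer_freq is used only to check that the timestamps are regularly spaced.

```python
import numpy as np
import pandas as pd

# Assumption: one year of synthetic daily values with a weekly cycle and a mild trend.
n_days = 365
steps = np.arange(n_days)
train_df = pd.DataFrame({
    "ds": pd.date_range(start="2021-01-01", periods=n_days, freq="D"),
    "y": 100 + 5 * np.sin(2 * np.pi * steps / 7) + 0.05 * steps,
})

print(list(train_df.columns))         # ['ds', 'y']
print(pd.infer_freq(train_df["ds"]))  # D

# With an AutoProphet instance named auto_prophet, the call might look like:
# auto_prophet.fit(data=train_df, freq="D", expect_horizon=30, n_sampling=16)
```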
- predict(horizon=1, freq='D', ds_data=None)[source]#
Predict using the best model after HPO.
- Parameters
horizon – The number of steps forward to predict.
freq – The frequency of the predicted dataframe. It defaults to day (“D”) and can be any of the pandas frequency strings listed at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases.
ds_data – A dataframe with 1 column, ‘ds’, indicating the dates.
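When ds_data is used, it supplies the dates to forecast in a single ‘ds’ column. Below is a minimal pandas sketch. It assumes the training data ended on a hypothetical date of 2021-12-31 and uses a daily frequency.

```python
import pandas as pd

# Assumption: the last timestamp in the training dataframe.
last_train_date = pd.Timestamp("2021-12-31")
horizon = 7

# One row per forecast step, starting the day after the training data ends.
future_ds = pd.date_range(start=last_train_date + pd.Timedelta(days=1),
                          periods=horizon, freq="D")
ds_data = pd.DataFrame({"ds": future_ds})
print(ds_data["ds"].iloc[0].date(), ds_data["ds"].iloc[-1].date())  # 2022-01-01 2022-01-07

# The dataframe would then be passed as, e.g., auto_prophet.predict(ds_data=ds_data),
# where auto_prophet is a fitted AutoProphet instance.
```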
- evaluate(data, metrics=['mse'])[source]#
Evaluate using the best model after HPO.
- Parameters
data – Evaluation data, a pandas dataframe with Td rows and 2 columns: column ‘ds’ indicating the date and column ‘y’ indicating the value, where Td is the time dimension.
metrics – List of strings or callables, e.g. [‘mse’] or [customized_metrics]. If callable, its signature should be func(y_true, y_pred), where y_true and y_pred are numpy ndarrays. The function should return a float value as the evaluation result.
- save(checkpoint_file)[source]#
Save the best model after HPO.
- Parameters
checkpoint_file – The location in which to save the best model. It should be a JSON file.