Tune a Forecasting Task Automatically




In this guide we will demonstrate how to use Chronos AutoTSEstimator and Chronos TSPipeline to automatically tune a time series forecasting task and easily handle the whole model development process.

Introduction

Chronos provides AutoTSEstimator as a highly integrated solution for time series forecasting tasks, with hyperparameter autotuning, automatic feature selection, and automatic preprocessing. Users can prepare a TSDataset (recommended, and used in this notebook) or their own data creator as input data. After you construct an AutoTSEstimator and call fit on the data, it returns a TSPipeline containing the best model and the pre/post data processing, ready for further development or deployment.

For now, AutoTSEstimator only supports the built-in LSTM, TCN, and Seq2seq models, as well as third-party models.

Step 0: Prepare Environment

We recommend using conda to prepare the environment. Please refer to the install guide for more details.

conda create -n my_env python=3.7
conda activate my_env
pip install --pre --upgrade bigdl-chronos[all]

Step 1: Init Orca Context

from bigdl.orca import init_orca_context

cluster_mode = "local"  # set to "local", "k8s" or "yarn"

if cluster_mode == "local":
    init_orca_context(cluster_mode="local", cores=4)  # run in local mode
elif cluster_mode == "k8s":
    init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2)  # run on K8s cluster
elif cluster_mode == "yarn":
    init_orca_context(cluster_mode="yarn-client", num_nodes=2, cores=2)  # run on Hadoop YARN cluster

This is the only place where you need to specify local or distributed mode. View Orca Context for more details.

Note: You should export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir when running on Hadoop YARN cluster. View Hadoop User Guide for more details.

Step 2: Prepare a TSDataset

Prepare a TSDataset and call necessary operations on it.

from bigdl.chronos.data import TSDataset
from sklearn.preprocessing import StandardScaler

# df is assumed to be a pandas DataFrame with a "timestamp" column and a "value" column
# split into train (80%), validation (10%) and test (10%) sets
tsdata_train, tsdata_val, tsdata_test\
    = TSDataset.from_pandas(df, dt_col="timestamp", target_col="value", with_split=True, val_ratio=0.1, test_ratio=0.1)

standard_scaler = StandardScaler()
for tsdata in [tsdata_train, tsdata_val, tsdata_test]:
    tsdata.gen_dt_feature()\
          .impute(mode="last")\
          .scale(standard_scaler, fit=(tsdata is tsdata_train))  # fit the scaler on the training set only
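To make the split and the fit=(tsdata is tsdata_train) argument concrete, here is a plain-Python sketch of what they amount to, assuming with_split=True splits the series chronologically (train first, then validation, then test) and that the scaler is a z-score scaler like StandardScaler. The helpers chronological_split and ZScoreScaler are hypothetical and not part of Chronos. The point is that the mean and standard deviation are estimated on the training split only and then reused for validation and test, so nothing from later data leaks into training.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def chronological_split(values: List[float], val_ratio: float, test_ratio: float
                        ) -> Tuple[List[float], List[float], List[float]]:
    """Split a series into train/val/test without shuffling (hypothetical helper)."""
    n = len(values)
    n_test = int(n * test_ratio)
    n_val = int(n * val_ratio)
    n_train = n - n_val - n_test
    return (values[:n_train],
            values[n_train:n_train + n_val],
            values[n_train + n_val:])


class ZScoreScaler:
    """Minimal stand-in for sklearn's StandardScaler on a 1-D series."""

    def fit(self, values: List[float]) -> "ZScoreScaler":
        self.mean_ = mean(values)
        self.scale_ = pstdev(values) or 1.0  # population std, guarded for a constant series
        return self

    def transform(self, values: List[float]) -> List[float]:
        return [(v - self.mean_) / self.scale_ for v in values]

    def inverse_transform(self, values: List[float]) -> List[float]:
        return [v * self.scale_ + self.mean_ for v in values]


series = [float(i) for i in range(100)]
train, val, test = chronological_split(series, val_ratio=0.1, test_ratio=0.1)

scaler = ZScoreScaler().fit(train)  # statistics come from the training split only
train_s, val_s, test_s = (scaler.transform(part) for part in (train, val, test))
restored = scaler.inverse_transform(test_s)  # the kind of postprocessing applied to predictions
```

The inverse_transform step corresponds to the "postprocessing" that a pipeline needs in order to report predictions in the original units.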

There is no need to call .roll() or .to_torch_data_loader() in this step. This is the biggest difference between using AutoTSEstimator and using a Chronos Forecaster: AutoTSEstimator rolls the data automatically and tunes the related parameters as well.
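Rolling turns one long series into supervised (input window, target window) pairs. The sketch below uses a hypothetical roll helper on a univariate Python list to show the idea: each sample takes past_seq_len consecutive values as input and the following future_seq_len values as the target. Because past_seq_len is one of the hyperparameters AutoTSEstimator tunes (see Step 3), the windows change from trial to trial, which is why it makes sense for AutoTSEstimator to roll the data itself.

```python
from typing import List, Tuple


def roll(series: List[float], past_seq_len: int, future_seq_len: int
         ) -> Tuple[List[List[float]], List[List[float]]]:
    """Slide a window over the series and return (inputs, targets)."""
    xs, ys = [], []
    last_start = len(series) - past_seq_len - future_seq_len
    for start in range(last_start + 1):
        split = start + past_seq_len
        xs.append(series[start:split])                    # past_seq_len input values
        ys.append(series[split:split + future_seq_len])   # the values that follow
    return xs, ys


x, y = roll([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], past_seq_len=3, future_seq_len=1)
# x == [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
# y == [[4.0], [5.0], [6.0]]
```

A series of length n yields n - past_seq_len - future_seq_len + 1 samples, so a longer past_seq_len also means fewer training samples.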

Please call .gen_dt_feature() (recommended), .gen_rolling_feature(), and .gen_global_feature() to generate all candidate features for AutoTSEstimator to select from, in addition to any extra features you provide as input.
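gen_dt_feature() derives calendar features from the datetime column. The exact columns Chronos produces are listed in the TSDataset API doc; the sketch below only illustrates the idea with a hypothetical dt_features helper and illustrative feature names computed by the standard library, giving AutoTSEstimator candidates such as hour of day or day of week to choose from.

```python
from datetime import datetime
from typing import Dict


def dt_features(ts: datetime) -> Dict[str, int]:
    """Calendar features for one timestamp (names are illustrative only)."""
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),            # Monday == 0 ... Sunday == 6
        "month": ts.month,
        "is_weekend": int(ts.weekday() >= 5),
    }


# 2021-01-02 is a Saturday
features = dt_features(datetime(2021, 1, 2, 13, 30))
```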

For detailed information, please refer to the TSDataset API doc and Time series data basic concepts.

Step 3: Create an AutoTSEstimator

import bigdl.orca.automl.hp as hp
from bigdl.chronos.autots import AutoTSEstimator

auto_estimator = AutoTSEstimator(model='lstm',  # the model name used for training
                                 search_space='normal',  # a default hyperparameter search space
                                 past_seq_len=hp.randint(1, 10))  # hp sampling function of past_seq_len for auto-tuning

We prebuild three default search spaces for each built-in model. You can use one by setting search_space to "minimal", "normal", or "large", or you can define your own search space as a dictionary. A larger search space usually gives better accuracy but takes more time to search.
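A user-defined search space maps hyperparameter names to sampling functions. In Chronos those samplers come from bigdl.orca.automl.hp (for example hp.randint, used above, or hp.choice); the self-contained sketch below substitutes plain lambdas over random.Random so it runs without BigDL. The hyperparameter names and the sample_config helper are illustrative assumptions, not a verified list for the LSTM model, so check the AutoTSEstimator API doc for the names each built-in model accepts.

```python
import random
from typing import Any, Callable, Dict

# Stand-ins for samplers such as hp.choice / hp.randint / hp.uniform.
# The hyperparameter names below are illustrative, not a verified Chronos schema.
SearchSpace = Dict[str, Callable[[random.Random], Any]]

my_search_space: SearchSpace = {
    "hidden_dim": lambda rng: rng.choice([32, 64, 128]),
    "layer_num": lambda rng: rng.randint(1, 2),  # Python's randint includes both bounds
    "lr": lambda rng: rng.choice([0.001, 0.003, 0.01]),
    "dropout": lambda rng: rng.uniform(0.1, 0.3),
}


def sample_config(space: SearchSpace, rng: random.Random) -> Dict[str, Any]:
    """Draw one trial configuration from the search space."""
    return {name: sampler(rng) for name, sampler in space.items()}


rng = random.Random(0)
configs = [sample_config(my_search_space, rng) for _ in range(4)]
```

Even the discrete part of this toy space has 3 × 2 × 3 = 18 combinations before the continuous dropout value is considered, which is why larger search spaces need more trials and more time.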

past_seq_len can be set as an hp sampling function. The proper range depends heavily on your data; a range between 0.5 and 3 cycles is reasonable.
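One way to turn the "0.5 to 3 cycles" rule of thumb into numbers is to estimate the cycle length from the autocorrelation of the training series and then scale it. The sketch below is a simple heuristic, not a Chronos function: the hypothetical estimate_cycle_length skips the short lags where autocorrelation is trivially high (everything up to the point where it first turns negative) and returns the best-correlated lag after that. For hourly data with a daily pattern the cycle is 24 points, so the suggested range is 12 to 72.

```python
import math
from typing import List, Tuple


def autocorr(x: List[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag (x must not be constant)."""
    m = sum(x) / len(x)
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag))
    return num / denom


def estimate_cycle_length(x: List[float], max_lag: int) -> int:
    """Heuristic: the best-correlated lag after the autocorrelation first turns negative."""
    acf = {lag: autocorr(x, lag) for lag in range(1, max_lag + 1)}
    first_negative = next((lag for lag, r in acf.items() if r < 0), None)
    candidates = {lag: r for lag, r in acf.items()
                  if first_negative is not None and lag > first_negative}
    if not candidates:
        raise ValueError("no cycle detected within max_lag")
    return max(candidates, key=candidates.get)


def past_seq_len_range(cycle_length: int) -> Tuple[int, int]:
    """Rule of thumb from the text: between 0.5 and 3 cycles."""
    return max(1, cycle_length // 2), 3 * cycle_length


# Synthetic hourly series with a daily (24-step) cycle over 10 days.
hourly = [math.sin(2 * math.pi * t / 24) for t in range(24 * 10)]
cycle = estimate_cycle_length(hourly, max_lag=72)
low, high = past_seq_len_range(cycle)
```

The resulting bounds could then be passed to an hp sampler such as hp.randint; check the hp documentation for whether its upper bound is inclusive.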

For detailed information, please refer to the AutoTSEstimator API doc and the basic concepts here.

Step 4: Fit with AutoTSEstimator

# fit with AutoTSEstimator for a returned TSPipeline
ts_pipeline = auto_estimator.fit(data=tsdata_train, # train dataset
                                 validation_data=tsdata_val, # validation dataset
                                 epochs=5) # number of epochs to train in each trial

For detailed information, please refer to the AutoTSEstimator API doc.
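Conceptually, each trial in fit samples a configuration, trains a model for the given number of epochs on the training data, and scores it on the validation data; the best-scoring trial becomes the returned TSPipeline. The toy sketch below mirrors that loop with plain random search over past_seq_len and a hypothetical moving-average "model" instead of a trained LSTM, so it runs instantly. It illustrates the idea only; the search algorithm, number of trials, and metric that Chronos actually uses are configured through AutoTSEstimator and its fit arguments.

```python
import random
from typing import Dict, List


def moving_average_forecast(history: List[float], past_seq_len: int) -> float:
    """Predict the next value as the mean of the last past_seq_len values."""
    window = history[-past_seq_len:]
    return sum(window) / len(window)


def validation_mse(train: List[float], val: List[float], past_seq_len: int) -> float:
    """One-step-ahead MSE on the validation split, appending true values as we go."""
    history = list(train)
    errors = []
    for actual in val:
        pred = moving_average_forecast(history, past_seq_len)
        errors.append((pred - actual) ** 2)
        history.append(actual)
    return sum(errors) / len(errors)


def random_search(train: List[float], val: List[float], n_trials: int,
                  low: int, high: int, seed: int = 0) -> Dict[str, float]:
    """Sample past_seq_len uniformly from [low, high] and keep the best trial."""
    rng = random.Random(seed)
    best = {"past_seq_len": -1, "mse": float("inf")}
    for _ in range(n_trials):
        candidate = rng.randint(low, high)
        score = validation_mse(train, val, candidate)
        if score < best["mse"]:
            best = {"past_seq_len": candidate, "mse": score}
    return best


# A steadily rising series: shorter windows lag the trend less, so smaller past_seq_len wins.
series = [float(t) for t in range(70)]
best = random_search(series[:60], series[60:], n_trials=20, low=1, high=10)
```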

Step 5: Further deployment with TSPipeline

The TSPipeline will replay the same preprocessing and the corresponding postprocessing operations on the test data. You may then call predict, evaluate, or save/load for further development.

# predict with the best trial
y_pred = ts_pipeline.predict(tsdata_test)
# evaluate the result pipeline
mse, smape = ts_pipeline.evaluate(tsdata_test, metrics=["mse", "smape"])
print("Evaluate: the mean square error is", mse)
print("Evaluate: the smape value is", smape)
# save the pipeline
my_ppl_file_path = "/tmp/saved_pipeline"
ts_pipeline.save(my_ppl_file_path)
# restore the pipeline for further deployment
from bigdl.chronos.autots import TSPipeline
loaded_ppl = TSPipeline.load(my_ppl_file_path)

For detailed information, please refer to the TSPipeline API doc.
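For reference, here is what the two metrics passed to evaluate compute, written out in plain Python. MSE is standard. sMAPE has several variants; the one below (the mean of 2|F - A| / (|A| + |F|), times 100) is a common form, and Chronos's implementation may use a different scaling, so compare against the metric documentation before relying on exact values. The mse and smape functions here are illustrative helpers, not the Chronos API.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(y_true, y_pred)) / len(y_true)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric MAPE in percent (0 to 200 in this variant); 0/0 pairs count as 0."""
    total = 0.0
    for a, f in zip(y_true, y_pred):
        denom = abs(a) + abs(f)
        total += 0.0 if denom == 0 else 2 * abs(f - a) / denom
    return 100 * total / len(y_true)


y_true = [100.0, 200.0]
y_pred = [110.0, 180.0]
mse_value = mse(y_true, y_pred)  # 250.0
smape_value = smape(y_true, y_pred)
```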

Optional: Examine the leaderboard visualization

To view the evaluation results of the trials that were not chosen, gain insights, or even improve your search space for a new autotuning task, we provide a leaderboard through TensorBoard.

# show a tensorboard view
%load_ext tensorboard
%tensorboard --logdir /tmp/autots_estimator/autots_estimator_leaderboard/

For detailed information, please refer to Visualization.