Tune a Forecasting Task Automatically#
Run in Google Colab View source on GitHub
In this guide we will demonstrate how to use Chronos AutoTSEstimator and Chronos TSPipeline to auto tune a time seires forecasting task and handle the whole model development process easily.
Introduction#
Chronos provides AutoTSEstimator
as a highly integrated solution for time series forecasting task with hyperparameter autotuning, auto feature selection and auto preprocessing. Users can prepare a TSDataset
(recommended, used in this notebook) or their own data creator as input data. By constructing a AutoTSEstimator
and calling fit
on the data, a TSPipeline
contains the best model and pre/post data processing will be returned for further development of deployment.
AutoTSEstimator
only support LSTM, TCN, and Seq2seq built-in models and 3rd party models for now.
Step 0: Prepare Environment#
We recommend using conda to prepare the environment. Please refer to the install guide for more details.
conda create -n my_env python=3.7
conda activate my_env
pip install --pre --upgrade bigdl-chronos[all]
Step 1: Init Orca Context#
if args.cluster_mode == "local":
init_orca_context(cluster_mode="local", cores=4) # run in local mode
elif args.cluster_mode == "k8s":
init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2) # run on K8s cluster
elif args.cluster_mode == "yarn":
init_orca_context(cluster_mode="yarn-client", num_nodes=2, cores=2) # run on Hadoop YARN cluster
This is the only place where you need to specify local or distributed mode. View Orca Context for more details.
Note: You should export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir
when running on Hadoop YARN cluster. View Hadoop User Guide for more details.
Step 2: Prepare a TSDataset#
Prepare a TSDataset
and call necessary operations on it.
from bigdl.chronos.data import TSDataset
from sklearn.preprocessing import StandardScaler
tsdata_train, tsdata_val, tsdata_test\
= TSDataset.from_pandas(df, dt_col="timestamp", target_col="value", with_split=True, val_ratio=0.1, test_ratio=0.1)
standard_scaler = StandardScaler()
for tsdata in [tsdata_train, tsdata_val, tsdata_test]:
tsdata.gen_dt_feature()\
.impute(mode="last")\
.scale(standard_scaler, fit=(tsdata is tsdata_train))
There is no need to call .roll()
or .to_torch_data_loader()
in this step, which is the largest difference between the usage of AutoTSEstimator
and Chronos Forecaster. AutoTSEstimator
will do that automatically and tune the parameters as well.
Please call .gen_dt_feature()
(recommended), .gen_rolling_feature()
, and gen_global_feature()
to generate all candidate features to be selected by AutoTSEstimator
as well as your input extra feature.
Detailed information please refer to TSDataset API doc and Time series data basic concepts.
Step 3: Create an AutoTSEstimator#
import bigdl.orca.automl.hp as hp
from bigdl.chronos.autots import AutoTSEstimator
auto_estimator = AutoTSEstimator(model='lstm', # the model name used for training
search_space='normal', # a default hyper parameter search space
past_seq_len=hp.randint(1, 10), # hp sampling function of past_seq_len for auto-tuning
)
We prebuild three defualt search space for each build-in model, which you can use the by setting search_space
to “minimal”,”normal”, or “large” or define your own search space in a dictionary. The larger the search space, the better accuracy you will get and the more time will be cost.
past_seq_len
can be set as a hp sample function, the proper range is highly related to your data. A range between 0.5 cycle and 3 cycle is reasonable.
Detailed information please refer to AutoTSEstimator API doc and basic concepts here.
Step 4: Fit with AutoTSEstimator#
# fit with AutoTSEstimator for a returned TSPipeline
ts_pipeline = auto_estimator.fit(data=tsdata_train, # train dataset
validation_data=tsdata_val, # validation dataset
epochs=5) # number of epochs to train in each trial
Detailed information please refer to AutoTSEstimator API doc.
Step 5: Further deployment with TSPipeline#
The TSPipeline
will reply the same preprcessing and corresponding postprocessing operations on the test data. You may carry out predict, evaluate or save/load for further development.
# predict with the best trial
y_pred = ts_pipeline.predict(tsdata_test)
# evaluate the result pipeline
mse, smape = ts_pipeline.evaluate(tsdata_test, metrics=["mse", "smape"])
print("Evaluate: the mean square error is", mse)
print("Evaluate: the smape value is", smape)
# save the pipeline
my_ppl_file_path = "/tmp/saved_pipeline"
ts_pipeline.save(my_ppl_file_path)
# restore the pipeline for further deployment
from bigdl.chronos.autots import TSPipeline
loaded_ppl = TSPipeline.load(my_ppl_file_path)
Detailed information please refer to TSPipeline API doc.
Optional: Examine the leaderboard visualization#
To view the evaluation result of “not chosen” trails and find some insight or even possibly improve you search space for a new autotuning task. We provide a leaderboard through tensorboard.
# show a tensorboard view
%load_ext tensorboard
%tensorboard --logdir /tmp/autots_estimator/autots_estimator_leaderboard/
Detailed information please refer to Visualization.