Distributed Processing
Distributed training
Users of the LSTM, TCN and Seq2seq forecasters can easily train them in a distributed fashion to handle extra-large datasets and utilize a cluster. This functionality is powered by Project Orca.
f = Forecaster(..., distributed=True)
f.fit(...)
f.predict(...)
f.to_local() # collect the forecaster to a single node
f.predict_with_onnx(...) # ONNX Runtime inference only supports a single node
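Conceptually, distributed fitting shards the training data across the cluster, trains a replica per shard, and combines the results, which is why `to_local()` is needed before single-node-only operations like ONNX inference. A toy pure-Python sketch of that data-parallel idea (illustrative only, not the Orca implementation):

```python
# Toy sketch of the data-parallel idea behind distributed fitting:
# each shard trains a local "model", then the parameters are
# collected and averaged into a single local model.

def fit_local(shard):
    # the "model" here is just the mean of the shard's values
    return sum(shard) / len(shard)

def fit_distributed(shards):
    # train one replica per shard, then average the parameters,
    # weighting by shard size so the result matches a single-node fit
    params = [fit_local(s) for s in shards]
    sizes = [len(s) for s in shards]
    total = sum(sizes)
    return sum(p * n for p, n in zip(params, sizes)) / total

shards = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
local_model = fit_distributed(shards)  # analogous to f.to_local()
print(local_model)  # 3.5, same as fitting on the concatenated data
```

Real distributed training averages gradients or weights across replicas rather than a scalar, but the collect-to-one-node step at the end is the same shape of operation.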
Distributed Data Processing: XShardsTSDataset
Warning: XShardsTSDataset is still experimental.
TSDataset is a single-threaded library with reasonable speed on large datasets (~10 GB). When you need to handle an extra-large dataset, or memory on a single node is limited, XShardsTSDataset provides exactly the same functionality and usage as TSDataset in a distributed fashion.
# a fully distributed forecaster pipeline
from bigdl.orca.data.pandas import read_csv
from bigdl.chronos.data.experimental import XShardsTSDataset
shards = read_csv("hdfs://...")
tsdata, _, test_tsdata = XShardsTSDataset.from_xshards(...)
tsdata_xshards = tsdata.roll(...).to_xshards()
test_tsdata_xshards = test_tsdata.roll(...).to_xshards()
f = Forecaster(..., distributed=True)
f.fit(tsdata_xshards, ...)
f.predict(test_tsdata_xshards, ...)
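The `roll(...)` step in the pipeline above turns each time series into (lookback, horizon) training samples via a sliding window. A minimal pure-Python sketch of that transform (parameter names are illustrative, not the exact Chronos signature):

```python
# Sliding-window transform, the core of what roll(...) performs:
# turn a 1-D series into (lookback, horizon) sample pairs.

def roll_series(series, lookback, horizon):
    x, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        x.append(series[i:i + lookback])                       # model input window
        y.append(series[i + lookback:i + lookback + horizon])  # target window
    return x, y

series = [10, 11, 12, 13, 14, 15]
x, y = roll_series(series, lookback=3, horizon=1)
print(x[0], y[0])  # [10, 11, 12] [13]
```

In the distributed case, XShardsTSDataset applies the same windowing independently to each partition, so `to_xshards()` can hand the forecaster ready-made sample arrays without collecting data to one node.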