Chronos Quick Tour#
Welcome to Chronos, a framework for building fast, accurate and scalable time series analysis applications 🎉! This quick tour walks through the critical concepts and shows how to use them to tackle your tasks.
Data processing
Time series data processing includes imputation, deduplication, resampling, scaling/unscaling, roll sampling, etc., which transform raw time series data (typically a table) into a format the models understand. TSDataset and XShardsTSDataset are provided as abstractions.
Forecasting
Time series forecasting uses historical data to predict future data. Forecaster and AutoTSEstimator are provided for built-in algorithms and distributed hyperparameter tuning.
Anomaly Detection
Time series anomaly detection finds anomalous points in a time series. Detector is provided with many built-in algorithms.
Simulation
Time series simulation generates synthetic time series data. Simulator is provided with many built-in algorithms.
TSDataset/XShardsTSDataset#
In Chronos, we provide a TSDataset (and an XShardsTSDataset for handling large data input in a distributed fashion) abstraction to represent a time series dataset. It is responsible for preprocessing raw time series data (typically a table) into a format the models understand. Many typical transformation, preprocessing and feature engineering methods can be chained on a TSDataset or XShardsTSDataset.
# !wget https://raw.githubusercontent.com/numenta/NAB/v1.0/data/realKnownCause/nyc_taxi.csv
import pandas as pd
from sklearn.preprocessing import StandardScaler
from bigdl.chronos.data import TSDataset

df = pd.read_csv("nyc_taxi.csv", parse_dates=["timestamp"])
tsdata = TSDataset.from_pandas(df,
                               dt_col="timestamp",
                               target_col="value")
scaler = StandardScaler()
tsdata.deduplicate()\
      .impute()\
      .gen_dt_feature()\
      .scale(scaler)\
      .roll(lookback=100, horizon=1)
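The roll step at the end turns the preprocessed series into supervised (history, future) samples. As a hedged illustration of what rolling does conceptually (`roll_windows` below is a hypothetical plain-numpy helper, not the Chronos API):

```python
import numpy as np

def roll_windows(series, lookback, horizon):
    """Slice a 1-D series into (x, y) pairs: each x holds `lookback`
    past points, each y holds the `horizon` points that follow."""
    x, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        x.append(series[i:i + lookback])
        y.append(series[i + lookback:i + lookback + horizon])
    return np.array(x), np.array(y)

series = np.arange(10, dtype=float)   # toy stand-in for the taxi values
x, y = roll_windows(series, lookback=3, horizon=1)
print(x.shape, y.shape)   # (7, 3) (7, 1)
```

With lookback=100 and horizon=1, as in the snippet above, each sample uses 100 historical points to supervise the prediction of the next single point.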
Forecaster#
We have implemented quite a few algorithms, ranging from traditional statistics to deep learning, for time series forecasting in the bigdl.chronos.forecaster package. Users may train these forecasters on historical time series and use them to predict future time series.
To import a specific forecaster, use {algorithm name} + “Forecaster”, then call fit to train the forecaster and predict to predict future data.
from bigdl.chronos.forecaster import TCNForecaster  # TCN is the algorithm name
from bigdl.chronos.data import get_public_dataset

if __name__ == "__main__":
    # use the nyc_taxi public dataset
    train_data, _, test_data = get_public_dataset("nyc_taxi")
    for data in [train_data, test_data]:
        # use 100 data points in history to predict 1 data point in the future
        data.roll(lookback=100, horizon=1)

    # create a forecaster
    forecaster = TCNForecaster.from_tsdataset(train_data)

    # train the forecaster
    forecaster.fit(train_data)

    # predict with the trained forecaster
    pred = forecaster.predict(test_data)
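A forecaster consumes rolled inputs shaped (samples, lookback, features) and emits predictions shaped (samples, horizon, targets). As a hedged sketch of that fit/predict shape contract (a toy persistence baseline, not the Chronos TCN implementation):

```python
import numpy as np

class PersistenceForecaster:
    """Toy baseline: predict that the next `horizon` points equal the
    last observed point. Mirrors only the fit/predict shape contract."""
    def __init__(self, horizon=1):
        self.horizon = horizon

    def fit(self, x, y):
        return self            # nothing to learn for persistence

    def predict(self, x):
        # x: (samples, lookback, features) -> (samples, horizon, features)
        last = x[:, -1:, :]                        # keep the last time step
        return np.repeat(last, self.horizon, axis=1)

x = np.random.rand(5, 100, 1)    # 5 samples, lookback 100, 1 feature
pred = PersistenceForecaster(horizon=1).fit(x, None).predict(x)
print(pred.shape)   # (5, 1, 1)
```

Such a baseline is also a useful sanity check: a trained forecaster should at least beat persistence on held-out data.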
AutoTSEstimator#
For time series forecasting, we also provide an AutoTSEstimator for distributed hyperparameter tuning as an extension to Forecaster. Users only need to create an AutoTSEstimator and call fit to train the estimator. A TSPipeline will be returned for users to predict future data.
from bigdl.orca.automl import hp
from bigdl.chronos.data import get_public_dataset
from bigdl.chronos.autots import AutoTSEstimator
from bigdl.orca import init_orca_context, stop_orca_context
from sklearn.preprocessing import StandardScaler

if __name__ == "__main__":
    # initialize orca context
    init_orca_context(cluster_mode="local", cores=4, memory="8g", init_ray_on_spark=True)

    # load dataset
    tsdata_train, tsdata_val, tsdata_test = get_public_dataset(name='nyc_taxi')

    # dataset preprocessing
    stand = StandardScaler()
    for tsdata in [tsdata_train, tsdata_val, tsdata_test]:
        tsdata.gen_dt_feature().impute()\
              .scale(stand, fit=tsdata is tsdata_train)

    # AutoTSEstimator initialization
    autotsest = AutoTSEstimator(model="tcn",
                                future_seq_len=10)

    # AutoTSEstimator fitting
    tsppl = autotsest.fit(data=tsdata_train,
                          validation_data=tsdata_val)

    # prediction
    pred = tsppl.predict(tsdata_test)

    # stop orca context
    stop_orca_context()
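Conceptually, hyperparameter tuning evaluates candidate configurations from a search space and keeps the one with the best validation score. As a hedged, dependency-free sketch of that idea (an exhaustive grid search with a toy objective, not the distributed search AutoTSEstimator actually runs):

```python
from itertools import product

def grid_search(search_space, score_fn):
    """Evaluate every combination in the search space; keep the
    configuration with the lowest score (e.g. validation loss)."""
    names = list(search_space)
    best_cfg, best_score = None, float("inf")
    for values in product(*(search_space[n] for n in names)):
        cfg = dict(zip(names, values))
        score = score_fn(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# toy objective standing in for validation loss after training
space = {"lr": [0.1, 0.01, 0.001], "lookback": [48, 96, 120]}
loss = lambda cfg: abs(cfg["lr"] - 0.01) + abs(cfg["lookback"] - 96) / 100
best, best_loss = grid_search(space, loss)
print(best)   # {'lr': 0.01, 'lookback': 96}
```

In practice the score function is expensive (it trains a model), which is why AutoTSEstimator distributes trials across a cluster instead of looping sequentially.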
Detector#
We have implemented quite a few algorithms, ranging from traditional statistics to deep learning, for time series anomaly detection in the bigdl.chronos.detector.anomaly package.
To import a specific detector, use {algorithm name} + “Detector”, then call fit to train the detector and anomaly_indexes to get the indexes of anomalous data points.
from bigdl.chronos.detector.anomaly import DBScanDetector  # DBScan is the algorithm name
from bigdl.chronos.data import get_public_dataset

if __name__ == "__main__":
    # use the nyc_taxi public dataset
    train_data = get_public_dataset("nyc_taxi", with_split=False)

    # create a detector
    detector = DBScanDetector()

    # fit the detector
    detector.fit(train_data.to_pandas()['value'].to_numpy())

    # find the anomaly points
    anomaly_indexes = detector.anomaly_indexes()
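To make the fit/anomaly_indexes pattern concrete, here is a hedged, dependency-free sketch with a toy z-score detector (a hypothetical stand-in for illustration, much simpler than the DBSCAN clustering Chronos uses):

```python
import numpy as np

class ZScoreDetector:
    """Toy detector: flag points more than `threshold` standard
    deviations from the mean. Mirrors the fit/anomaly_indexes pattern."""
    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self._indexes = None

    def fit(self, series):
        series = np.asarray(series, dtype=float)
        z = (series - series.mean()) / series.std()
        self._indexes = np.flatnonzero(np.abs(z) > self.threshold)
        return self

    def anomaly_indexes(self):
        return self._indexes

data = np.concatenate([np.zeros(99), [100.0]])   # one obvious spike
idx = ZScoreDetector().fit(data).anomaly_indexes()
print(idx)   # [99]
```

The built-in detectors follow the same flow: fit on a 1-D array of values, then query the indexes of the points judged anomalous.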
Simulator (experimental)#
Simulator is still under active development and its API is unstable.