Chronos Quick Tour

Welcome to Chronos for building a fast, accurate and scalable time series analysis application🎉! Start with our quick tour to understand some critical concepts and how to use them to tackle your tasks.

Data processing

Time series data processing includes imputing, deduplicating, resampling, scaling/unscaling, roll sampling, etc. These steps convert raw time series data (typically in a table) to a format that the models can consume. TSDataset and XShardsTSDataset are provided as abstractions for this.

Forecasting

Time series forecasting uses historical data to predict future data. Forecaster provides built-in algorithms, and AutoTSEstimator provides distributed hyperparameter tuning.

Anomaly Detection

Time series anomaly detection finds anomalous points in a time series. Detector is provided with many built-in algorithms.

Simulation

Time series simulation generates synthetic time series data. Simulator is provided for many built-in algorithms.

TSDataset/XShardsTSDataset

In Chronos, we provide a TSDataset abstraction (and an XShardsTSDataset to handle large data input in a distributed fashion) to represent a time series dataset. It is responsible for preprocessing raw time series data (typically in a table) to a format that the models can consume. Many typical transformation, preprocessing and feature engineering methods can be chained on a TSDataset or XShardsTSDataset.

# !wget https://raw.githubusercontent.com/numenta/NAB/v1.0/data/realKnownCause/nyc_taxi.csv
import pandas as pd
from sklearn.preprocessing import StandardScaler
from bigdl.chronos.data import TSDataset

df = pd.read_csv("nyc_taxi.csv", parse_dates=["timestamp"])
tsdata = TSDataset.from_pandas(df,
                            dt_col="timestamp",
                            target_col="value")
scaler = StandardScaler()
tsdata.deduplicate()\
    .impute()\
    .gen_dt_feature()\
    .scale(scaler)\
    .roll(lookback=100, horizon=1)
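
To make the roll sampling step more concrete, here is a minimal sketch of what sliding-window sampling with a lookback and a horizon generally looks like. It is written in plain Python and is not the TSDataset implementation; the helper name roll_windows is hypothetical and only the lookback and horizon parameter names are borrowed from the call above.

```python
def roll_windows(values, lookback, horizon):
    """Split a 1-D series into (x, y) pairs of sliding windows.

    x[i] holds `lookback` consecutive points and y[i] holds the
    `horizon` points that immediately follow them.
    """
    n_samples = len(values) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window sizes")
    x = [list(values[i:i + lookback]) for i in range(n_samples)]
    y = [list(values[i + lookback:i + lookback + horizon])
         for i in range(n_samples)]
    return x, y


# Illustrative data: ten points, 3 points of history, 1 point to predict.
series = list(range(10))
x, y = roll_windows(series, lookback=3, horizon=1)
print(len(x), x[0], y[0])  # 7 [0, 1, 2] [3]
```

Each x window is what a model would see as input and the matching y window is what it would be asked to predict.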

Forecaster

We have implemented quite a few algorithms, ranging from traditional statistics to deep learning, for time series forecasting in the bigdl.chronos.forecaster package. Users may train these forecasters on historical time series and use them to predict future time series.

Each forecaster class is named {algorithm name} + “Forecaster”. After importing one, call fit to train the forecaster and predict to forecast future data.

from bigdl.chronos.forecaster import TCNForecaster  # TCN is algorithm name
from bigdl.chronos.data.repo_dataset import get_public_dataset

if __name__ == "__main__":
    # use nyc_taxi public dataset
    train_data, _, test_data = get_public_dataset("nyc_taxi")
    for data in [train_data, test_data]:
        # use 100 data points of history to predict 1 data point in the future
        data.roll(lookback=100, horizon=1)

    # create a forecaster
    forecaster = TCNForecaster.from_tsdataset(train_data)

    # train the forecaster
    forecaster.fit(train_data)

    # predict with the trained forecaster
    pred = forecaster.predict(test_data)
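
The fit/predict pattern above can be illustrated without any deep learning at all. The following is a toy sketch, assuming a classic drift forecast (extend the last value along the average trend seen in training). The class DriftForecaster is hypothetical and is not part of bigdl.chronos.forecaster; it only mirrors the idea of learning from history and then predicting from a recent window.

```python
class DriftForecaster:
    """Toy forecaster: extend the last observed value along the average trend."""

    def __init__(self, horizon=1):
        self.horizon = horizon
        self.drift = None

    def fit(self, history):
        if len(history) < 2:
            raise ValueError("need at least two points to estimate a trend")
        # Average change per step over the whole training series.
        self.drift = (history[-1] - history[0]) / (len(history) - 1)
        return self

    def predict(self, window):
        if self.drift is None:
            raise RuntimeError("call fit before predict")
        last = window[-1]
        return [last + self.drift * step
                for step in range(1, self.horizon + 1)]


# Illustrative data: the training series grows by 2 per step.
train = [10.0, 12.0, 14.0, 16.0, 18.0]
forecaster = DriftForecaster(horizon=2).fit(train)
pred = forecaster.predict([20.0, 22.0])
print(pred)  # [24.0, 26.0]
```

A baseline like this is also a useful sanity check: a trained model should normally beat it on held-out data.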

AutoTSEstimator

For time series forecasting, we also provide an AutoTSEstimator for distributed hyperparameter tuning as an extension to Forecaster. Users only need to create an AutoTSEstimator and call fit to train the estimator. A TSPipeline will be returned for users to predict future data.

from bigdl.orca.automl import hp
from bigdl.chronos.data.repo_dataset import get_public_dataset
from bigdl.chronos.autots import AutoTSEstimator
from bigdl.orca import init_orca_context, stop_orca_context
from sklearn.preprocessing import StandardScaler

if __name__ == "__main__":
    # initialize orca context
    init_orca_context(cluster_mode="local", cores=4, memory="8g", init_ray_on_spark=True)

    # load dataset
    tsdata_train, tsdata_val, tsdata_test = get_public_dataset(name='nyc_taxi')

    # dataset preprocessing
    stand = StandardScaler()
    for tsdata in [tsdata_train, tsdata_val, tsdata_test]:
        tsdata.gen_dt_feature().impute()\
            .scale(stand, fit=tsdata is tsdata_train)

    # AutoTSEstimator initialization
    autotsest = AutoTSEstimator(model="tcn",
                                future_seq_len=10)

    # AutoTSEstimator fitting
    tsppl = autotsest.fit(data=tsdata_train,
                        validation_data=tsdata_val)

    # Prediction
    pred = tsppl.predict(tsdata_test)

    # stop orca context
    stop_orca_context()
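
As a rough intuition for what hyperparameter tuning does, the sketch below runs a tiny exhaustive search in plain Python: it tries several window sizes for a moving-average forecast and keeps the one with the lowest validation error. This is only an assumption-laden illustration. It is sequential rather than distributed, the function names are hypothetical, and it does not reflect how AutoTSEstimator searches internally.

```python
def moving_average_forecast(history, window):
    """Predict the next point as the mean of the last `window` points."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def validation_mse(train, val, window):
    """One-step-ahead MSE on `val`; each true value is revealed after its prediction."""
    history = list(train)
    squared_errors = []
    for actual in val:
        pred = moving_average_forecast(history, window)
        squared_errors.append((actual - pred) ** 2)
        history.append(actual)
    return sum(squared_errors) / len(squared_errors)


def grid_search(train, val, candidates):
    """Score every candidate window and return the best one with all scores."""
    scores = {w: validation_mse(train, val, w) for w in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative data: a steadily rising series favours short windows.
train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
val = [7.0, 8.0, 9.0]
best_window, scores = grid_search(train, val, candidates=[1, 2, 4])
print(best_window, scores)  # 1 {1: 1.0, 2: 2.25, 4: 6.25}
```

The key point carried over to the example above is the role of the validation set: candidates are compared on data that was not used for fitting.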

Detector

We have implemented quite a few algorithms, ranging from traditional statistics to deep learning, for time series anomaly detection in the bigdl.chronos.detector.anomaly package.

Each detector class is named {algorithm name} + “Detector”. After importing one, call fit to train the detector and anomaly_indexes to get the indexes of the anomalous data points.

from bigdl.chronos.detector.anomaly import DBScanDetector  # DBScan is algorithm name
from bigdl.chronos.data.repo_dataset import get_public_dataset

if __name__ == "__main__":
    # use nyc_taxi public dataset
    train_data = get_public_dataset("nyc_taxi", with_split=False)

    # create a detector
    detector = DBScanDetector()

    # fit a detector
    detector.fit(train_data.to_pandas()['value'].to_numpy())

    # find the anomaly points
    anomaly_indexes = detector.anomaly_indexes()
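
For intuition about what "indexes of anomalous points" means, here is a minimal sketch of a much simpler technique than DBSCAN: flagging points whose z-score exceeds a threshold. It uses only the Python standard library. The function zscore_anomaly_indexes and its threshold parameter are hypothetical and are not part of bigdl.chronos.detector.anomaly.

```python
import statistics


def zscore_anomaly_indexes(values, threshold=3.0):
    """Return indexes of points more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant series has no outliers under this definition.
        return []
    return [i for i, v in enumerate(values)
            if abs(v - mean) / std > threshold]


# Illustrative data: one obvious spike at position 6.
values = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0, 11.0, 9.0, 10.0]
# A lower threshold suits such a short series, where z-scores stay small.
anomalies = zscore_anomaly_indexes(values, threshold=2.5)
print(anomalies)  # [6]
```

A global z-score ignores seasonality and trend, which is one reason density-based or model-based detectors are often preferred on real time series.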

Simulator (experimental)

Simulator is still under active development and its API is unstable.