Anomaly Detectors#

AEDetector#

AEDetector is an unsupervised anomaly detector. It builds an autoencoder network, fits the model to the input data, and calculates the reconstruction error. Samples with larger reconstruction errors are more likely to be anomalies.

class bigdl.chronos.detector.anomaly.ae_detector.AEDetector(roll_len=24, ratio=0.1, compress_rate=0.8, batch_size=100, epochs=200, verbose=0, sub_scalef=1, backend='keras', lr=0.001)[source]#

Bases: bigdl.chronos.detector.anomaly.abstract.AnomalyDetector

Example

>>> #The dataset to detect is y
>>> y = numpy.array(...)
>>> ad = AEDetector(roll_len=24)
>>> ad.fit(y)
>>> anomaly_scores = ad.score()
>>> anomaly_indexes = ad.anomaly_indexes()

Initialize an AEDetector. AEDetector supports two modes to detect anomalies in input time series.

1. direct mode: It trains an autoencoder network directly on the input time series and calculates anomaly scores based on the reconstruction error. For each sample in the input, the larger the reconstruction error, the higher the anomaly score.

2. window mode: It first rolls the input series into a batch of subsequences, each with length = roll_len. Then it trains an autoencoder network on the batch of subsequences and calculates the reconstruction error. The anomaly score for each sample is a linear combination of two parts: 1) the reconstruction error of the sample within a subsequence; 2) the reconstruction error of the entire subsequence as a vector. You can use sub_scalef to control the weight of the 2nd part. Note that one sample may belong to several subsequences because subsequences overlap due to rolling; only the largest anomaly score is kept as the final score.
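The rolling step in window mode can be sketched as follows. Note that this is an illustrative sketch only; the `roll` helper below is hypothetical, not the actual AEDetector internals.

```python
import numpy as np

def roll(y, roll_len):
    # Roll a 1-D series into overlapping subsequences of length roll_len;
    # consecutive subsequences are shifted by one sample, so most samples
    # appear in several subsequences.
    n = len(y) - roll_len + 1
    return np.stack([y[i:i + roll_len] for i in range(n)])

windows = roll(np.arange(6, dtype=float), roll_len=3)
# windows has shape (4, 3): 4 overlapping subsequences of length 3
```

Because a sample appears in several overlapping subsequences, it receives one candidate score per subsequence, and the largest one is kept.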

Parameters
  • roll_len – the length of the window when rolling the input data. If roll_len=0, direct mode is used. If roll_len > 0, window mode is used. When setting roll_len, we suggest using a number that covers roughly a full or half cycle of your data, e.g. half a day or one day. Note that roll_len must be smaller than the total length of the input time series.

  • ratio – (estimated) ratio of anomalies

  • compress_rate – the compression rate of the autoencoder; changing this value affects the calculated reconstruction error.

  • batch_size – batch size for autoencoder training

  • epochs – number of epochs for autoencoder training

  • verbose – verbose option for autoencoder training

  • sub_scalef – scale factor for the subsequence distance when calculating anomaly score

  • backend – the backend type, can be “keras” or “torch”

  • lr – the learning rate of model’s optimizer

fit(y)[source]#

Fit the model.

Parameters

y – the input time series. y must be 1-D numpy array.

score()[source]#

Gets the anomaly scores for each sample. All anomaly scores are positive numbers. Samples with larger scores are more likely the anomalies. If rolled, the anomaly score is calculated by aggregating the reconstruction errors of each point and subsequence.

Returns

the anomaly scores, in an array format with the same size as input

anomaly_indexes()[source]#

Gets the indexes of the N samples with the largest anomaly scores in y (N = size of input y * AEDetector.ratio).

Returns

the indexes of N samples
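The relationship between score(), ratio, and anomaly_indexes() can be illustrated with plain NumPy. This is a hypothetical sketch of the selection logic, not the library's implementation.

```python
import numpy as np

# Given anomaly scores for 10 samples, pick the N with the largest
# scores, where N = number of samples * ratio.
scores = np.array([0.2, 5.1, 0.3, 4.8, 0.1, 0.2, 0.4, 3.9, 0.3, 0.2])
ratio = 0.2
n = int(len(scores) * ratio)                 # N = 2 here
top_indexes = np.argsort(scores)[-n:][::-1]  # largest scores first
# top_indexes contains indexes 1 and 3 (scores 5.1 and 4.8)
```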

DBScanDetector#

DBScanDetector uses DBSCAN clustering for anomaly detection. The DBSCAN algorithm tries to cluster the points and label the points that do not belong to any clusters as -1. It thus detects outliers in the input time series.
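The core idea can be sketched with scikit-learn directly: cluster the series values and treat DBSCAN's noise label -1 as an anomaly. This is an illustrative sketch of the principle, not DBScanDetector's exact implementation; the eps and min_samples values below are arbitrary.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# A mostly-flat series with one outlier at index 4.
y = np.array([1.0, 1.1, 0.9, 1.0, 8.0, 1.1, 1.0, 0.9])

# DBSCAN expects a 2-D feature matrix, so reshape the 1-D series into
# a single feature column. Points that join no cluster get label -1.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(y.reshape(-1, 1))
scores = (labels == -1).astype(int)      # 1 marks an outlier
anomaly_indexes = np.flatnonzero(scores)
# the value 8.0 at index 4 is far from the cluster around 1.0
```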

class bigdl.chronos.detector.anomaly.dbscan_detector.DBScanDetector(eps=0.01, min_samples=6, **argv)[source]#

Bases: bigdl.chronos.detector.anomaly.abstract.AnomalyDetector

Example

>>> #The dataset to detect is y
>>> y = numpy.array(...)
>>> ad = DBScanDetector(eps=0.1, min_samples=6)
>>> ad.fit(y)
>>> anomaly_scores = ad.score()
>>> anomaly_indexes = ad.anomaly_indexes()

Initialize a DBScanDetector.

Parameters
  • eps – The maximum distance between two samples for one to be considered as the neighborhood of the other. It is a parameter of DBSCAN, refer to sklearn.cluster.DBSCAN docs for more details.

  • min_samples – The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. It is a parameter of DBSCAN, refer to sklearn.cluster.DBSCAN docs for more details.

  • argv – Other parameters used in DBSCAN. Refer to sklearn.cluster.DBSCAN docs for more details.

fit(y, use_sklearnex=True)[source]#

Fit the model

Parameters
  • y – the input time series. y must be 1-D numpy array.

  • use_sklearnex – bool, whether to use scikit-learn-intelex to accelerate fitting. If scikit-learn-intelex is not installed, DBScanDetector falls back to stock scikit-learn.

score()[source]#

Gets the anomaly scores for each sample. Each anomaly score is either 0 or 1, where 1 indicates an anomaly.

Returns

anomaly score for each sample, in an array format with the same size as input

anomaly_indexes()[source]#

Gets the indexes of the anomalies.

Returns

the indexes of the anomalies.

ThresholdDetector#

ThresholdDetector is a simple anomaly detector that detects anomalies based on a threshold. The target value for anomaly testing can be either 1) the sample value itself or 2) the difference between the forecasted value and the actual value, if the forecasted values are provided. The threshold can be set by the user or estimated from the training data according to the anomaly ratio and statistical distributions.
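The forecast-residual case (target value 2 above) can be sketched in a few lines of NumPy. This is a minimal sketch of the idea, assuming the single-value threshold case; the data and threshold below are invented for illustration.

```python
import numpy as np

# A sample is flagged when the absolute difference between the actual
# value and the forecast exceeds the threshold.
y_true = np.array([10.0, 11.0, 35.0, 12.0, 10.5])
y_pred = np.array([10.2, 10.8, 11.5, 11.9, 10.4])
threshold = 10.0

scores = (np.abs(y_true - y_pred) > threshold).astype(int)
anomaly_indexes = np.flatnonzero(scores)
# only index 2 exceeds the threshold: |35.0 - 11.5| = 23.5 > 10
```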

class bigdl.chronos.detector.anomaly.th_detector.ThresholdDetector[source]#

Bases: bigdl.chronos.detector.anomaly.abstract.AnomalyDetector

Example

>>> #The dataset is split into x_train, x_test, y_train, y_test
>>> forecaster = Forecaster(...)
>>> forecaster.fit(x=x_train, y=y_train, ...)
>>> y_pred = forecaster.predict(x_test)
>>> td = ThresholdDetector()
>>> td.set_params(threshold=10)
>>> td.fit(y_test, y_pred)
>>> anomaly_scores = td.score()
>>> anomaly_indexes = td.anomaly_indexes()

Initialize a ThresholdDetector.

set_params(mode='default', ratio=0.01, threshold=inf, dist_measure=<bigdl.chronos.detector.anomaly.th_detector.EuclideanDistance object>)[source]#

Set parameters for ThresholdDetector

Parameters
  • mode – mode can be “default” or “gaussian”. “default”: fit data according to a uniform distribution. “gaussian”: fit data according to a gaussian distribution.

  • ratio – the ratio of samples to consider as anomalies.

  • threshold

    threshold, could be

    1. a single value - absolute distance threshold, same for all samples

    2. a tuple (min, max) - min and max are either int/float or tensors with the same shape as y; yhat is ignored in this case

  • dist_measure – measure of distance
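The tuple-threshold case (option 2 above) can be sketched as a simple range check on the raw values, with no forecast needed. The data and bounds below are invented for illustration.

```python
import numpy as np

# With threshold=(min, max), a sample is anomalous when its value
# falls outside the [min, max] range; the forecast is not used.
y = np.array([3.0, 50.0, 4.5, -20.0, 5.0])
tmin, tmax = 0.0, 10.0

scores = ((y < tmin) | (y > tmax)).astype(int)
anomaly_indexes = np.flatnonzero(scores)
# indexes 1 (50.0 > max) and 3 (-20.0 < min) are flagged
```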

fit(y, y_pred=None)[source]#

Fit the model

Parameters
  • y – the values to detect. shape could be 1-D (num_samples,) or 2-D array (num_samples, features)

  • y_pred – the estimated values, a tensor with the same shape as y. Could be None when threshold is a tuple.

score(y=None, y_pred=None)[source]#

Gets the anomaly scores for each sample. Each anomaly score is either 0 or 1, where 1 indicates an anomaly.

Parameters
  • y – a new time series in which to detect anomalies. If y is None, anomalies in the fit input are returned and y_pred is ignored in this case.

  • y_pred – forecasts corresponding to y

Returns

anomaly score for each sample, in an array format with the same size as input

anomaly_indexes(y=None, y_pred=None)[source]#

Gets the indexes of the anomalies.

Parameters
  • y – a new time series in which to detect anomalies. If y is None, anomalies in the fit input are returned and y_pred is ignored in this case.

  • y_pred – forecasts corresponding to y

Returns

the indexes of the anomalies.