API Reference

The fABBA class provides the symbolic representation transformation for univariate time series.

fABBA

class fABBA.fABBA(tol=0.1, alpha=0.5, sorting='2-norm', scl=1, verbose=1, max_len=-1, return_list=False, n_jobs=1)[source]

fABBA: A fast sorting-based aggregation method for symbolic time series representation

Parameters

tol - float, default=0.1

Tolerance that controls compression.

alpha - float, default=0.5

Tolerance that controls digitization.

sorting - str, default=’2-norm’, {‘lexi’, ‘1-norm’, ‘2-norm’}

The criterion by which pieces are sorted prior to aggregation.

scl - int, default=1

Scaling applied to the length of each piece. The default of 1 gives 2D digitization; otherwise 1D digitization is performed.

verbose - int, default=1

Verbosity mode that controls log printing. The default of 1 prints logs.

max_len - int, default=-1

The maximum length of each segment; an optional constraint for compression.

return_list - boolean, default=False

Whether to return the representation as a list; False returns a string.

n_jobs - int, default=1

The number of threads to use for the computation. -1 means no parallel computing.

Attributes

parameters - Model

Contains the learnable parameters from the in-sample data.

Attributes:

  • centers - numpy.ndarray

    the centers calculated for each group formed by aggregation

  • splist - numpy.ndarray

    the starting point for each group formed by aggregation

  • alphabetsap - dict

    stores the one-to-one key-value pairs between the labels assigned to the groups and their corresponding characters

string_ - str or list

Contains the ABBA representation.

  • In addition to fit_transform, the compress and digitize methods can each be applied to data independently.

compress(series, fillm='bfill')[source]

Compress time series.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.

  • ‘Mean’: Fill the holes of the series with the mean value.

  • ‘Median’: Fill the holes of the series with the median value.

  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.

  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.
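The fill strategies described for fillm can be sketched in plain Python as follows. This is only an illustration of the documented behaviour, not the library's code: the function name fill_missing is hypothetical, and accepting the method names case-insensitively is an assumption made for the sketch.

```python
import math
from statistics import mean, median


def fill_missing(series, method="bfill"):
    """Return a copy of `series` with NaN entries filled (illustrative only)."""
    method = method.lower()  # assumption: method names treated case-insensitively
    values = list(series)
    valid = [v for v in values if not math.isnan(v)]

    if method in ("zero", "mean", "median"):
        if method == "zero":
            fill = 0.0
        elif method == "mean":
            fill = mean(valid)
        else:
            fill = median(valid)
        return [fill if math.isnan(v) else v for v in values]

    if method == "ffill":
        # A leading NaN has no previous observation, so it becomes zero.
        if math.isnan(values[0]):
            values[0] = 0.0
        for i in range(1, len(values)):
            if math.isnan(values[i]):
                values[i] = values[i - 1]
        return values

    if method == "bfill":
        # A trailing NaN has no next observation, so it becomes zero.
        if math.isnan(values[-1]):
            values[-1] = 0.0
        for i in range(len(values) - 2, -1, -1):
            if math.isnan(values[i]):
                values[i] = values[i + 1]
        return values

    raise ValueError(f"unknown fill method: {method!r}")


nan = float("nan")
data = [nan, 1.0, nan, 3.0, nan]
filled_forward = fill_missing(data, "ffill")   # [0.0, 1.0, 1.0, 3.0, 3.0]
filled_backward = fill_missing(data, "bfill")  # [1.0, 1.0, 3.0, 3.0, 0.0]
```

The two results differ only at the ends of the series, which is where the zero fallback applies.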

digitize(pieces, alphabet_set=0)[source]

Greedy 2D clustering of pieces (a Nx2 numpy array), using tolerance alpha and len/inc scaling parameter scl. A ‘temporary’ group center, which we call the starting point, is used when assigning pieces to clusters. This temporary center is the first piece available after appropriate scaling and sorting of all pieces. After the grouping procedure finishes, the centers are calculated as the mean of the objects within each cluster.

Parameters

pieces - numpy.ndarray

The compressed pieces of numpy.ndarray with shape (n_samples, n_features) after compression.

alphabet_set - int or list

The list of alphabet letters.

Returns

string - str or list

String sequence.

parameters - Model

The parameters of model.
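The greedy grouping described for digitize can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the library's implementation: the function name greedy_aggregate is hypothetical, pieces are sorted by their 2-norm, no standardization of the pieces is applied, and a quadratic scan is used for clarity rather than speed.

```python
import math


def greedy_aggregate(pieces, alpha=0.5, scl=1.0):
    """Group (length, increment) pieces; return (labels, centers)."""
    # Scale the length column, then order pieces by their 2-norm.
    scaled = [(scl * length, inc) for length, inc in pieces]
    order = sorted(range(len(scaled)), key=lambda i: math.hypot(*scaled[i]))

    labels = [-1] * len(scaled)
    n_groups = 0
    for i in order:
        if labels[i] != -1:
            continue  # already assigned to an earlier group
        start = scaled[i]  # temporary center (the "starting point")
        for j in order:
            if labels[j] == -1 and math.dist(start, scaled[j]) <= alpha:
                labels[j] = n_groups
        n_groups += 1

    # Final centers are the means of the original pieces in each group.
    centers = []
    for g in range(n_groups):
        members = [pieces[i] for i, lab in enumerate(labels) if lab == g]
        centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return labels, centers


pieces = [(2.0, 1.0), (2.0, 1.2), (5.0, -3.0), (5.0, -3.2)]
labels, centers = greedy_aggregate(pieces, alpha=0.5)
# One character per group label, starting from 'a'.
symbols = "".join(chr(ord("a") + lab) for lab in labels)
```

In this toy input the first two pieces fall within alpha of the first starting point and the last two within alpha of the second, so two groups (and two symbols) are produced.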

fit(series, fillm='bfill', alphabet_set=0)[source]

Compress and digitize the time series together.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.

  • ‘Mean’: Fill the holes of the series with the mean value.

  • ‘Median’: Fill the holes of the series with the median value.

  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.

  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

Returns

string - str

The string transformed by fABBA.

fit_transform(series, fillm='bfill', alphabet_set=0)[source]

Compress and digitize the time series together.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.

  • ‘Mean’: Fill the holes of the series with the mean value.

  • ‘Median’: Fill the holes of the series with the median value.

  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.

  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

Returns

string - str

The string transformed by fABBA.

inverse_transform(string, start=0, parameters=None)[source]

Convert ABBA symbolic representation back to numeric time series representation.

Parameters

string - string

Time series in symbolic representation using unicode characters starting with character ‘a’.

start - float

First element of original time series. Applies vertical shift in reconstruction. If not specified, the default is 0.

parameters - Model

The parameters of model.

Returns

series - list

Reconstruction of the time series.
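The reconstruction idea behind inverse_transform can be sketched as follows, assuming each symbol maps to a (length, increment) center and that consecutive segments are joined by linear interpolation from the start value. The function name reconstruct, the dictionary of centers and the rounding of lengths to whole steps are illustrative assumptions, not the library's implementation.

```python
def reconstruct(string, centers, start=0.0):
    """Rebuild a numeric series from symbols (illustrative only).

    `centers` maps each symbol to a (length, increment) pair.
    """
    series = [start]
    for symbol in string:
        length, inc = centers[symbol]
        steps = max(1, round(length))  # assumption: lengths rounded to whole steps
        base = series[-1]
        # Walk linearly from the current value to base + inc.
        for k in range(1, steps + 1):
            series.append(base + inc * k / steps)
    return series


centers = {"a": (2.0, 1.0), "b": (4.0, -2.0)}
series = reconstruct("ab", centers, start=10.0)
# [10.0, 10.5, 11.0, 10.5, 10.0, 9.5, 9.0]
```

Changing start shifts every reconstructed value by the same amount, which is the vertical shift mentioned for the start parameter.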

class fABBA.loadData(name='Beef')[source]

Load the example data.

Parameters

name - str

The dataset name. Currently supported: ‘AtrialFibrillation’, ‘BasicMotions’, ‘Beef’, ‘CharacterTrajectories’, ‘LSST’, ‘Epilepsy’, ‘NATOPS’, ‘UWaveGestureLibrary’, ‘JapaneseVowels’.

For more datasets, we refer the users to https://www.timeseriesclassification.com/ or https://archive.ics.uci.edu/datasets.

Returns

train, test - numpy.ndarray

The training and test data, respectively.

class fABBA.load_images[source]

Load the example images (a total of 24 images for test).

Parameters

No parameter input.

Returns

images - list

A list storing the image data.

ABBAbase

class fABBA.ABBAbase(clustering, tol=0.1, scl=1, verbose=1, max_len=-1)[source]

compress(series, fillm='bfill')[source]

Compress time series.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.

  • ‘Mean’: Fill the holes of the series with the mean value.

  • ‘Median’: Fill the holes of the series with the median value.

  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.

  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

digitize(pieces, alphabet_set=0)[source]

Greedy 2D clustering of pieces (a Nx2 numpy array), using tolerance tol and len/inc scaling parameter scl.

In this variant, a ‘temporary’ cluster center is used when assigning pieces to clusters. This temporary center is the first piece available after appropriate scaling and sorting of all pieces. It is not necessarily the mean of all pieces in that cluster, and hence the final cluster centers, which are just the means, might achieve a smaller within-cluster tol.

fit(series, fillm='bfill', alphabet_set=0)[source]

Compress and digitize the time series together.

Parameters

series - array or list

Time series.

alpha - float

Tolerance that controls digitization; defaults to 0.5.

string_form - boolean

Whether to return the result in string form; defaults to True.

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.

  • ‘Mean’: Fill the holes of the series with the mean value.

  • ‘Median’: Fill the holes of the series with the median value.

  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.

  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

fit_transform(series, fillm='bfill', alphabet_set=0)[source]

Compress and digitize the time series together.

Parameters

series - array or list

Time series.

alpha - float

Tolerance that controls digitization; defaults to 0.5.

string_form - boolean

Whether to return the result in string form; defaults to True.

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.

  • ‘Mean’: Fill the holes of the series with the mean value.

  • ‘Median’: Fill the holes of the series with the median value.

  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.

  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

inverse_transform(string, start=0, parameters=None)[source]

Convert ABBA symbolic representation back to numeric time series representation.

Parameters

string - string

Time series in symbolic representation using unicode characters starting with character ‘a’.

start - float

First element of original time series. Applies vertical shift in reconstruction. If not specified, the default is 0.

parameters - Model

The parameters of model.

Returns

series - list

Reconstruction of the time series.

JABBA

JABBA provides the symbolic representation transformation for univariate and multivariate (resp., multiple univariate) time series, and allows the ABBA method to be combined with various clustering techniques.

class fABBA.JABBA(tol=0.2, init='agg', k=2, r=0.5, alpha=None, sorting='norm', scl=1, max_iter=2, partition_rate=None, partition=None, max_len=inf, verbose=1, random_state=2022, fillna='ffill', auto_digitize=False)[source]

Parallel version of ABBA with fast implementation.

Parameters

tol - double, default=0.2

Tolerance for compression.

k - int, default=2

The number of clusters (distinct symbols) specified for ABBA.

r - float, default=0.5

The rate of data sampling to perform k-means.

alpha - double, default=None

Tolerance for digitization.

init - str, default=’agg’

The clustering algorithm used in digitization. Other options: ‘f-kmeans’, ‘kmeans’.

sorting - str, default=”norm”

The method used to sort the data before aggregation (inside digitization). Alternative option: “pca”.

max_len - int

The maximum length of the series contained in each compressed piece.

max_iter - int, default=2

The maximum number of iterations for the fast k-means algorithm.

batch_size - int, default=1024

Size of the mini-batches for mini-batch k-means in digitization. For faster computations, you can set batch_size greater than 256 * number of cores to enable parallelism on all cores.

verbose - int or boolean, default=1

Enable verbose output.

partition_rate - float or int, default=None

This parameter determines the number of partitions of the time series. When it is not None, the number of partitions is n_jobs*int(np.round(np.exp(1/self.partition_rate), 0)).

partition - int

The number of subsequences for time series to be partitioned.

scl - int or float, default=1

Scale the length of the compressed pieces. The larger the value, the more important the length information becomes, which can help with problems caused by peak shift.

auto_digitize - boolean, default=False

Enable auto digitization without prior knowledge of alpha.
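As a worked example of the partition_rate formula given above, the following sketch evaluates it with the standard library only. The function name number_of_partitions is hypothetical, and math.exp with the built-in round stand in for np.exp and np.round.

```python
import math


def number_of_partitions(n_jobs, partition_rate):
    """Evaluate n_jobs * int(round(exp(1 / partition_rate)))."""
    return n_jobs * int(round(math.exp(1 / partition_rate), 0))


# exp(1)   ~ 2.718 rounds to 3, so 4 jobs give 12 partitions.
twelve = number_of_partitions(4, 1)
# exp(2)   ~ 7.389 rounds to 7, so 4 jobs give 28 partitions.
twenty_eight = number_of_partitions(4, 0.5)
# exp(0.5) ~ 1.649 rounds to 2, so 2 jobs give 4 partitions.
four = number_of_partitions(2, 2)
```

Smaller values of partition_rate therefore produce more partitions per job, and the count grows exponentially as the rate approaches zero.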

Attributes

params: dict

Parameters of trained model.

string_ - str or list

Contains the ABBA representation.

digitize(series, pieces, alphabet_set=0, n_jobs=-1)[source]

Digitization

Parameters

pieces - numpy.ndarray

The compressed pieces of numpy.ndarray with shape (n_samples, n_features) after compression

len_ts - int

The length of time series.

num_pieces - int

The number of pieces.

init - str

Use aggregation, fast-kmeans or kmeans for digitization to get symbols.

alphabet_set - int or list, default=0

The list of alphabet letters. Two built-in kinds of alphabet letters are provided, namely 0 and 1.

fit(series, n_jobs=-1, alphabet_set=0)[source]

Fit the numerical series.

Parameters

series - numpy.ndarray, 2-dimension or 1-dimension

Univariate or multivariate time series

n_jobs - int, default=-1

The number of processors used for parallelism. When n_jobs < 0, all available processors on the machine are used. Note: For univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may be different since PABBA uses the starting points of the aggregated groups for reconstruction instead of their centers.

alphabet_set - int or list, default=0

The list of alphabet letters. Two built-in kinds of alphabet letters are provided, namely 0 and 1.

fit_transform(series, n_jobs=-1, alphabet_set=0, return_start_set=False)[source]

Fit the numerical series and transform them into a symbolic representation.

Parameters

series - numpy.ndarray, 2-dimension or 1-dimension

Univariate or multivariate time series

n_jobs - int, default=-1

The number of processors used for parallelism. When n_jobs < 0, all available processors on the machine are used. Note: For univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may be different since PABBA uses the starting points of the aggregated groups for reconstruction instead of their centers.

alphabet_set - int or list, default=0

The list of alphabet letters. Two built-in kinds of alphabet letters are provided, namely 0 and 1.

inverse_transform(string_sequences, start_set=None, n_jobs=1)[source]

Reconstruct numerical sequences from the symbolic sequences.

Parameters

string_sequences: list

Univariate or multivariate symbolic time series

start_set: list

Starting value for the reconstruction of each symbolic time series.

hstack: boolean, default=False

Whether to concatenate multiple reconstructed time series into a single time series, which is useful for parallel reconstruction of a univariate time series.

n_jobs: int, default=1

The number of processors used for parallelism. When n_jobs < 0, all available processors on the machine are used.

n_jobs_init(n_jobs=-1, _max=inf)[source]

Initialize parameter n_jobs.

parallel_compress(series, n_jobs=-1)[source]

Compress the numerical series in a parallel manner.

Parameters

series - numpy.ndarray, 2-dimension or 1-dimension

Univariate or multivariate time series

n_jobs - int, default=-1

The number of processors used for parallelism. When n_jobs < 0, all available processors on the machine are used. Note: For univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may be different since PABBA uses the starting points of the aggregated groups for reconstruction instead of their centers.

piece_to_symbol(piece)[source]

Transform a piece to symbol.

Parameters

piece: numpy.ndarray

A piece from compression pieces.

string_separation(symbols, num_pieces)[source]

Separate symbols into symbolic subsequence.
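One plausible reading of this step is sketched below: a flat sequence of symbols is cut into consecutive subsequences. The sketch assumes that num_pieces lists how many symbols belong to each subsequence; that interpretation and the function name split_symbols are assumptions, not confirmed by this reference.

```python
def split_symbols(symbols, num_pieces):
    """Split a flat symbol sequence into consecutive chunks.

    `num_pieces[i]` is taken to be the number of symbols in chunk i
    (an assumption made for this sketch).
    """
    if sum(num_pieces) != len(symbols):
        raise ValueError("piece counts must add up to the number of symbols")
    chunks, start = [], 0
    for count in num_pieces:
        chunks.append(symbols[start:start + count])
        start += count
    return chunks


chunks = split_symbols(list("aabcbba"), [3, 2, 2])
# [['a', 'a', 'b'], ['c', 'b'], ['b', 'a']]
```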

transform(series, n_jobs=-1)[source]

Transform multiple series (numerical sequences) to symbolic sequences.

Parameters

series: numpy.ndarray, 2-dimension or 1-dimension

Univariate or multivariate time series

n_jobs: int, default=-1

The number of processors used for parallelism. When n_jobs < 0, all available processors on the machine are used. Note: if n_jobs = 1, PABBA degenerates to fABBA for the transformation.

transform_single_series(series)[source]

Transform a single series to symbols.

Parameters

series: numpy.ndarray, 1-dimension

Univariate time series

We illustrate some main components of fABBA below.

compress

class fABBA.chainApproximation.compress(ts, tol=0.5, max_len=-1)[source]

Approximate a time series using a continuous piecewise linear function.

Parameters

ts - numpy ndarray

The input time series as a numpy array.

tol - float

The tolerance that controls the accuracy.

max_len - int

The maximum length allowed for each compressed segment.

Returns

pieces - numpy array

Numpy ndarray with three columns; each row contains the length, increment, and error of a segment.
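A greedy piecewise linear compression of this kind can be sketched as below. This is a plain-Python illustration, not the library's code: the function name compress_sketch is hypothetical, and the acceptance rule (squared error of a segment at most (length - 1) * tol**2) is an assumption chosen for the example.

```python
def compress_sketch(ts, tol=0.5, max_len=-1):
    """Greedy piecewise linear compression (illustrative only).

    Returns a list of (length, increment, error) tuples, one per segment.
    """
    limit = float("inf") if max_len < 0 else max_len
    pieces = []
    start = 0
    while start < len(ts) - 1:
        # A segment covering two points is always exact.
        end, inc, err = start + 1, ts[start + 1] - ts[start], 0.0
        while end + 1 < len(ts) and (end + 1 - start) <= limit:
            length = end + 1 - start
            cand_inc = ts[end + 1] - ts[start]
            # Squared distance between the data and the straight line
            # joining ts[start] and ts[end + 1].
            cand_err = sum(
                (ts[start] + cand_inc * k / length - ts[start + k]) ** 2
                for k in range(length + 1)
            )
            if cand_err > (length - 1) * tol ** 2:
                break  # assumed error bound exceeded: stop extending
            end, inc, err = end + 1, cand_inc, float(cand_err)
        pieces.append((end - start, inc, err))
        start = end  # segments share endpoints, so the fit is continuous
    return pieces


ts = [0.0, 1.0, 2.0, 3.0, 1.0, -1.0]
pieces = compress_sketch(ts, tol=0.1)
# [(3, 3.0, 0.0), (2, -4.0, 0.0)]
```

The example series rises linearly for three steps and then falls linearly for two, so two exact segments are enough. Passing max_len=1 in the sketch forces one segment per step.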

inverse_compress

fABBA.compress

alias of _compress

digitize

class fABBA.digitize(pieces, alpha=0.5, sorting='norm', scl=1, alphabet_set=0)[source]

Greedy 2D clustering of pieces (a Nx2 numpy array), using tolerance alpha and len/inc scaling parameter scl. A ‘temporary’ group center, which we call the starting point, is used when assigning pieces to clusters. This temporary center is the first piece available after appropriate scaling and sorting of all pieces. After the grouping procedure finishes, the centers are calculated as the mean of the objects within each cluster.

Parameters

pieces - numpy.ndarray

The compressed pieces of numpy.ndarray with shape (n_samples, n_features) after compression

Returns

string - str or list

String sequence.

inverse_digitize

class fABBA.inverse_digitize(strings, parameters)[source]

Convert symbolic representation back to compressed representation for reconstruction.

Parameters

string - string

Time series in symbolic representation using unicode characters starting with character ‘a’.

centers - numpy array

Centers of the clusters from the clustering algorithm. Each center corresponds to a character in the string.

Returns

pieces - np.array

Time series in compressed format. See compression.

Images can be compressed with fABBA using the convenience functions image_compress and image_decompress.

image_compress

class fABBA.image_compress(fabba, data, adjust=True)[source]

Image compression.

image_decompress

class fABBA.image_decompress(fabba, string)[source]

Image decompression.