API Reference

fABBA

The fABBA class provides the original fABBA algorithm for the symbolic representation of univariate time series.

class fABBA.fABBA(tol=0.1, alpha=0.5, sorting='2-norm', scl=1, verbose=1, max_len=-1, return_list=False, n_jobs=1)

Bases: Aggregation2D, ABBAbase

fABBA: A fast sorting-based aggregation method for symbolic time series representation

Parameters

tol - float, default=0.1

Tolerance controlling the compression.

alpha - float, default=0.5

Tolerance controlling the digitization.

sorting - str, default=’2-norm’, {‘lexi’, ‘1-norm’, ‘2-norm’}

The method used to sort the pieces prior to aggregation.

scl - int, default=1

Scaling factor for the piece lengths. The default of 1 gives 2D digitization; otherwise 1D digitization is performed.

verbose - int, default=1

Verbosity mode controlling log output. The default of 1 prints logs.

max_len - int, default=-1

The maximum length of each segment; an optional constraint for compression.

return_list - boolean, default=False

Whether to return the representation as a list. False returns a string.

n_jobs - int, default=1

The number of threads to use for the computation. -1 means no parallel computing.
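The three sorting options can be pictured as different sort keys over the (length, increment) pieces. The sketch below is a standalone illustration, not the library's code: the helper name sort_order is hypothetical, and reading ‘lexi’ as a lexicographic sort on (length, increment) is an assumption.

```python
import math


def sort_order(pieces, sorting="2-norm"):
    """Return the indices of `pieces` in the order given by `sorting`.

    Hypothetical helper for illustration only.
    """
    if sorting == "lexi":
        # Assumption: compare by length first, then by increment.
        key = lambda p: tuple(p)
    elif sorting == "1-norm":
        key = lambda p: sum(abs(v) for v in p)
    elif sorting == "2-norm":
        key = lambda p: math.sqrt(sum(v * v for v in p))
    else:
        raise ValueError(f"unknown sorting: {sorting}")
    # sorted() is stable, so ties keep their original relative order.
    return sorted(range(len(pieces)), key=lambda i: key(pieces[i]))


pieces = [(2.0, -3.0), (1.0, 0.5), (1.0, -4.0)]
for name in ("lexi", "1-norm", "2-norm"):
    print(name, sort_order(pieces, name))
```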

Attributes

parameters - Model

Contains the learnable parameters from the in-sample data.

Attributes:

  • centers - numpy.ndarray

    The centers calculated for each group formed by aggregation.

  • splist - numpy.ndarray

    The starting point of each group formed by aggregation.

  • alphabetsap - dict

    Stores the one-to-one key-value pairs between the labels assigned to the groups and their corresponding characters.

string_ - str or list

Contains the ABBA representation.

  • In addition to fit_transform, the compression and digitization functions can each be applied to data independently.

property alpha

compress(series, fillm='bfill')

Compress time series.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.
  • ‘Mean’: Fill the holes of the series with the mean value.
  • ‘Median’: Fill the holes of the series with the median value.
  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.
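The fill methods described for fillm can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the library's implementation: the function name fill_gaps is hypothetical, method names are matched case-insensitively, and every leading (for ‘ffill’) or trailing (for ‘bfill’) NaN is set to zero.

```python
import math
from statistics import mean, median


def fill_gaps(series, fillm="bfill"):
    """Return a copy of `series` with NaN entries filled (illustrative sketch)."""
    values = [float(v) for v in series]
    valid = [v for v in values if not math.isnan(v)]
    method = fillm.lower()
    if method == "zero":
        return [0.0 if math.isnan(v) else v for v in values]
    if method in ("mean", "median"):
        # Requires at least one valid observation.
        fill = mean(valid) if method == "mean" else median(valid)
        return [fill if math.isnan(v) else v for v in values]
    if method == "ffill":
        out, last = [], 0.0  # a leading NaN becomes zero
        for v in values:
            last = last if math.isnan(v) else v
            out.append(last)
        return out
    if method == "bfill":
        out, nxt = [], 0.0  # a trailing NaN becomes zero
        for v in reversed(values):
            nxt = nxt if math.isnan(v) else v
            out.append(nxt)
        return out[::-1]
    raise ValueError(f"unknown fill method: {fillm}")


nan = float("nan")
print(fill_gaps([nan, 1.0, nan, 3.0, nan], "ffill"))
print(fill_gaps([nan, 1.0, nan, 3.0, nan], "bfill"))
```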

digitize(pieces, alphabet_set=0)

Greedy 2D clustering of pieces (an Nx2 numpy array), using the tolerance alpha and the len/inc scaling parameter scl. A ‘temporary’ group center, which we call the starting point, is used when assigning pieces to clusters. This temporary center is the first piece available after appropriate scaling and sorting of all pieces. After the grouping procedure finishes, the centers are calculated as the mean value of the objects within each cluster.

Parameters

pieces - numpy.ndarray

The compressed pieces of numpy.ndarray with shape (n_samples, n_features) after compression.

alphabet_set - int or list

The alphabet letters to use.

Returns

string - str or list

String sequence.

parameters - Model

The parameters of the model.
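The greedy aggregation described for digitize can be sketched as below. This is a simplified standalone illustration, not the library's code: the name aggregate_sketch is hypothetical, the pieces are assumed to be already standardized, and 2-norm sorting is hard-coded.

```python
import math


def aggregate_sketch(pieces, alpha=0.5, scl=1.0):
    """Greedy sorting-based aggregation of (length, increment) pieces.

    Returns (labels, starting_points, centers). Illustrative only.
    """
    # Scale the length coordinate before measuring distances.
    pts = [(scl * ln, inc) for ln, inc in pieces]
    norms = [math.hypot(x, y) for x, y in pts]
    order = sorted(range(len(pts)), key=lambda i: norms[i])

    labels = [-1] * len(pts)
    starting_points = []
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue  # already assigned to an earlier group
        label = len(starting_points)
        starting_points.append(i)  # first unassigned piece is the temporary center
        labels[i] = label
        for j in order[pos + 1:]:
            # Sorted norms allow early termination: by the reverse triangle
            # inequality, every later piece is farther than alpha from piece i.
            if norms[j] - norms[i] > alpha:
                break
            if labels[j] == -1 and math.dist(pts[i], pts[j]) <= alpha:
                labels[j] = label

    # The final centers are the means of the original pieces in each group.
    centers = []
    for label in range(len(starting_points)):
        members = [pieces[i] for i in range(len(pieces)) if labels[i] == label]
        centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return labels, starting_points, centers


pieces = [(1, 1.0), (1, 1.1), (5, -2.0), (5, -2.1), (1, 0.9)]
labels, starts, centers = aggregate_sketch(pieces, alpha=0.5)
print(labels)  # [0, 0, 1, 1, 0]
```

Note that the starting point of the first group is piece 4, while its center is the mean of pieces 0, 1 and 4; the two generally differ.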

dump(file=None)

fit(series, fillm='bfill', alphabet_set=0)

Compress and digitize the time series together.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.
  • ‘Mean’: Fill the holes of the series with the mean value.
  • ‘Median’: Fill the holes of the series with the median value.
  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

Returns

string (str): The string transformed by fABBA.

fit_transform(series, fillm='bfill', alphabet_set=0)

Compress and digitize the time series together.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.
  • ‘Mean’: Fill the holes of the series with the mean value.
  • ‘Median’: Fill the holes of the series with the median value.
  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

Returns

string (str): The string transformed by fABBA.

inverse_transform(string, start=0, parameters=None)

Convert ABBA symbolic representation back to numeric time series representation.

Parameters

string - str

Time series in symbolic representation using unicode characters starting with character ‘a’.

start - float

First element of original time series. Applies vertical shift in reconstruction. If not specified, the default is 0.

parameters - Model

The parameters of the model.

Returns

series - list

Reconstruction of the time series.

load(file=None, replace=False)
property max_len
property n_jobs
static print_parameters(cls)
property return_list
property scl
property sorting
property tol
property verbose

ABBAbase

Base class shared by fABBA and JABBA.

class fABBA.ABBAbase(clustering, tol=0.1, scl=1, verbose=1, max_len=-1)

compress(series, fillm='bfill')

Compress time series.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.
  • ‘Mean’: Fill the holes of the series with the mean value.
  • ‘Median’: Fill the holes of the series with the median value.
  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

digitize(pieces, alphabet_set=0)

Greedy 2D clustering of pieces (an Nx2 numpy array), using the tolerance tol and the len/inc scaling parameter scl.

In this variant, a ‘temporary’ cluster center is used when assigning pieces to clusters. This temporary center is the first piece available after appropriate scaling and sorting of all pieces. It is not necessarily the mean of all pieces in that cluster, and hence the final cluster centers, which are just the means, might achieve a smaller within-cluster tol.

fit(series, fillm='bfill', alphabet_set=0)

Compress and digitize the time series together.

Parameters

series - array or list

Time series.

alpha - float

Tolerance controlling the digitization; defaults to 0.5.

string_form - boolean

Whether to return the result in string form; defaults to True.

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.
  • ‘Mean’: Fill the holes of the series with the mean value.
  • ‘Median’: Fill the holes of the series with the median value.
  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

fit_transform(series, fillm='bfill', alphabet_set=0)

Compress and digitize the time series together.

Parameters

series - array or list

Time series.

alpha - float

Tolerance controlling the digitization; defaults to 0.5.

string_form - boolean

Whether to return the result in string form; defaults to True.

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method.

  • ‘Zero’: Fill the holes of the series with the value 0.
  • ‘Mean’: Fill the holes of the series with the mean value.
  • ‘Median’: Fill the holes of the series with the median value.
  • ‘ffill’: Propagate the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
  • ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

inverse_transform(string, start=0, parameters=None)

Convert ABBA symbolic representation back to numeric time series representation.

Parameters

string - str

Time series in symbolic representation using unicode characters starting with character ‘a’.

start - float

First element of original time series. Applies vertical shift in reconstruction. If not specified, the default is 0.

parameters - Model

The parameters of the model.

Returns

series - list

Reconstruction of the time series.

JABBA

JABBA is an extended and highly optimized version that supports:

  • Univariate and multivariate time series
  • Multiple clustering backends (including GPU-accelerated)
  • Memory-efficient and parallel aggregation

class fABBA.JABBA(tol=0.2, init='agg', k=2, r=0.5, alpha=None, sorting='norm', scl=1, max_iter=10, partition_rate=None, partition=None, max_len=inf, verbose=1, random_state=2022, eta=None, fillna='ffill', last_dim=True, auto_digitize=False, trim_method='keep_ends')

Bases: object

Parallel version of ABBA with fast implementation.

Parameters

tol - double, default=0.2

Tolerance for compression.

k - int, default=2

The number of clusters (distinct symbols) specified for ABBA.

r - float, default=0.5

The rate of data sampling to perform k-means.

alpha - double, default=None

Tolerance for digitization. If None is set, auto-digitization will be enabled.

init - str, default=’agg’

The clustering algorithm used in digitization. Other options: ‘f-kmeans’, ‘kmeans’, ‘gpu-kmeans’.

sorting - str, default=’norm’

The sorting applied to the data before aggregation (inside digitization). Alternative option: ‘pca’.

max_len - int, default=inf

The maximum length of the series segment contained in each compression piece.

max_iter - int, default=10

The maximum number of iterations for the fast k-means algorithm.

batch_size - int, default=1024

Size of the mini-batches for mini-batch k-means in digitization. For faster computations, you can set batch_size greater than 256 * number of cores to enable parallelism on all cores.

verbose - int or boolean, default=1

Enable verbose output.

partition_rate - float or int, default=None

Determines the number of partitions of the time series. When this parameter is not None, the number of partitions is n_jobs*int(np.round(np.exp(1/self.partition_rate), 0)).

partition - int, default=None

The number of subsequences into which the time series is partitioned.

scl - int or float, default=1

Scaling factor for the length of the compression pieces. The larger the value, the more weight the length information carries, which can alleviate problems caused by peak shift.

eta - float, default=None

Parameter controlling the auto-digitization. If None, eta = 3.

last_dim - boolean, default=True

How to process time series with varying shape (two or more dimensions). True by default; otherwise the shape dimensions greater than 1 are flattened.

auto_digitize - boolean, default=False

Enable auto digitization without prior knowledge of alpha.

trim_method - str, default=’keep_ends’

The method used to process time series of varying length.

  • ‘pad’: pad the time series to the maximum length with the last value.
  • ‘keep_ends’: keep the first and last elements, and trim or pad the middle part to the maximum length.
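As a worked example of the partition count given under partition_rate, the expression can be evaluated for a few values. The sketch below uses math.exp and the built-in round in place of np.exp and np.round; the helper name num_partitions is hypothetical.

```python
import math


def num_partitions(n_jobs, partition_rate):
    """Evaluate n_jobs * int(round(exp(1 / partition_rate), 0))."""
    return n_jobs * int(round(math.exp(1 / partition_rate), 0))


# exp(1 / 0.5) = exp(2) is about 7.39, which rounds to 7.
print(num_partitions(4, 0.5))  # 28
# exp(1 / 1) is about 2.72, which rounds to 3.
print(num_partitions(4, 1))    # 12
# exp(1 / 2) is about 1.65, which rounds to 2.
print(num_partitions(4, 2))    # 8
```

Smaller values of partition_rate therefore yield more partitions per job.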

Attributes

params: dict

Parameters of trained model.

string_ - str or list

Contains the ABBA representation.

property alpha

digitize(series, pieces, alphabet_set=0, n_jobs=-1)

Digitization

Parameters

pieces - numpy.ndarray

The compressed pieces of numpy.ndarray with shape (n_samples, n_features) after compression

len_ts - int

The length of time series.

num_pieces - int

The number of pieces.

init - str

Use aggregation, fast-kmeans or kmeans for digitization to get symbols.

alphabet_set - int or list, default=0

The alphabet letters to use. Two built-in alphabets are provided, selected by 0 and 1.

property eta

fit(series, n_jobs=-1, alphabet_set=0)

Fit the numerical series.

Parameters

series - numpy.ndarray, 2-dimension or 1-dimension

Univariate or multivariate time series

n_jobs - int, default=-1

The number of processors used for parallelism. When n_jobs < 0, all processors available on the machine are used. Note: for univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may differ since PABBA uses the starting points of the aggregated groups for reconstruction instead of the group centers.

alphabet_set - int or list, default=0

The alphabet letters to use. Two built-in alphabets are provided, selected by 0 and 1.

fit_transform(series, n_jobs=-1, alphabet_set=0, return_start_set=False)

Fit the numerical series and transform it into a symbolic representation.

Parameters

series - numpy.ndarray, 2-dimension or 1-dimension

Univariate or multivariate time series

n_jobs - int, default=-1

The number of processors used for parallelism. When n_jobs < 0, all processors available on the machine are used. Note: for univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may differ since PABBA uses the starting points of the aggregated groups for reconstruction instead of the group centers.

alphabet_set - int or list, default=0

The alphabet letters to use. Two built-in alphabets are provided, selected by 0 and 1.

inverse_transform(string_sequences, start_set=None, n_jobs=1)

Reconstruct the symbolic sequences into numerical sequences.

Parameters

string_sequences: list

Univariate or multivariate symbolic time series

start_set: list

Starting value for the reconstruction of each symbolic time series.

hstack: boolean, default=False

Determines whether to concatenate multiple reconstructed time series into a single time series, which is useful for parallel reconstruction of univariate time series.

n_jobs: int, default=1

The number of processors used for parallelism. When n_jobs < 0, all processors available on the machine are used.

property k
property max_len
n_jobs_init(n_jobs=-1, _max=inf)

Initialize parameter n_jobs.

parallel_compress(series, n_jobs=-1)

Compress the numerical series in a parallel manner.

Parameters

series - numpy.ndarray, 2-dimension or 1-dimension

Univariate or multivariate time series

n_jobs - int, default=-1

The number of processors used for parallelism. When n_jobs < 0, all processors available on the machine are used. Note: for univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may differ since PABBA uses the starting points of the aggregated groups for reconstruction instead of the group centers.

property partition
property partition_rate
piece_to_symbol(piece)

Transform a piece to symbol.

Parameters

piece: numpy.ndarray

A piece from compression pieces.

recast_shape(reconstruct_list)

Reshape the multiarray to the same shape of the input, the shape might be expanded or squeezed.

property scl
property sorting
string_separation(symbols, num_pieces)

Separate symbols into symbolic subsequence.
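One way to picture this separation is as slicing a flat symbol sequence by per-series piece counts. The sketch below is an assumption about the intent, not the library's implementation: it takes num_pieces to be a list holding the number of pieces of each series, and the function name separate_sketch is hypothetical.

```python
def separate_sketch(symbols, num_pieces):
    """Split a flat symbol sequence into one subsequence per series.

    Assumes num_pieces[i] is the number of symbols belonging to series i.
    """
    out, pos = [], 0
    for n in num_pieces:
        out.append(symbols[pos:pos + n])
        pos += n
    return out


print(separate_sketch(list("aabcbba"), [3, 4]))
# [['a', 'a', 'b'], ['c', 'b', 'b', 'a']]
```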

property tol
transform(series, n_jobs=-1)

Transform multiple series (numerical sequences) to symbolic sequences.

Parameters

series: numpy.ndarray, 2-dimension or 1-dimension

Univariate or multivariate time series

n_jobs: int, default=-1

The number of processors used for parallelism. When n_jobs < 0, all processors available on the machine are used. Note: if n_jobs = 1, PABBA degenerates to fABBA for the transformation.

transform_single_series(series)

Transform a single series to symbols.

Parameters

series: numpy.ndarray, 1-dimension

Univariate time series

Core Transformation Methods

These functions/methods are the building blocks used internally and can also be used directly.

compress

Perform piecewise linear aggregation (tolerance-based chain approximation).

chainApproximation.compress(ts, tol=0.5, max_len=-1)

Approximate a time series using a continuous piecewise linear function.

Parameters

ts - numpy ndarray

Time series as input of numpy array.

tol - float

The tolerance that controls the accuracy.

max_len - int

The maximum segment length allowed during compression.

Returns

pieces - numpy array

Numpy ndarray with three columns, each row contains length, increment, error for the segment.
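A tolerance-based piecewise linear compression of this kind can be sketched in plain Python as follows. This is an illustration under assumptions, not the library's implementation: the name compress_sketch is hypothetical, the acceptance bound is taken to be tol * (length - 1) on the squared Euclidean error, and a max_len below 1 is treated as "no limit".

```python
import sys


def compress_sketch(ts, tol=0.5, max_len=-1):
    """Greedy compression into [length, increment, error] rows (illustrative)."""
    limit = len(ts) if max_len < 1 else max_len  # assumption: < 1 means unlimited
    eps = sys.float_info.epsilon
    pieces = []
    start, end = 0, 1
    last_inc = last_err = 0.0
    while end < len(ts):
        inc = ts[end] - ts[start]
        length = end - start
        # Squared Euclidean distance between the data and the straight line
        # joining ts[start] to ts[end].
        err = sum((ts[start] + inc / length * k - ts[start + k]) ** 2
                  for k in range(length + 1))
        if err <= tol * (length - 1) + eps and length - 1 < limit:
            # The segment can be extended; remember it and try one step further.
            last_inc, last_err = inc, err
            end += 1
        else:
            # Close the last acceptable segment and restart from its endpoint.
            pieces.append([end - start - 1, last_inc, last_err])
            start = end - 1
    pieces.append([end - start - 1, last_inc, last_err])
    return pieces


print(compress_sketch([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0], tol=0.1))
# [[3, 3.0, 0.0], [3, -3.0, 0.0]]
```

The triangular input is captured exactly by two pieces: three steps up by 3, then three steps down by 3.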

inverse_compress

Reconstruct time series from compressed piecewise aggregates.

fABBA.inverse_compress(pieces, start)
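The reconstruction can be pictured as linear interpolation along each piece, starting from the given start value. The following is a minimal standalone sketch, not the library's code: the name inverse_compress_sketch is hypothetical, and each piece is assumed to begin with (length, increment), with lengths rounded to the nearest positive integer.

```python
def inverse_compress_sketch(pieces, start):
    """Rebuild a series by interpolating along (length, increment) pieces."""
    series = [start]
    for piece in pieces:
        length, inc = int(round(piece[0])), piece[1]
        base = series[-1]  # each piece continues from the previous endpoint
        for k in range(1, length + 1):
            series.append(base + inc * k / length)
    return series


print(inverse_compress_sketch([[3, 3.0], [3, -3.0]], 0.0))
# [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
```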

digitize

Convert piecewise linear segments into symbolic representation (SAX-like).

digitization.digitize(pieces, alpha=0.5, sorting='norm', scl=1, alphabet_set=0)

Greedy 2D clustering of pieces (an Nx2 numpy array), using the tolerance alpha and the len/inc scaling parameter scl. A ‘temporary’ group center, which we call the starting point, is used when assigning pieces to clusters. This temporary center is the first piece available after appropriate scaling and sorting of all pieces. After the grouping procedure finishes, the centers are calculated as the mean value of the objects within each cluster.

Parameters

pieces - numpy.ndarray

The compressed pieces of numpy.ndarray with shape (n_samples, n_features) after compression

Returns

string (str or list)

string sequence

inverse_digitize

Reconstruct approximate time series from symbolic string and centers.

fABBA.inverse_digitize(strings, parameters)

Convert symbolic representation back to compressed representation for reconstruction.

Parameters

string - str

Time series in symbolic representation using unicode characters starting with character ‘a’.

centers - numpy array

Centers of the clusters from the clustering algorithm. Each center corresponds to a character in the string.

Returns

pieces - np.array

Time series in compressed format. See compression.
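Conceptually, inverse digitization is a lookup from each symbol to its cluster center. The sketch below illustrates this under a simplifying assumption that symbol number i is the character i positions after ‘a’; the library itself takes a parameters object, and the name inverse_digitize_sketch and the first_char argument are hypothetical.

```python
def inverse_digitize_sketch(string, centers, first_char="a"):
    """Look up the (length, increment) center for every symbol in `string`."""
    return [list(centers[ord(ch) - ord(first_char)]) for ch in string]


centers = [(3.0, 3.0), (3.0, -3.0)]  # center 0 -> 'a', center 1 -> 'b'
print(inverse_digitize_sketch("abba", centers))
# [[3.0, 3.0], [3.0, -3.0], [3.0, -3.0], [3.0, 3.0]]
```

Because the centers are means, the recovered lengths are generally not integers and need rounding before the series is rebuilt.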

Image Compression Utilities

Convenient APIs for compressing 2D arrays/images using fABBA.

image_compress

Compress a 2D image/array into a symbolic string using block-wise fABBA.

fABBA.image_compress(fabba, data, adjust=True)

Image compression.

image_decompress

Decompress a symbolic string back into an image/array.

fABBA.image_decompress(fabba, string)

Image decompression.

Dataset Loading Utilities

Other Utilities