API Reference

fABBA

The fABBA class provides the original fABBA algorithm for symbolic representation of univariate time series.

class fABBA.fABBA(tol=0.1, alpha=0.5, sorting='2-norm', scl=1, verbose=1, max_len=-1, return_list=False, n_jobs=1)[source]

Bases: Aggregation2D, ABBAbase

fABBA: A fast sorting-based aggregation method for symbolic time series representation

Parameters

tol - float, default=0.1: Tolerance for compression.
alpha - float, default=0.5: Tolerance for digitization.
sorting - str, default=‘2-norm’, one of {‘lexi’, ‘1-norm’, ‘2-norm’}: The criterion by which pieces are sorted prior to aggregation.
scl - int, default=1: Scaling factor for the length of pieces. The default of 1 corresponds to 2D digitization; otherwise 1D digitization is performed.
verbose - int, default=1: Verbosity mode that controls log printing. The default of 1 prints logs.
max_len - int, default=-1: The maximum length of each segment; an optional constraint for compression.
return_list - boolean, default=False: Whether to return the representation as a list; False returns a string.
n_jobs - int, default=1: The number of threads to use for the computation. -1 means no parallel computing.
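The three sorting options can be pictured with a small pure-Python sketch. It is only an illustration of the ordering criteria, not the library's code: the helper name sort_pieces is hypothetical, and the sketch ignores the scaling that the library applies to pieces before sorting.

```python
import math

def sort_pieces(pieces, sorting="2-norm"):
    """Order 2D pieces (length, increment) by the chosen criterion (illustrative)."""
    if sorting == "lexi":
        key = lambda p: (p[0], p[1])            # first coordinate, then second
    elif sorting == "1-norm":
        key = lambda p: abs(p[0]) + abs(p[1])   # sum of absolute values
    elif sorting == "2-norm":
        key = lambda p: math.hypot(p[0], p[1])  # Euclidean norm
    else:
        raise ValueError(f"unknown sorting: {sorting!r}")
    return sorted(pieces, key=key)

pieces = [(3.0, 0.0), (2.0, 2.0), (1.0, 0.5)]
by_lexi = sort_pieces(pieces, "lexi")
by_l1 = sort_pieces(pieces, "1-norm")
by_l2 = sort_pieces(pieces, "2-norm")
```

For these three pieces the 1-norm and 2-norm orderings differ, because (2.0, 2.0) has the larger 1-norm but the smaller 2-norm compared with (3.0, 0.0).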

Attributes

parameters - Model

Contains the learnable parameters from the in-sample data.

Attributes:

centers - numpy.ndarray: The centers calculated for each group formed by aggregation.
splist - numpy.ndarray: The starting point of each group formed by aggregation.
alphabetsap - dict: Stores the one-to-one key-value pairs between the labels assigned to the groups and their corresponding characters.

string_ - str or list

Contains the ABBA representation.

In addition to fit_transform, the compression and digitization functions can each be applied to data independently.

property alpha

compress(series, fillm='bfill')[source]

Compress time series.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method:

‘Zero’: Fill the holes of the series with the value 0.
‘Mean’: Fill the holes of the series with the mean value.
‘Median’: Fill the holes of the series with the median value.
‘ffill’: Carry the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.
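As a rough illustration of the fill methods described for fillm, the following pure-Python sketch applies each rule to a short series. The helper fill_missing is hypothetical and is not the library's implementation; it also assumes the method names are matched case-insensitively.

```python
import math
from statistics import mean, median

def fill_missing(series, fillm="bfill"):
    """Replace NaN entries according to the named rule (illustrative sketch)."""
    values = list(series)
    valid = [v for v in values if not math.isnan(v)]
    method = fillm.lower()  # assumption: names are case-insensitive
    if method == "zero":
        return [0.0 if math.isnan(v) else v for v in values]
    if method in ("mean", "median"):
        fill = mean(valid) if method == "mean" else median(valid)
        return [fill if math.isnan(v) else v for v in values]
    if method == "ffill":
        out, last = [], 0.0          # a leading NaN becomes zero
        for v in values:
            if not math.isnan(v):
                last = v
            out.append(last)
        return out
    if method == "bfill":
        out, nxt = [], 0.0           # a trailing NaN becomes zero
        for v in reversed(values):
            if not math.isnan(v):
                nxt = v
            out.append(nxt)
        return out[::-1]
    raise ValueError(f"unknown fill method: {fillm!r}")

nan = float("nan")
series = [nan, 1.0, nan, 3.0, nan]
filled_ffill = fill_missing(series, "ffill")
filled_bfill = fill_missing(series, "bfill")
filled_mean = fill_missing(series, "mean")
```

Note how the boundary rule differs between the two directional methods: ‘ffill’ has nothing to carry forward into a leading gap, and ‘bfill’ has nothing to carry back into a trailing gap, so those positions fall back to zero.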

digitize(pieces, alphabet_set=0)[source]

Greedy 2D clustering of pieces (an N×2 numpy array), using the tolerance alpha and the len/inc scaling parameter scl. A ‘temporary’ group center, which we call the starting point, is used when assigning pieces to clusters. This temporary center is the first piece available after appropriate scaling and sorting of all pieces. After the grouping procedure finishes, each center is calculated as the mean of the objects within its cluster.

Parameters

pieces - numpy.ndarray: The compressed pieces, an array of shape (n_samples, n_features), produced by compression.
alphabet_set - int or list, default=0: The alphabet letters used for the symbols.

Returns

string - str or list: String sequence.
parameters - Model: The parameters of model.
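The greedy aggregation described above can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions, not the library's implementation: it sorts by 2-norm only, skips the scaling step, and the name greedy_aggregate is hypothetical. The early exit relies on the reverse triangle inequality: once the norm gap to the starting point exceeds alpha, no later piece can lie within alpha of it.

```python
import math

def greedy_aggregate(pieces, alpha):
    """Group 2D pieces around starting points; centers are the group means."""
    order = sorted(range(len(pieces)), key=lambda i: math.hypot(*pieces[i]))
    labels = [-1] * len(pieces)
    starting_points = []
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue                      # already assigned to a group
        label = len(starting_points)
        starting_points.append(i)         # first unassigned piece starts a group
        labels[i] = label
        norm_i = math.hypot(*pieces[i])
        for j in order[pos + 1:]:
            if labels[j] != -1:
                continue
            if math.hypot(*pieces[j]) - norm_i > alpha:
                break                     # later pieces are even farther away
            if math.dist(pieces[i], pieces[j]) <= alpha:
                labels[j] = label
    centers = []
    for label in range(len(starting_points)):
        members = [pieces[i] for i, l in enumerate(labels) if l == label]
        centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return labels, starting_points, centers

pieces = [(1.0, 0.0), (1.1, 0.1), (5.0, 5.0), (5.2, 4.9), (1.0, 0.2)]
labels, starting_points, centers = greedy_aggregate(pieces, alpha=0.5)
string = "".join(chr(ord("a") + l) for l in labels)
```

Here the starting point of each group is the piece that opened it, while the reported center is the mean of all members, which is why the two generally differ.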

dump(file=None)[source]

fit(series, fillm='bfill', alphabet_set=0)[source]

Compress and digitize the time series together.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method:

‘Zero’: Fill the holes of the series with the value 0.
‘Mean’: Fill the holes of the series with the mean value.
‘Median’: Fill the holes of the series with the median value.
‘ffill’: Carry the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

Returns

string (str): The string transformed by fABBA.

fit_transform(series, fillm='bfill', alphabet_set=0)[source]

Compress and digitize the time series together.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method:

‘Zero’: Fill the holes of the series with the value 0.
‘Mean’: Fill the holes of the series with the mean value.
‘Median’: Fill the holes of the series with the median value.
‘ffill’: Carry the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

Returns

string (str): The string transformed by fABBA.

inverse_transform(string, start=0, parameters=None)[source]

Convert ABBA symbolic representation back to numeric time series representation.

Parameters

string - string: Time series in symbolic representation using unicode characters starting with character ‘a’.
start - float: First element of original time series. Applies vertical shift in reconstruction. If not specified, the default is 0.
parameters - Model: The parameters of model.

Returns

series - list: Reconstruction of the time series.

load(file=None, replace=False)[source]

property max_len

property n_jobs

static print_parameters(cls)[source]

property return_list

property scl

property sorting

property tol

property verbose

ABBAbase

Base class shared by fABBA and JABBA.

class fABBA.ABBAbase(clustering, tol=0.1, scl=1, verbose=1, max_len=-1)[source]

compress(series, fillm='bfill')[source]

Compress time series.

Parameters

series - numpy.ndarray or list

Time series of the shape (1, n_samples).

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method:

‘Zero’: Fill the holes of the series with the value 0.
‘Mean’: Fill the holes of the series with the mean value.
‘Median’: Fill the holes of the series with the median value.
‘ffill’: Carry the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

digitize(pieces, alphabet_set=0)[source]

Greedy 2D clustering of pieces (an N×2 numpy array), using the tolerance tol and the len/inc scaling parameter scl.

In this variant, a ‘temporary’ cluster center is used when assigning pieces to clusters. This temporary cluster is the first piece available after appropriate scaling and sorting of all pieces. It is not necessarily the mean of all pieces in that cluster and hence the final cluster centers, which are just the means, might achieve a smaller within-cluster tol.

fit(series, fillm='bfill', alphabet_set=0)[source]

Compress and digitize the time series together.

Parameters

series - array or list

Time series.

alpha - float

Tolerance for digitization; defaults to 0.5.

string_form - boolean

Whether to return the result in string form; defaults to True.

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method:

‘Zero’: Fill the holes of the series with the value 0.
‘Mean’: Fill the holes of the series with the mean value.
‘Median’: Fill the holes of the series with the median value.
‘ffill’: Carry the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

fit_transform(series, fillm='bfill', alphabet_set=0)[source]

Compress and digitize the time series together.

Parameters

series - array or list

Time series.

alpha - float

Tolerance for digitization; defaults to 0.5.

string_form - boolean

Whether to return the result in string form; defaults to True.

fillm - str, default = ‘bfill’

Fill NA/NaN values using the specified method:

‘Zero’: Fill the holes of the series with the value 0.
‘Mean’: Fill the holes of the series with the mean value.
‘Median’: Fill the holes of the series with the median value.
‘ffill’: Carry the last valid observation forward to fill the gap. If the first element is NaN, it is set to zero.
‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.

inverse_transform(string, start=0, parameters=None)[source]

Convert ABBA symbolic representation back to numeric time series representation.

Parameters

string - string: Time series in symbolic representation using unicode characters starting with character ‘a’.
start - float: First element of original time series. Applies vertical shift in reconstruction. If not specified, the default is 0.
parameters - Model: The parameters of model.

Returns

series - list: Reconstruction of the time series.

JABBA

JABBA is an extended and highly optimized version that supports:

- Univariate and multivariate time series
- Multiple clustering backends (including GPU-accelerated)
- Memory-efficient and parallel aggregation

class fABBA.JABBA(tol=0.2, init='agg', k=2, r=0.5, alpha=None, sorting='norm', scl=1, max_iter=10, partition_rate=None, partition=None, max_len=inf, verbose=1, random_state=2022, eta=None, fillna='ffill', last_dim=True, auto_digitize=False, trim_method='keep_ends')[source]

Bases: object

Parallel version of ABBA with fast implementation.

Parameters

tol - double, default=0.2: Tolerance for compression.
k - int, default=2: The number of clusters (distinct symbols) specified for ABBA.
r - float, default=0.5: The rate of data sampling used to perform k-means.
alpha - double, default=None: Tolerance for digitization. If None, auto-digitization is enabled.
init - str, default=‘agg’: The clustering algorithm used in digitization. Other options: ‘f-kmeans’, ‘kmeans’, ‘gpu-kmeans’.
sorting - str, default=‘norm’: The sorting applied to the data before aggregation (inside digitization). Alternative option: ‘pca’.
max_len - int, default=inf: The maximum length of series contained in each compression piece.
max_iter - int, default=10: The maximum number of iterations for the fast k-means algorithm.
batch_size - int, default=1024: Size of the mini-batches for mini-batch k-means in digitization. For faster computations, set batch_size greater than 256 * number of cores to enable parallelism on all cores.
verbose - int or boolean, default=1: Enable verbose output.
partition_rate - float or int, default=None: Determines the number of partitions of the time series. When this parameter is not None, the number of partitions is n_jobs*int(np.round(np.exp(1/self.partition_rate), 0)).
partition - int: The number of subsequences into which the time series is partitioned.
scl - int or float, default=1: Scaling applied to the length of compression pieces. The larger the value, the more weight the length information carries, which can mitigate problems caused by peak shift.
eta - float, default=None: Parameter that controls auto-digitization. If None, eta = 3.
last_dim - boolean, default=True: How to process time series of varying shape (>=2). True by default; otherwise the dimensions > 1 are flattened.
auto_digitize - boolean, default=False: Enable auto-digitization without prior knowledge of alpha.
trim_method - str, default=‘keep_ends’: The method used to process time series of varying length. ‘pad’: pad the time series to the maximum length with the last value. ‘keep_ends’: keep the first and last elements, and trim or pad the middle part to the maximum length.
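The partition_rate formula quoted in the parameter list can be worked through numerically. The sketch below reproduces the expression with the standard library in place of NumPy; the function name partitions_from_rate is hypothetical, and Python's built-in round stands in for np.round.

```python
import math

def partitions_from_rate(n_jobs, partition_rate):
    """Evaluate n_jobs * int(round(exp(1 / partition_rate)))."""
    return n_jobs * int(round(math.exp(1 / partition_rate)))

# exp(1/0.5) = exp(2) ≈ 7.389, which rounds to 7
p_half = partitions_from_rate(n_jobs=4, partition_rate=0.5)
# exp(1/1) ≈ 2.718, which rounds to 3
p_one = partitions_from_rate(n_jobs=4, partition_rate=1)
# exp(1/2) ≈ 1.649, which rounds to 2
p_two = partitions_from_rate(n_jobs=4, partition_rate=2)
```

Under this formula a smaller partition_rate yields more partitions per job, since exp(1/partition_rate) grows quickly as the rate approaches zero.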

Attributes

params - dict: Parameters of the trained model.
string_ - str or list: Contains the ABBA representation.

property alpha

digitize(series, pieces, alphabet_set=0, n_jobs=-1)[source]

Digitization

Parameters

pieces - numpy.ndarray: The compressed pieces, an array of shape (n_samples, n_features), produced by compression.
len_ts - int: The length of the time series.
num_pieces - int: The number of pieces.
init - str: Use aggregation, fast k-means, or k-means for digitization to obtain symbols.
alphabet_set - int or list, default=0: The alphabet letters used for the symbols. Two built-in alphabets are provided, selected with 0 and 1.

property eta

fit(series, n_jobs=-1, alphabet_set=0)[source]

Fit the numerical series.

Parameters

series - numpy.ndarray, 2-dimension or 1-dimension: Univariate or multivariate time series.
n_jobs - int, default=-1: The number of processors used for parallelism. When n_jobs < 0, all processors available on the machine are used. Note: for univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may differ because PABBA uses the starting points of the aggregated groups for reconstruction instead of the group centers.
alphabet_set - int or list, default=0: The alphabet letters used for the symbols. Two built-in alphabets are provided, selected with 0 and 1.

fit_transform(series, n_jobs=-1, alphabet_set=0, return_start_set=False)[source]

Fit the numerical series and transform it into a symbolic representation.

Parameters

series - numpy.ndarray, 2-dimension or 1-dimension: Univariate or multivariate time series.
n_jobs - int, default=-1: The number of processors used for parallelism. When n_jobs < 0, all processors available on the machine are used. Note: for univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may differ because PABBA uses the starting points of the aggregated groups for reconstruction instead of the group centers.
alphabet_set - int or list, default=0: The alphabet letters used for the symbols. Two built-in alphabets are provided, selected with 0 and 1.

inverse_transform(string_sequences, start_set=None, n_jobs=1)[source]

Reconstruct the symbolic sequences to numerical sequences.

Parameters

string_sequences - list: Univariate or multivariate symbolic time series.
start_set - list: The starting value for the reconstruction of each symbolic time series.
hstack - boolean, default=False: Whether to concatenate multiple reconstructed time series into a single time series, which is useful for parallel reconstruction of univariate time series.
n_jobs - int, default=1: The number of processors used for parallelism. When n_jobs < 0, all processors available on the machine are used.

property k

property max_len

n_jobs_init(n_jobs=-1, _max=inf)[source]: Initialize parameter n_jobs.

parallel_compress(series, n_jobs=-1)[source]

Compress the numerical series in a parallel manner.

Parameters

series - numpy.ndarray, 2-dimension or 1-dimension: Univariate or multivariate time series.
n_jobs - int, default=-1: The number of processors used for parallelism. When n_jobs < 0, all processors available on the machine are used. Note: for univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may differ because PABBA uses the starting points of the aggregated groups for reconstruction instead of the group centers.

property partition

property partition_rate

piece_to_symbol(piece)[source]

Transform a piece to symbol.

Parameters

piece - numpy.ndarray: A piece from the compression pieces.

recast_shape(reconstruct_list)[source]: Reshape the multiarray to the same shape of the input, the shape might be expanded or squeezed.

property scl

property sorting

string_separation(symbols, num_pieces)[source]: Separate symbols into symbolic subsequences.
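One plausible reading of this operation is sketched below, under the assumption that num_pieces is a list holding the number of pieces of each series, so that a flat symbol sequence is cut at the cumulative boundaries. The helper separate_symbols is hypothetical and is not the method's actual code.

```python
from itertools import accumulate

def separate_symbols(symbols, num_pieces):
    """Split a flat symbol list into one subsequence per series (assumed semantics)."""
    bounds = [0] + list(accumulate(num_pieces))   # cumulative cut points
    return [symbols[bounds[i]:bounds[i + 1]] for i in range(len(num_pieces))]

symbols = list("abcabba")
parts = separate_symbols(symbols, [3, 4])  # first series has 3 pieces, second has 4
```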

property tol

transform(series, n_jobs=-1)[source]

Transform multiple series (numerical sequences) to symbolic sequences.

Parameters

series - numpy.ndarray, 2-dimension or 1-dimension: Univariate or multivariate time series.
n_jobs - int, default=-1: The number of processors used for parallelism. When n_jobs < 0, all processors available on the machine are used. Note: if n_jobs = 1, PABBA degenerates to fABBA for the transformation.

transform_single_series(series)[source]

Transform a single series to symbols.

Parameters

series - numpy.ndarray, 1-dimension: Univariate time series.

Core Transformation Methods

These functions/methods are the building blocks used internally and can also be used directly.

compress

Perform piecewise linear aggregation (tolerance-based chain approximation).

chainApproximation.compress(tol=0.5, max_len=-1)

Approximate a time series using a continuous piecewise linear function.

Parameters

ts - numpy ndarray: The input time series as a numpy array.
tol - float: The tolerance that controls the accuracy.
max_len - int: The maximum segment length allowed during compression.

Returns

pieces - numpy array: Numpy ndarray with three columns; each row contains the length, increment, and error of a segment.
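A tolerance-based chain approximation of this kind can be sketched in pure Python. This is a minimal illustration, not the library's code: the name compress_sketch is hypothetical, the acceptance bound err <= tol * (length - 1) is an assumption about how the tolerance is applied, and a non-positive max_len is assumed to mean "no limit".

```python
def compress_sketch(ts, tol=0.5, max_len=float("inf")):
    """Greedy piecewise linear compression; rows are [length, increment, error]."""
    if max_len < 1:
        max_len = float("inf")   # assumption: non-positive means no limit
    pieces = []
    start, end = 0, 1
    last_inc, last_err = 0.0, 0.0
    while end < len(ts):
        inc = ts[end] - ts[start]
        length = end - start
        # squared error of the straight line joining ts[start] and ts[end]
        err = sum(
            (ts[start] + inc * k / length - ts[start + k]) ** 2
            for k in range(length + 1)
        )
        if err <= tol * (length - 1) + 1e-12 and length - 1 < max_len:
            last_inc, last_err = inc, err   # segment still acceptable: extend it
            end += 1
        else:
            pieces.append([end - start - 1, last_inc, last_err])
            start = end - 1                 # next segment begins at the last good point
    pieces.append([end - start - 1, last_inc, last_err])
    return pieces

ts = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
pieces = compress_sketch(ts, tol=0.1)
```

For this triangular series the sketch produces two pieces, one rising by 3 over three steps and one falling by 3 over three steps, each with zero error; the piece lengths sum to len(ts) - 1.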

inverse_compress

Reconstruct time series from compressed piecewise aggregates.

fABBA.inverse_compress(pieces, start)
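Conceptually, reconstruction walks through the pieces and interpolates linearly from the current value. The following pure-Python sketch shows the idea; inverse_compress_sketch is a hypothetical stand-in, and it assumes each row of pieces begins with (length, increment) and that lengths are whole numbers.

```python
def inverse_compress_sketch(pieces, start):
    """Rebuild a series from (length, increment) pieces, beginning at `start`."""
    series = [start]
    for piece in pieces:
        length, inc = int(piece[0]), piece[1]
        base = series[-1]                 # each piece continues from the last value
        for k in range(1, length + 1):
            series.append(base + inc * k / length)
    return series

pieces = [[3, 3.0], [3, -3.0]]
reconstructed = inverse_compress_sketch(pieces, start=0.0)
```

The start argument only shifts the whole reconstruction vertically, since every subsequent value is defined by increments relative to it.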

digitize

Convert piecewise linear segments into symbolic representation (SAX-like).

digitization.digitize(alpha=0.5, sorting='norm', scl=1, alphabet_set=0)

Greedy 2D clustering of pieces (an N×2 numpy array), using the tolerance alpha and the len/inc scaling parameter scl. A ‘temporary’ group center, which we call the starting point, is used when assigning pieces to clusters. This temporary center is the first piece available after appropriate scaling and sorting of all pieces. After the grouping procedure finishes, each center is calculated as the mean of the objects within its cluster.

Parameters

pieces - numpy.ndarray: The compressed pieces, an array of shape (n_samples, n_features), produced by compression.

Returns

string (str or list): String sequence.

inverse_digitize

Reconstruct approximate time series from symbolic string and centers.

fABBA.inverse_digitize(strings, parameters)[source]

Convert symbolic representation back to compressed representation for reconstruction.

Parameters

string - string: Time series in symbolic representation using unicode characters starting with character ‘a’.
centers - numpy array: Centers of the clusters from the clustering algorithm. Each center corresponds to a character in the string.

Returns

pieces - np.array: Time series in compressed format. See compression.
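The mapping from symbols back to pieces can be illustrated with a short sketch. It assumes symbols are consecutive characters starting at ‘a’ and that the i-th center belongs to the i-th symbol; the library itself may keep an explicit symbol-to-label mapping in its parameters, so inverse_digitize_sketch is a hypothetical simplification.

```python
def inverse_digitize_sketch(string, centers, first_symbol="a"):
    """Replace each symbol by the (length, increment) center of its cluster."""
    return [centers[ord(ch) - ord(first_symbol)] for ch in string]

centers = [(3.0, 3.0), (3.0, -3.0)]   # one (length, increment) center per symbol
pieces = inverse_digitize_sketch("abab", centers)
```

Because centers are group means, their lengths can be fractional in general, so a rounding step would be needed before the pieces are expanded back into a time series.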

Image Compression Utilities

Convenient APIs for compressing 2D arrays/images using fABBA.

image_compress

Compress a 2D image/array into a symbolic string using block-wise fABBA.

fABBA.image_compress(fabba, data, adjust=True)[source]: image compression.
