API Reference
fABBA is the API for transforming univariate time series into a symbolic representation.
fABBA
- class fABBA.fABBA(tol=0.1, alpha=0.5, sorting='2-norm', scl=1, verbose=1, max_len=-1, return_list=False, n_jobs=1)[source]
fABBA: A fast sorting-based aggregation method for symbolic time series representation
Parameters
- tol - float, default=0.1
Tolerance that controls the compression.
- alpha - float, default=0.5
Tolerance that controls the digitization.
- sorting - str, default=’2-norm’, {‘lexi’, ‘1-norm’, ‘2-norm’}
The criterion by which pieces are sorted prior to aggregation.
- scl - int, default=1
Scaling factor for the length. The default value of 1 corresponds to 2D digitization; otherwise 1D digitization is performed.
- verbose - int, default=1
Verbosity mode that controls log printing. The default value of 1 prints logs.
- max_len - int, default=-1
The maximum length of each segment; an optional restriction on compression.
- return_list - boolean, default=False
Whether to return a list. “False” means a string is returned.
- n_jobs - int, default=1
The number of threads to use for the computation. -1 means no parallel computing.
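The three sorting options can be pictured as three different sort keys applied to the (length, increment) pieces. The following is a minimal sketch in pure Python, not the library's implementation; the function name sort_pieces is hypothetical and the pieces are assumed to be already scaled.

```python
import math

def sort_pieces(pieces, sorting="2-norm"):
    """Return the indices of `pieces` ordered by the chosen criterion."""
    if sorting == "lexi":
        # Lexicographic order: compare lengths first, then increments.
        key = lambda i: tuple(pieces[i])
    elif sorting == "1-norm":
        key = lambda i: sum(abs(v) for v in pieces[i])
    elif sorting == "2-norm":
        key = lambda i: math.hypot(*pieces[i])
    else:
        raise ValueError(f"unknown sorting option: {sorting!r}")
    return sorted(range(len(pieces)), key=key)

# Each piece is a (length, increment) pair.
pieces = [(3.0, 0.0), (2.0, 2.0), (1.0, -5.0)]
for option in ("lexi", "1-norm", "2-norm"):
    print(option, sort_pieces(pieces, option))
```

The example data is chosen so that each option yields a different ordering, which shows that the choice of sorting can change which piece becomes the first starting point.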
Attributes
- parameters - Model
Contains the learnable parameters from the in-sample data.
Attributes:
- centers - numpy.ndarray
The centers calculated for each group formed by aggregation.
- splist - numpy.ndarray
The starting point for each group formed by aggregation.
- alphabetsap - dict
Stores the one-to-one key-value pairs between the labels assigned to the groups and the corresponding characters.
- string_ - str or list
Contains the ABBA representation.
In addition to fit_transform, the compression and digitization functions are independently applicable to data.
- compress(series, fillm='bfill')[source]
Compress time series.
Parameters
- series - numpy.ndarray or list
Time series of the shape (1, n_samples).
- fillm - str, default=‘bfill’
Fill NA/NaN values using the specified method.
- ‘Zero’: Fill the holes of the series with the value 0.
- ‘Mean’: Fill the holes of the series with the mean value.
- ‘Median’: Fill the holes of the series with the median value.
- ‘ffill’: Forward the last valid observation to fill the gap. If the first element is NaN, it is set to zero.
- ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.
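The fill strategies described for fillm can be illustrated with a small standalone function. This is a sketch of the documented behaviour in pure Python, not the code fABBA runs; the name fill_nan is hypothetical, and applying the zero fallback to every leading (or trailing) NaN, not just the first (or last) element, is an assumption.

```python
import math
from statistics import mean, median

def fill_nan(series, method="bfill"):
    """Return a copy of `series` with NaN entries replaced according to `method`."""
    out = [float(v) for v in series]
    valid = [v for v in out if not math.isnan(v)]
    if method in ("Zero", "Mean", "Median"):
        if method == "Zero" or not valid:
            fill = 0.0
        elif method == "Mean":
            fill = mean(valid)
        else:
            fill = median(valid)
        return [fill if math.isnan(v) else v for v in out]
    if method == "ffill":
        prev = 0.0  # a leading NaN has no earlier observation, so it becomes zero
        for i, v in enumerate(out):
            if math.isnan(v):
                out[i] = prev
            else:
                prev = v
        return out
    if method == "bfill":
        nxt = 0.0  # a trailing NaN has no later observation, so it becomes zero
        for i in range(len(out) - 1, -1, -1):
            if math.isnan(out[i]):
                out[i] = nxt
            else:
                nxt = out[i]
        return out
    raise ValueError(f"unknown fill method: {method!r}")

nan = float("nan")
series = [nan, 1.0, nan, 2.0, 6.0, nan]
print(fill_nan(series, "ffill"))  # [0.0, 1.0, 1.0, 2.0, 6.0, 6.0]
print(fill_nan(series, "bfill"))  # [1.0, 1.0, 2.0, 2.0, 6.0, 0.0]
```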
- digitize(pieces, alphabet_set=0)[source]
Greedy 2D clustering of pieces (an Nx2 numpy array), using the tolerance alpha and the len/inc scaling parameter scl. A ‘temporary’ group center, which we call the starting point, is used when assigning pieces to clusters. This temporary center is the first piece available after appropriate scaling and sorting of all pieces. Once the grouping procedure is finished, each center is calculated as the mean of the pieces within its cluster.
Parameters
- pieces - numpy.ndarray
The compressed pieces, a numpy.ndarray with shape (n_samples, n_features), obtained after compression.
- alphabet_set - int or list
The list of alphabet letters.
Returns
- string - str or list
String sequence.
- parameters - Model
The parameters of the model.
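The greedy grouping described for digitize can be sketched in a few lines. The version below is an illustrative pure-Python reading of that description, not the library's implementation: it assumes the pieces are already scaled, sorts by 2-norm, and labels groups with letters starting from ‘a’. The names aggregate and to_string are hypothetical.

```python
import math

def aggregate(pieces, alpha=0.5):
    """Greedy sorting-based grouping of 2D pieces (length, increment).

    Returns (labels, starting_points, centers).
    """
    order = sorted(range(len(pieces)), key=lambda i: math.hypot(*pieces[i]))
    labels = [-1] * len(pieces)
    starting_points = []
    for i in order:
        if labels[i] != -1:
            continue  # already belongs to a group
        group = len(starting_points)
        starting_points.append(i)  # first unassigned piece is the temporary center
        for j in order:
            dist = math.hypot(pieces[i][0] - pieces[j][0], pieces[i][1] - pieces[j][1])
            if labels[j] == -1 and dist <= alpha:
                labels[j] = group
    # Final centers are the means of the members, not the starting points.
    centers = []
    for g in range(len(starting_points)):
        members = [pieces[i] for i, lab in enumerate(labels) if lab == g]
        centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return labels, starting_points, centers

def to_string(labels):
    """Map group labels 0, 1, 2, ... to the characters 'a', 'b', 'c', ..."""
    return "".join(chr(ord("a") + lab) for lab in labels)

pieces = [(1.0, 1.0), (1.1, 0.9), (3.0, -2.0), (1.0, 1.2), (3.1, -2.1)]
labels, starting_points, centers = aggregate(pieces, alpha=0.5)
print(to_string(labels))  # aabab
```

A smaller alpha produces more groups and hence more distinct symbols; a larger alpha merges more pieces into each group.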
- fit(series, fillm='bfill', alphabet_set=0)[source]
Compress and digitize the time series together.
Parameters
- series - numpy.ndarray or list
Time series of the shape (1, n_samples).
- fillm - str, default=‘bfill’
Fill NA/NaN values using the specified method.
- ‘Zero’: Fill the holes of the series with the value 0.
- ‘Mean’: Fill the holes of the series with the mean value.
- ‘Median’: Fill the holes of the series with the median value.
- ‘ffill’: Forward the last valid observation to fill the gap. If the first element is NaN, it is set to zero.
- ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.
Returns
string (str): The string transformed by fABBA.
- fit_transform(series, fillm='bfill', alphabet_set=0)[source]
Compress and digitize the time series together.
Parameters
- series - numpy.ndarray or list
Time series of the shape (1, n_samples).
- fillm - str, default=‘bfill’
Fill NA/NaN values using the specified method.
- ‘Zero’: Fill the holes of the series with the value 0.
- ‘Mean’: Fill the holes of the series with the mean value.
- ‘Median’: Fill the holes of the series with the median value.
- ‘ffill’: Forward the last valid observation to fill the gap. If the first element is NaN, it is set to zero.
- ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.
Returns
string (str): The string transformed by fABBA.
- inverse_transform(string, start=0, parameters=None)[source]
Convert ABBA symbolic representation back to numeric time series representation.
Parameters
- string - string
Time series in symbolic representation, using Unicode characters starting with the character ‘a’.
- start - float
First element of the original time series. Applies a vertical shift in the reconstruction. If not specified, the default is 0.
- parameters - Model
The parameters of the model.
Returns
- series - list
Reconstruction of the time series.
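The reconstruction can be pictured as two steps: each symbol is mapped to a (length, increment) center, and the resulting pieces are stitched into a polygonal chain that begins at start. The sketch below is a simplified illustration, not the library's code. It assumes symbols begin at ‘a’, takes the centers as a plain list, and rounds each length independently; the name inverse_transform_sketch is hypothetical.

```python
def inverse_transform_sketch(string, centers, start=0.0):
    """Rebuild a numeric series from symbols, cluster centers and a start value."""
    series = [float(start)]
    for ch in string:
        length, inc = centers[ord(ch) - ord("a")]  # symbol -> (length, increment)
        steps = max(1, round(length))              # a segment spans at least one step
        base = series[-1]
        for s in range(1, steps + 1):
            # Linear interpolation along the piece.
            series.append(base + inc * s / steps)
    return series

centers = [(2.0, 1.0), (3.0, -3.0)]  # 'a' -> rise by 1 over 2 steps, 'b' -> fall by 3 over 3 steps
reconstruction = inverse_transform_sketch("aba", centers, start=5.0)
print(reconstruction)  # [5.0, 5.5, 6.0, 5.0, 4.0, 3.0, 3.5, 4.0]
```

Because start only sets the first value, changing it shifts the whole reconstruction vertically without altering its shape.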
- class fABBA.loadData(name='Beef')[source]
Load the example data.
Parameters
- name - str
The dataset name. Currently supported: ‘AtrialFibrillation’, ‘BasicMotions’, ‘Beef’, ‘CharacterTrajectories’, ‘LSST’, ‘Epilepsy’, ‘NATOPS’, ‘UWaveGestureLibrary’, ‘JapaneseVowels’.
For more datasets, we refer users to https://www.timeseriesclassification.com/ or https://archive.ics.uci.edu/datasets.
Returns
- train, test - numpy.ndarray
The training and test data, respectively.
ABBAbase
- class fABBA.ABBAbase(clustering, tol=0.1, scl=1, verbose=1, max_len=-1)[source]
- compress(series, fillm='bfill')[source]
Compress time series.
Parameters
- series - numpy.ndarray or list
Time series of the shape (1, n_samples).
- fillm - str, default=‘bfill’
Fill NA/NaN values using the specified method.
- ‘Zero’: Fill the holes of the series with the value 0.
- ‘Mean’: Fill the holes of the series with the mean value.
- ‘Median’: Fill the holes of the series with the median value.
- ‘ffill’: Forward the last valid observation to fill the gap. If the first element is NaN, it is set to zero.
- ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.
- digitize(pieces, alphabet_set=0)[source]
Greedy 2D clustering of pieces (an Nx2 numpy array), using the tolerance tol and the len/inc scaling parameter scl.
In this variant, a ‘temporary’ cluster center is used when assigning pieces to clusters. This temporary center is the first piece available after appropriate scaling and sorting of all pieces. It is not necessarily the mean of all pieces in that cluster, and hence the final cluster centers, which are just the means, might achieve a smaller within-cluster tol.
- fit(series, fillm='bfill', alphabet_set=0)[source]
Compress and digitize the time series together.
Parameters
- series - array or list
Time series.
- alpha - float
Tolerance that controls the digitization; the default is 0.5.
- string_form - boolean
Whether to return the result in string form; the default is True.
- fillm - str, default=‘bfill’
Fill NA/NaN values using the specified method.
- ‘Zero’: Fill the holes of the series with the value 0.
- ‘Mean’: Fill the holes of the series with the mean value.
- ‘Median’: Fill the holes of the series with the median value.
- ‘ffill’: Forward the last valid observation to fill the gap. If the first element is NaN, it is set to zero.
- ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.
- fit_transform(series, fillm='bfill', alphabet_set=0)[source]
Compress and digitize the time series together.
Parameters
- series - array or list
Time series.
- alpha - float
Tolerance that controls the digitization; the default is 0.5.
- string_form - boolean
Whether to return the result in string form; the default is True.
- fillm - str, default=‘bfill’
Fill NA/NaN values using the specified method.
- ‘Zero’: Fill the holes of the series with the value 0.
- ‘Mean’: Fill the holes of the series with the mean value.
- ‘Median’: Fill the holes of the series with the median value.
- ‘ffill’: Forward the last valid observation to fill the gap. If the first element is NaN, it is set to zero.
- ‘bfill’: Use the next valid observation to fill the gap. If the last element is NaN, it is set to zero.
- inverse_transform(string, start=0, parameters=None)[source]
Convert ABBA symbolic representation back to numeric time series representation.
Parameters
- string - string
Time series in symbolic representation, using Unicode characters starting with the character ‘a’.
- start - float
First element of the original time series. Applies a vertical shift in the reconstruction. If not specified, the default is 0.
- parameters - Model
The parameters of the model.
Returns
- series - list
Reconstruction of the time series.
JABBA
JABBA is the API for the symbolic representation of univariate time series and of multivariate (resp., multiple univariate) time series. It allows the ABBA method to be combined with various clustering techniques.
- class fABBA.JABBA(tol=0.2, init='agg', k=2, r=0.5, alpha=None, sorting='norm', scl=1, max_iter=2, partition_rate=None, partition=None, max_len=inf, verbose=1, random_state=2022, fillna='ffill', auto_digitize=False)[source]
Parallel version of ABBA with fast implementation.
Parameters
- tol - double, default=0.2
Tolerance for compression.
- k - int, default=2
The number of clusters (distinct symbols) specified for ABBA.
- r - float, default=0.5
The rate of data sampling to perform k-means.
- alpha - double, default=None
Tolerance for digitization.
- init - str, default=’agg’
The clustering algorithm used in digitization. Other options: ‘f-kmeans’, ‘kmeans’.
- sorting - str, default=”norm”
Sort the data before aggregation (inside digitization). Alternative option: “pca”.
- max_len - int, default=inf
The maximum length of the series contained in each compression piece.
- max_iter - int, default=2
The max iteration for fast k-means algorithm.
- batch_size - int, default=1024
Size of the mini-batches for mini-batch k-means in digitization. For faster computations, you can set batch_size greater than 256 * number of cores to enable parallelism on all cores.
- verbose - int or boolean, default=1
Enable verbose output.
- partition_rate - float or int, default=None
Determines the number of partitions of the time series. When this parameter is not None, the number of partitions is n_jobs*int(np.round(np.exp(1/self.partition_rate), 0)).
- partition - int
The number of subsequences for time series to be partitioned.
- scl - int or float, default=1
Scale the length of compression pieces. The larger the value, the more important the length information becomes, which can mitigate problems caused by peak shift.
- auto_digitize - boolean, default=False
Enable auto digitization without prior knowledge of alpha.
Attributes
- params: dict
Parameters of trained model.
- string_ - str or list
Contains the ABBA representation.
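The formula quoted for partition_rate can be evaluated by hand. The sketch below reproduces it with the standard library only, using math.exp and round in place of the NumPy calls; the function name num_partitions is hypothetical and not part of fABBA.

```python
import math

def num_partitions(n_jobs, partition_rate):
    """Partition count implied by n_jobs * int(round(exp(1 / partition_rate)))."""
    return n_jobs * int(round(math.exp(1.0 / partition_rate)))

# A smaller partition_rate gives more partitions per job.
for rate in (2, 1, 0.5):
    print(rate, num_partitions(4, rate))
```

With n_jobs = 4, a rate of 1 gives exp(1) ≈ 2.72, which rounds to 3 partitions per job and 12 in total.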
- digitize(series, pieces, alphabet_set=0, n_jobs=-1)[source]
Digitization
Parameters
- pieces - numpy.ndarray
The compressed pieces of numpy.ndarray with shape (n_samples, n_features) after compression
- len_ts - int
The length of time series.
- num_pieces - int
The number of pieces.
- init - str
Use aggregation, fast-kmeans or kmeans for digitization to get symbols.
- alphabet_set - int or list, default=0
The list of alphabet letters. Two different kinds of alphabet letters are provided, namely 0 and 1.
- fit(series, n_jobs=-1, alphabet_set=0)[source]
Fit the numerical series.
Parameters
- series - numpy.ndarray, 2-dimension or 1-dimension
Univariate or multivariate time series
- n_jobs - int, default=-1
The number of processors used for parallelism. When n_jobs < 0, all available processors of the machine are used. Note: for univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may differ since PABBA uses the starting points of the aggregated groups for reconstruction instead of the group centers.
- alphabet_set - int or list, default=0
The list of alphabet letters. Two different kinds of alphabet letters are provided, namely 0 and 1.
- fit_transform(series, n_jobs=-1, alphabet_set=0, return_start_set=False)[source]
Fit the numerical series and transform it into a symbolic representation.
Parameters
- series - numpy.ndarray, 2-dimension or 1-dimension
Univariate or multivariate time series
- n_jobs - int, default=-1
The number of processors used for parallelism. When n_jobs < 0, all available processors of the machine are used. Note: for univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may differ since PABBA uses the starting points of the aggregated groups for reconstruction instead of the group centers.
- alphabet_set - int or list, default=0
The list of alphabet letters. Two different kinds of alphabet letters are provided, namely 0 and 1.
- inverse_transform(string_sequences, start_set=None, n_jobs=1)[source]
Reconstruct numerical sequences from the symbolic sequences.
Parameters
- string_sequences: list
Univariate or multivariate symbolic time series
- start_set: list
Starting value for the reconstruction of each symbolic time series.
- hstack: boolean, default=False
Determines whether to concatenate multiple reconstructed time series into a single time series, which is useful for parallel reconstruction of univariate time series.
- n_jobs: int, default=1
The number of processors used for parallelism. When n_jobs < 0, all processors of the machine are used.
- parallel_compress(series, n_jobs=-1)[source]
Compress the numerical series in a parallel manner.
Parameters
- series - numpy.ndarray, 2-dimension or 1-dimension
Univariate or multivariate time series
- n_jobs - int, default=-1
The number of processors used for parallelism. When n_jobs < 0, all available processors of the machine are used. Note: for univariate time series, if n_jobs = 1, PABBA degenerates to fABBA, but the result may differ since PABBA uses the starting points of the aggregated groups for reconstruction instead of the group centers.
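The general pattern of splitting a univariate series into partitions and processing them concurrently can be sketched as follows. This is only an illustration of the idea under stated assumptions, not how JABBA is implemented: the chunks are non-overlapping, a thread pool stands in for multiple processors, and summarize is a hypothetical stand-in for per-chunk compression that records each chunk's start value, length and increment. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_series(series, partition):
    """Split `series` into `partition` contiguous chunks of near-equal size."""
    n = len(series)
    if not 1 <= partition <= n:
        raise ValueError("partition must be between 1 and len(series)")
    bounds = [round(k * n / partition) for k in range(partition + 1)]
    return [series[bounds[k]:bounds[k + 1]] for k in range(partition)]

def summarize(chunk):
    """Stand-in for per-chunk compression: (start value, length, increment)."""
    return chunk[0], len(chunk) - 1, chunk[-1] - chunk[0]

def parallel_map(series, partition=2, n_jobs=2):
    """Process every chunk in a worker pool; results keep the chunk order."""
    chunks = split_series(series, partition)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(summarize, chunks))

result = parallel_map(list(range(10)), partition=3, n_jobs=2)
print(result)  # [(0, 2, 2), (3, 3, 3), (7, 2, 2)]
```

Keeping the start value of each chunk is what makes it possible to reconstruct every partition independently and then concatenate the results.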
- piece_to_symbol(piece)[source]
Transform a piece to symbol.
Parameters
- piece: numpy.ndarray
A piece from compression pieces.
- transform(series, n_jobs=-1)[source]
Transform multiple series (numerical sequences) to symbolic sequences.
Parameters
- series: numpy.ndarray, 2-dimension or 1-dimension
Univariate or multivariate time series
- n_jobs: int, default=-1
The number of processors used for parallelism. When n_jobs < 0, all processors of the machine are used. Note: if n_jobs = 1, PABBA degenerates to fABBA for the transformation.
We illustrate some main components of fABBA below.
compress
- class fABBA.chainApproximation.compress(ts, tol=0.5, max_len=-1)[source]
Approximate a time series using a continuous piecewise linear function.
Parameters
- ts - numpy.ndarray
The input time series as a numpy array.
- tol - float
The tolerance that controls the accuracy.
- max_len - int
The maximum segment length allowed during compression.
Returns
- pieces - numpy.ndarray
Array with three columns; each row contains the length, increment, and error of a segment.
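The idea of approximating a series by a continuous piecewise linear function can be sketched with a greedy scan that extends each segment for as long as the straight line between its endpoints stays within tolerance. The code below is an illustrative pure-Python sketch, not the library's implementation. The acceptance rule (squared error at most (length - 1) * tol**2) is an assumption, max_len=None is used here to mean no limit, and the name compress_sketch is hypothetical.

```python
def compress_sketch(ts, tol=0.5, max_len=None):
    """Greedy piecewise-linear compression; returns [length, increment, error] rows."""
    if max_len is not None and max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(ts) < 2:
        return []
    pieces = []
    start, end = 0, 1
    last_inc, last_err = ts[1] - ts[0], 0.0
    while end < len(ts):
        length = end - start
        inc = ts[end] - ts[start]
        # Squared error between the straight line and the data on [start, end].
        err = sum((ts[start] + inc * i / length - ts[start + i]) ** 2
                  for i in range(length + 1))
        within_tol = err <= (length - 1) * tol ** 2 + 1e-12
        within_len = max_len is None or length <= max_len
        if within_tol and within_len:
            last_inc, last_err = inc, err  # remember the last accepted segment
            end += 1
        else:
            # Close the segment at the previous point and restart from there.
            pieces.append([end - 1 - start, last_inc, last_err])
            start = end - 1
    pieces.append([end - 1 - start, last_inc, last_err])
    return pieces

ts = [0, 1, 2, 3, 2, 1, 0]
pieces = compress_sketch(ts, tol=0.1)
print(pieces)  # two pieces: up by 3 over 3 steps, then down by 3 over 3 steps
```

Since consecutive segments share an endpoint, the lengths of all pieces sum to len(ts) - 1.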
inverse_compress
- fABBA.compress: alias of _compress
digitize
- class fABBA.digitize(pieces, alpha=0.5, sorting='norm', scl=1, alphabet_set=0)[source]
Greedy 2D clustering of pieces (an Nx2 numpy array), using the tolerance alpha and the len/inc scaling parameter scl. A ‘temporary’ group center, which we call the starting point, is used when assigning pieces to clusters. This temporary center is the first piece available after appropriate scaling and sorting of all pieces. Once the grouping procedure is finished, each center is calculated as the mean of the pieces within its cluster.
Parameters
- pieces - numpy.ndarray
The compressed pieces, a numpy.ndarray with shape (n_samples, n_features), obtained after compression.
Returns
- string - str or list
String sequence.
inverse_digitize
- class fABBA.inverse_digitize(strings, parameters)[source]
Convert symbolic representation back to compressed representation for reconstruction.
Parameters
- string - string
Time series in symbolic representation using unicode characters starting with character ‘a’.
- centers - numpy array
Centers of the clusters produced by the clustering algorithm. Each center corresponds to a character in the string.
Returns
- pieces - np.array
Time series in compressed format. See compression.
We can employ image compression with fABBA using the convenient APIs image_compress and image_decompress.