Welcome to fABBA’s documentation!

License: MIT

fABBA — Fast and Accurate Symbolic Representation for Time Series

fABBA (fast ABBA) is an optimized symbolic aggregate approximation method for univariate and multivariate time series. It can achieve very high compression ratios (often 100–1000×). The compression is lossy, but every symbolic string can be converted back into a numerical series whose deviation from the original is controlled by the user-chosen tolerance.

The method consists of two core steps:

  1. Lossy piecewise linear compression (tolerance-driven polygonal chain approximation)

  2. Mean-based clustering of segments -> symbolic representation (fully automated, no need to pre-specify alphabet size)
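The two steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's implementation: the function names `compress_sketch` and `digitize_sketch` are hypothetical, the error bound `(length - 1) * tol**2` is one common choice for tolerance-driven segmentation, and the grouping rule (sort by norm, then absorb every unassigned piece within `alpha` of a starting piece) omits the scaling of pieces that a production implementation would likely apply.

```python
import math


def compress_sketch(ts, tol=0.1):
    """Greedy piecewise linear compression into (length, increment) pieces.

    A piece is extended while the squared error between the data and the
    straight line joining its endpoints stays within (length - 1) * tol**2.
    """
    pieces = []
    start, end = 0, 1
    last_good = None
    while end < len(ts):
        length = end - start
        inc = ts[end] - ts[start]
        # Squared error on the interior points of the candidate piece.
        err = sum(
            (ts[start] + inc * k / length - ts[start + k]) ** 2
            for k in range(1, length)
        )
        if err <= (length - 1) * tol ** 2:
            last_good = (length, inc)
            end += 1
        else:
            # Close the last acceptable piece and start a new one there.
            pieces.append(last_good)
            start = end - 1
            last_good = None
    if last_good is not None:
        pieces.append(last_good)
    return pieces


def digitize_sketch(pieces, alpha=0.5):
    """Group pieces greedily and return (string, centers).

    No alphabet size is given: the number of groups follows from alpha.
    """
    order = sorted(range(len(pieces)), key=lambda i: math.hypot(*pieces[i]))
    labels = [None] * len(pieces)
    n_groups = 0
    for i in order:
        if labels[i] is not None:
            continue
        # Piece i starts a new group and absorbs nearby unassigned pieces.
        for j in order:
            if labels[j] is None and math.dist(pieces[i], pieces[j]) <= alpha:
                labels[j] = n_groups
        n_groups += 1
    centers = []
    for g in range(n_groups):
        members = [p for p, lab in zip(pieces, labels) if lab == g]
        centers.append((
            sum(m[0] for m in members) / len(members),
            sum(m[1] for m in members) / len(members),
        ))
    # One character per group, starting at 'a' (illustrative labelling only).
    string = "".join(chr(ord("a") + lab) for lab in labels)
    return string, centers


ts = [math.sin(0.05 * i) for i in range(1000)]
pieces = compress_sketch(ts, tol=0.1)
string, centers = digitize_sketch(pieces, alpha=0.5)
```

Each piece stores how many time steps it spans and how much the value changes over that span, so the piece lengths always sum to `len(ts) - 1`. A smaller `tol` yields more pieces, and a smaller `alpha` yields more distinct symbols.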

Because the resulting representation is symbolic, it naturally leads to:

  • Strong noise smoothing

  • Drastic dimensionality reduction

  • Ultra-fast distance computations (via lookup tables)

  • Seamless integration with classic data mining algorithms (motif discovery, anomaly detection, classification, clustering, indexing, etc.)
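As an illustration of the lookup-table idea in the list above, the sketch below precomputes the pairwise distances between symbol centers once and then compares two strings by table lookups only. It is a hypothetical example: the `centers` values, the function names, and the choice to sum per-position center distances are assumptions made for illustration, not a distance measure defined by the library.

```python
import math


def build_lookup(centers):
    """Precompute Euclidean distances between all pairs of symbol centers."""
    return {
        (a, b): math.dist(centers[a], centers[b])
        for a in centers
        for b in centers
    }


def symbolic_distance(s1, s2, table):
    """Sum of per-position center distances for two equal-length strings."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return sum(table[(a, b)] for a, b in zip(s1, s2))


# Hypothetical centers: symbol -> (mean length, mean increment).
centers = {"a": (10.0, 1.0), "b": (10.0, -1.0), "c": (4.0, 0.0)}
table = build_lookup(centers)

d_same = symbolic_distance("abc", "abc", table)
d_diff = symbolic_distance("abca", "abcb", table)
```

Because the table has one entry per symbol pair, its size depends only on the alphabet, not on the length of the series being compared.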

fABBA significantly outperforms the original ABBA [1] in speed (often 10–100× faster) while producing nearly identical or even better symbolic sequences.

Figure: Visualization of the ABBA/fABBA symbolization process (source: Stefan Güttel, Turing–Manchester presentation, 2021).

Key Advantages of fABBA

Core Methods & Variants

  • fABBA.fABBA -> Original fast single-series implementation (Python, with Cython-accelerated routines)

  • fABBA.JABBA -> Next-generation engine supporting:
    • Univariate & multivariate series

    • Custom clustering backends (k-means, hierarchical, GPU, etc.)

    • Memory-optimized streaming aggregation

  • fABBA.image_compress / image_decompress -> Turn any 2D array/image into a short string and back

Applications

fABBA has been applied in a range of domains:

  • Time-series classification & clustering (UCR/UEA archives)

  • Extreme compression of sensor data (IoT, wearables, finance)

  • Motif & discord discovery at massive scale

  • Anomaly detection with symbolic distance measures

  • Lossy but reconstructible storage of medical signals (ECG, EEG)

  • Image and video frame compression via block-wise symbolization

Quick Example

import numpy as np
import matplotlib.pyplot as plt
from fABBA import fABBA

# Load a 1-D time series (replace the file name with your own data).
ts = np.load("example_series.npy")

# tol sets the compression tolerance, alpha the clustering tolerance.
fabba = fABBA(tol=0.1, alpha=0.01, verbose=0)
string = fabba.fit_transform(ts)  # symbolic representation of the series

print(f"Original length : {len(ts)}")
print(f"Compressed to   : {len(string)} symbols  ->  compression ratio {(len(ts)/len(string)):.1f}×")
print(f"Symbolic string : {string}")

# Reconstruction starts from the first value of the original series.
reconstructed = fabba.inverse_transform(string, ts[0])

plt.plot(ts, label="Original")
plt.plot(reconstructed, "--", label="Reconstructed")
plt.legend()
plt.show()
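To make the reconstruction step less of a black box, here is a minimal sketch of how a symbolic string could be turned back into a numerical series. It assumes each symbol maps to a representative (length, increment) pair and that the first value of the original series is known. The function name `inverse_sketch` and the example centers are hypothetical, and lengths are simply rounded, whereas a real implementation may adjust them so the total length matches the original.

```python
def inverse_sketch(string, centers, start):
    """Stitch a polygonal chain from symbols, starting at the value `start`."""
    series = [start]
    for sym in string:
        length, inc = centers[sym]
        length = max(1, round(length))  # a piece spans at least one step
        base = series[-1]
        # Linear interpolation from the current value to base + inc.
        for k in range(1, length + 1):
            series.append(base + inc * k / length)
    return series


# Hypothetical centers: symbol -> (length, increment).
centers = {"a": (2.0, 1.0), "b": (3.0, -3.0)}
rebuilt = inverse_sketch("ab", centers, start=0.0)
```

The rebuilt series has one value for the starting point plus one value per time step covered by the pieces, which is why the reconstruction needs the starting value in addition to the string.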

References

Getting Started

pip install fABBA          # includes pre-compiled wheels for Linux/macOS/Windows

Full documentation: https://fabba.readthedocs.io

We welcome contributions! Whether it’s new clustering backends, performance improvements, or better documentation — feel free to open issues or pull requests.

Enjoy ultra-fast symbolic time-series analysis with fABBA!

