Multi-channel symbolization
JABBA is a fast, parallel, multivariate-aware implementation of the fABBA (fast ABBA) symbolic aggregation method for time series, where ABBA stands for Adaptive Brownian Bridge-based symbolic Aggregation.
It extends the original ABBA with:
Native support for multivariate and high-dimensional arrays (images, video frames, sensor arrays)
Automatic shape preservation and restoration
Parallel compression & digitization via multiprocessing
Three digitization backends: adaptive aggregation (original ABBA), K-means, and GPU-accelerated K-means
Out-of-sample transformation
Auto-digitization (no need to tune alpha)
Perfect for: motif discovery, compression, clustering, classification, anomaly detection on real-world multivariate data.
Core Idea
Compress each time series into piecewise linear segments (length, increment)
Digitize all pieces across all series/channels into shared symbols
Reconstruct using starting points + symbolic sequence
Symbols are consistent across all variables -> enables cross-channel pattern mining.
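The compression and reconstruction steps above can be pictured with a small pure-Python sketch. It is not the library's code: the function names are made up for illustration, the error budget of (length - 1) * tol**2 is an assumption, and standardization and any cap on piece length are left out.

```python
def compress_sketch(ts, tol=0.1):
    """Greedily split ts into linear pieces, returned as (length, increment) pairs."""
    pieces = []
    start, n = 0, len(ts)
    while start < n - 1:
        end = start + 1  # a piece spanning a single step is always exact
        while end + 1 < n:
            cand = end + 1
            length = cand - start
            slope = (ts[cand] - ts[start]) / length
            # Squared error between the straight line and the points it would replace.
            err = sum((ts[start] + slope * j - ts[start + j]) ** 2
                      for j in range(length + 1))
            if err > (length - 1) * tol ** 2:  # assumed error budget
                break
            end = cand
        pieces.append((end - start, ts[end] - ts[start]))
        start = end
    return pieces


def reconstruct_sketch(pieces, start_value):
    """Rebuild a series by stitching the linear pieces onto the starting point."""
    out = [start_value]
    for length, inc in pieces:
        base = out[-1]
        out.extend(base + inc * j / length for j in range(1, length + 1))
    return out


series = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
pieces = compress_sketch(series, tol=0.1)
recon = reconstruct_sketch(pieces, series[0])
print(pieces)  # [(3, 3.0), (3, -3.0)]
print(recon)
```

Seven points collapse into two pieces here. In JABBA the pieces from every series are then clustered together, so a rising piece in one channel and a similar rising piece in another end up with the same symbol.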
Quick Start – One-Liner
from fABBA import JABBA
import numpy as np
# 50 multivariate time series, 6 channels, 500 timesteps each
X = np.random.randn(50, 6, 500)
jabba = JABBA(tol=0.05, verbose=1)
symbols = jabba.fit_transform(X) # List[List[str]] — one sequence per series
X_reconstructed = jabba.recast_shape(jabba.inverse_transform(symbols)) # restore shape (50, 6, 500) before comparing with X
print(f"Reconstruction error: {np.linalg.norm(X - X_reconstructed):.4f}")
Full Usage Examples
1. Multiple Univariate or Multivariate Time Series (Most Common)
from fABBA import JABBA
import numpy as np
import matplotlib.pyplot as plt
# Simulate 20 independent univariate series
np.random.seed(0)
data = np.cumsum(np.random.randn(20, 800), axis=1) # random walks
jabba = JABBA(tol=0.1, init='agg', verbose=1) # auto-digitization
symbols = jabba.fit_transform(data)
recon = jabba.inverse_transform(symbols)
# Plot first 3 series
plt.figure(figsize=(12, 6))
for i in range(3):
    plt.subplot(3, 1, i+1)
    plt.plot(data[i], label='Original', alpha=0.8)
    plt.plot(recon[i], '--', label='JABBA reconstruction')
    plt.title(f'Series {i} -> compressed to {len(symbols[i])} symbols')
    plt.legend()
plt.tight_layout()
plt.show()
3. High-Dimensional Arrays (Video, Spectrograms, Images over Time)
# 8 video clips: 30 frames × 112 × 112 × 3
video = np.random.rand(8, 30, 112, 112, 3)
jabba = JABBA(tol=0.1, verbose=1)
symbols = jabba.fit_transform(video) # each clip is flattened to one series of length 30*112*112*3
flat_recon = jabba.inverse_transform(symbols) # (8, 30*112*112*3)
video_recon = jabba.recast_shape(flat_recon) # -> (8, 30, 112, 112, 3)
print("Original shape:", video.shape)
print("Restored shape :", video_recon.shape)
print("Max abs error :", np.max(np.abs(video - video_recon)))
Important: recast_shape only works if the original input was a NumPy array (not a list or tensor).
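The flatten-then-restore round trip can be pictured with plain NumPy reshapes. This is only a sketch of the idea, assuming each sample is flattened into one long series. It does not call JABBA, and original_shape is an illustrative variable, not a library attribute.

```python
import numpy as np

# Toy stand-in for 2 clips of 3 frames, each 4 x 4 pixels with 1 channel.
clips = np.arange(2 * 3 * 4 * 4 * 1, dtype=float).reshape(2, 3, 4, 4, 1)

original_shape = clips.shape                 # remembered so it can be restored later
flat = clips.reshape(original_shape[0], -1)  # one long series per clip
restored = flat.reshape(original_shape)      # undo the flattening

print(flat.shape)      # (2, 48)
print(restored.shape)  # (2, 3, 4, 4, 1)
```

A nested Python list carries no single shape attribute to remember, which is one plausible reason the restore step is limited to NumPy inputs.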
4. Out-of-Sample (Test Set) Symbolization
X_train = np.random.randn(100, 5, 200)
X_test = np.random.randn(30, 5, 200)
jabba = JABBA(tol=0.05).fit(X_train) # learn vocabulary
symbols_test, starts = jabba.transform(X_test) # use same symbols!
X_test_recon = jabba.inverse_transform(symbols_test, starts)
print(f"Test set reconstructed with {len(jabba.parameters.alphabets)} shared symbols")
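One way to picture out-of-sample symbolization is nearest-center assignment: pieces from new data are labeled with the symbol of the closest center learned during fit. The sketch below assumes plain squared Euclidean distance on (length, increment) pairs. The function name and the toy centers are hypothetical, and the library may weight the two coordinates differently.

```python
def assign_symbols(pieces, centers, alphabet):
    """Label each (length, increment) piece with the symbol of its nearest center."""
    labels = []
    for length, inc in pieces:
        dists = [(length - c_len) ** 2 + (inc - c_inc) ** 2
                 for c_len, c_inc in centers]
        labels.append(alphabet[dists.index(min(dists))])
    return labels


# Hypothetical vocabulary learned on training data: three cluster centers.
centers = [(3.0, 2.0), (3.0, -2.0), (10.0, 0.0)]
alphabet = ["a", "b", "c"]

# Pieces from unseen data map onto existing symbols; no new symbols are created.
test_pieces = [(4, 1.5), (2, -2.5), (9, 0.3)]
test_labels = assign_symbols(test_pieces, centers, alphabet)
print(test_labels)  # ['a', 'b', 'c']
```

Because the vocabulary is frozen after fit, symbol sequences from the training and test sets are directly comparable.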
5. Fixed vs Adaptive Vocabulary
data = np.random.randn(50, 4, 1000)
# Adaptive (recommended): let JABBA decide how many symbols
adaptive = JABBA(tol=0.03, init='agg', verbose=0)
adaptive.fit_transform(data)
print("Adaptive -> symbols:", len(adaptive.parameters.alphabets))
# Fixed vocabulary (faster, reproducible)
fixed = JABBA(tol=0.03, init='kmeans', k=80, verbose=0)
fixed.fit_transform(data)
print("Fixed k=80 -> symbols:", len(fixed.parameters.alphabets))
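The difference between the two modes can be sketched with a greedy aggregation over 2-D points: the number of groups falls out of a tolerance radius rather than being fixed in advance. This is a simplified illustration of the aggregation idea, not the library's implementation. The function name, the radius parameter alpha, and the toy points are assumptions made for the example.

```python
def greedy_aggregate(points, alpha):
    """Group 2-D points greedily.

    Points are visited in order of increasing norm. Each still-unassigned point
    starts a new group and absorbs every unassigned point within distance alpha.
    """
    order = sorted(range(len(points)),
                   key=lambda i: points[i][0] ** 2 + points[i][1] ** 2)
    labels = [-1] * len(points)
    n_groups = 0
    for i in order:
        if labels[i] != -1:
            continue  # already absorbed by an earlier group
        for j in order:
            if labels[j] == -1:
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                if dx * dx + dy * dy <= alpha ** 2:
                    labels[j] = n_groups
        n_groups += 1
    return labels, n_groups


# Toy (length, increment) pieces, assumed to be already scaled.
points = [(1.0, 0.1), (1.1, 0.0), (3.0, 2.0), (3.1, 2.1), (6.0, -1.0)]
for alpha in (0.05, 0.5):
    labels, n_groups = greedy_aggregate(points, alpha)
    print(alpha, n_groups, labels)
# 0.05 5 [0, 1, 2, 3, 4]
# 0.5 3 [0, 0, 1, 1, 2]
```

A tight radius gives every point its own symbol, while a looser one merges neighbors. With k-means the vocabulary size is whatever k you pass, regardless of how the pieces are spread.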
6. GPU-Accelerated Digitization (Large Datasets)
huge_data = np.random.randn(1000, 20, 2000) # 40 million points
jabba = JABBA(tol=0.05, init='gpu-kmeans', k=200, verbose=1)
symbols = jabba.fit_transform(huge_data, n_jobs=16) # 16 parallel workers; k-means runs on the GPU
Parameter Guide
When to Use Which Mode?
| Goal | Recommended Settings |
|---|---|
| Best reconstruction quality | |
| Fast & reproducible | |
| >1M time points | |
| Motif discovery across channels | |
Tips from the Source Code
JABBA automatically standardizes data unless you set adjust=False in general_compress
For peak-shift robustness -> increase scl (e.g. scl=3–5)
For very noisy data -> slightly increase tol
Use jabba.parameters.centers and .alphabets to inspect learned prototypes
symbols is a list of lists of strings: perfect input for pysax, matrix profile, or LSTM+embedding
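For the LSTM+embedding route mentioned in the last tip, the string symbols first have to become integer ids. The sketch below shows one common way to do that in plain Python. The helper names and the choice of 0 as the padding id are illustrative assumptions, and the symbols list is a hand-written stand-in for the output of fit_transform.

```python
def build_vocab(sequences):
    """Map each distinct symbol to an integer id, reserving 0 for padding."""
    vocab = {}
    for seq in sequences:
        for sym in seq:
            vocab.setdefault(sym, len(vocab) + 1)
    return vocab


def encode(sequences, vocab, max_len):
    """Convert symbol sequences to fixed-length id lists (truncate or right-pad with 0)."""
    return [[vocab[s] for s in seq[:max_len]] + [0] * max(0, max_len - len(seq))
            for seq in sequences]


# Stand-in for the output of fit_transform: one list of symbols per series.
symbols = [["a", "b", "a", "c"], ["b", "b"], ["c", "a", "d"]]
vocab = build_vocab(symbols)
ids = encode(symbols, vocab, max_len=4)
print(vocab)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
print(ids)    # [[1, 2, 1, 3], [2, 2, 0, 0], [3, 1, 4, 0]]
```

The resulting id lists have equal length, so they can be stacked into a single integer array and passed to an embedding layer in the framework of your choice.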
You’re now ready to symbolize anything from ECG to satellite imagery.
Happy compressing!