Multivariate time series symbolization
Here we demonstrate how to use fABBA to symbolize a multivariate time series (the same applies to multiple univariate time series) with consistent symbols. After downloading the UEA time series dataset into the corresponding folder, you can run JABBA as in the example below:
import os
from scipy.io import arff
from fABBA import JABBA
import matplotlib.pyplot as plt
import numpy as np
_dir = 'data/UEA2018' # your data file location
def preprocess(data):
    time_series = list()
    for ii in data[0]:
        database = list()
        for i in ii[0]:
            database.append(list(i))
        time_series.append(database)
    return np.nan_to_num(np.array(time_series))
filename = 'BasicMotions'
num = 10 # index of the sample to symbolize
data = arff.loadarff(os.path.join(_dir, filename, filename + '_TRAIN.arff'))
multivariate_ts = preprocess(data)
# z-normalize each dimension of the selected sample
mts = ((multivariate_ts[num].T - multivariate_ts[num].T.mean(axis=0)) / multivariate_ts[num].T.std(axis=0)).T
# J-fABBA: symbolize and reconstruct the series
jabba1 = JABBA(tol=0.0002, verbose=1)
symbols_series = jabba1.fit_transform(mts)
reconstruction = jabba1.inverse_transform(symbols_series)
# J-ABBA: k-means digitization with as many symbols as jabba1 produced
jabba2 = JABBA(tol=0.0002, init='k-means', k=jabba1.parameters.centers.shape[0], verbose=0)
symbols_series = jabba2.fit_transform(mts)
reconstruction_ABBA = jabba2.inverse_transform(symbols_series)
fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(18, 5))
for i in range(2):
    for j in range(3):
        ax[i,j].plot(mts[i*3 + j], c='yellowgreen', linewidth=5, label='time series')
        ax[i,j].plot(reconstruction_ABBA[i*3 + j], c='blue', linewidth=5, alpha=0.3, label='reconstruction - J-ABBA')
        ax[i,j].plot(reconstruction[i*3 + j], c='purple', linewidth=5, alpha=0.3, label='reconstruction - J-fABBA')
        ax[i,j].set_title('dimension '+str(i*3 + j))
        ax[i,j].set_xticks([])
        ax[i,j].set_yticks([])
plt.legend(loc='lower right', bbox_to_anchor=[-0.5, -0.5], framealpha=0.45)
plt.show()
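The per-dimension standardization applied to the selected sample before symbolization can also be written as a small helper. The following is only a minimal sketch: `znormalize` is a hypothetical name and not part of fABBA, and the guard for constant dimensions is an added assumption that the example above does not include.

```python
import numpy as np

def znormalize(x):
    """Standardize each row (one dimension of the series) to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    # Assumption: a constant dimension is mapped to zeros instead of dividing by zero.
    std[std == 0] = 1.0
    return (x - mean) / std

# Toy input with two dimensions; the second one is constant.
demo = np.array([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]])
normalized = znormalize(demo)
```

For input without constant dimensions this gives the same result as the transpose-based expression in the example, so `znormalize(multivariate_ts[num])` could stand in for it.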
You can also load a dataset via loadData:
from fABBA import loadData
train, test = loadData(name='Beef')
# Then perform JABBA
jabba = JABBA(tol=0.0002, verbose=1)
symbols_series = jabba.fit_transform(train[0])
reconstruction = jabba.inverse_transform(symbols_series)
Note
The function loadData() is a lightweight API for loading time series datasets. It supports only a subset of the UEA and UCR Archives; please refer to the documentation for full usage details. JABBA processes multiple univariate time series as well as multivariate time series, so its input must be 2-dimensional. For example, when loading a UCR dataset such as Beef, use symbols = jabba.fit_transform(train); when loading a UEA dataset such as BasicMotions, use symbols = jabba.fit_transform(train[0]). For details on the datasets, we refer to the UCR/UEA time series dataset.
loadData() currently supports the following datasets: (1) UEA Archive: ‘AtrialFibrillation’, ‘BasicMotions’, ‘CharacterTrajectories’, ‘LSST’, ‘Epilepsy’, ‘NATOPS’, ‘UWaveGestureLibrary’, ‘JapaneseVowels’; (2) UCR Archive: ‘Beef’.
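Because JABBA expects 2-dimensional input, it can help to check the shape of the loaded data before calling fit_transform. The sketch below is illustrative only: `iter_2d_inputs` is a hypothetical helper, not part of fABBA. It assumes that a UCR-style dataset is a 2-D array of shape (series, length) and a UEA-style dataset is a 3-D array of shape (samples, dimensions, length), as the usage in the note suggests.

```python
import numpy as np

def iter_2d_inputs(data):
    """Yield 2-D arrays suitable for a 2-D-only transformer from 1-D, 2-D or 3-D input."""
    arr = np.asarray(data)
    if arr.ndim == 1:
        # A single univariate series becomes one row.
        yield arr.reshape(1, -1)
    elif arr.ndim == 2:
        # Multiple univariate series (or one multivariate sample): use as is.
        yield arr
    elif arr.ndim == 3:
        # Assumed layout (samples, dimensions, length): yield one sample at a time.
        for sample in arr:
            yield sample
    else:
        raise ValueError("expected 1-D, 2-D or 3-D input, got %d-D" % arr.ndim)

# Toy arrays standing in for loaded datasets.
ucr_like = list(iter_2d_inputs(np.zeros((4, 10))))
uea_like = list(iter_2d_inputs(np.zeros((5, 6, 100))))
single = list(iter_2d_inputs(np.zeros(7)))
```

Each yielded array could then be passed to jabba.fit_transform in turn, which for 3-D input matches the train[0] usage shown above.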