Following the approach previously proposed by Li, Lin, and Oates for time series pattern discovery based on symbolic discretization and grammatical inference, I have implemented their algorithm in Java and added a few new features. Among them are a highly efficient anomaly discovery technique based on the complexity of the input string and an improved, approximate variant of the HOT-SAX algorithm.
The background heatmap under the input time series in this screenshot shows the density (the number) of grammar rules that encode the string obtained by discretizing each subsequence with SAX. Clearly, the anomalous heartbeat is not covered by any rule, as it occurs only once in this ECG stretch. This anomaly discovery technique can be intuitively connected with Kolmogorov complexity: the deep blue color outlines subsequences that are seen multiple times and are therefore highly compressible by very few grammar rules (i.e., low complexity), whereas the white color highlights subsequences that cannot be compressed or whose compression rate is low, reflecting their rareness and, likely, their high complexity.
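The rule density idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the GrammarViz code: the class name, the method names, and the representation of rule occurrences as `{start, end}` index pairs (end exclusive) are all hypothetical. The sketch counts how many rule occurrences cover each point and then reports the longest stretch with the lowest coverage as an anomaly candidate.

```java
import java.util.Arrays;

class RuleDensitySketch {

    /**
     * Counts, for every point of the series, how many rule occurrences cover it.
     * Each occurrence is a hypothetical {start, end} pair with end exclusive.
     */
    static int[] density(int seriesLength, int[][] occurrences) {
        // Difference array: +1 where an occurrence starts, -1 where it ends.
        int[] diff = new int[seriesLength + 1];
        for (int[] occ : occurrences) {
            int start = Math.max(0, occ[0]);
            int end = Math.min(seriesLength, occ[1]);
            if (start < end) {
                diff[start]++;
                diff[end]--;
            }
        }
        // The prefix sum of the difference array is the per-point coverage.
        int[] density = new int[seriesLength];
        int running = 0;
        for (int i = 0; i < seriesLength; i++) {
            running += diff[i];
            density[i] = running;
        }
        return density;
    }

    /** Returns {start, end} (end exclusive) of the longest run of minimal density. */
    static int[] sparsestRun(int[] density) {
        int min = Arrays.stream(density).min().getAsInt();
        int bestStart = 0, bestLen = 0, runStart = -1;
        for (int i = 0; i <= density.length; i++) {
            boolean atMin = i < density.length && density[i] == min;
            if (atMin && runStart < 0) {
                runStart = i;
            }
            if (!atMin && runStart >= 0) {
                if (i - runStart > bestLen) {
                    bestStart = runStart;
                    bestLen = i - runStart;
                }
                runStart = -1;
            }
        }
        return new int[] {bestStart, bestStart + bestLen};
    }

    public static void main(String[] args) {
        // Made-up rule occurrences over a series of 10 points.
        int[][] occurrences = {{0, 4}, {2, 6}, {8, 10}};
        int[] d = density(10, occurrences);
        System.out.println(Arrays.toString(d));
        System.out.println(Arrays.toString(sparsestRun(d)));
    }
}
```

In a heatmap like the one described above, high values of `density` would map to deep blue and zero or near-zero values to white.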
In contrast, this screenshot shows a recurrent grammar rule discovered by the tool. This rule encodes similar fragments of normal heartbeats and, at the same time, highlights the anomalous one. The GrammarViz 2.0 interactive rules browser aids in the discovery of recurrent and anomalous patterns by enabling interactive exploration of time series patterns.
Jmotif implements the SAX and SAX-VSM algorithms for interpretable time series classification. This approach aids knowledge discovery by enabling comparative studies of time series generated by different processes, or by the same process under different conditions.
Here is an example of applying SAX-VSM to the well-studied MNIST dataset (10 classes of time series representing handwritten digits), illustrating the algorithm's rotational invariance, robustness, and capacity for discovering and ranking characteristic features. I applied SAX-VSM to a small subset of the most divergent digits from the MNIST training set with the following SAX parameters: sliding window 190, PAA 15, and alphabet 5:
The background heatmap under each digit shows the locations of the patterns (190-point sliding windows), with their weights indicated by color. While it highlights the most relevant sliding window positions, this visualization does not account for the patterns' internal structure.
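To make the three SAX parameters concrete (sliding window 190, PAA 15, alphabet 5), here is a minimal sketch of sliding-window SAX discretization. It is not the Jmotif API: the class and method names are hypothetical, the z-normalization uses the population standard deviation, and the PAA step uses integer segment boundaries, which only approximates the fractional split needed when the window length (190) is not divisible by the PAA size (15). The cut points are the commonly tabulated Gaussian breakpoints for a five-letter alphabet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SaxSketch {

    // Approximate equiprobable N(0,1) breakpoints for an alphabet of size 5.
    private static final double[] CUTS_5 = {-0.84, -0.25, 0.25, 0.84};

    /** Z-normalizes a window; a (nearly) flat window maps to all zeros. */
    static double[] zNormalize(double[] x) {
        double mean = 0;
        for (double v : x) mean += v;
        mean /= x.length;
        double var = 0;
        for (double v : x) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / x.length);
        double[] out = new double[x.length];
        if (sd < 1e-8) return out;
        for (int i = 0; i < x.length; i++) out[i] = (x[i] - mean) / sd;
        return out;
    }

    /** Piecewise Aggregate Approximation with integer segment boundaries. */
    static double[] paa(double[] x, int segments) {
        if (segments <= 0 || segments > x.length) {
            throw new IllegalArgumentException("need 0 < segments <= window length");
        }
        double[] out = new double[segments];
        for (int s = 0; s < segments; s++) {
            int from = (int) ((long) s * x.length / segments);
            int to = (int) ((long) (s + 1) * x.length / segments);
            double sum = 0;
            for (int i = from; i < to; i++) sum += x[i];
            out[s] = sum / (to - from);
        }
        return out;
    }

    /** Converts one window into a SAX word over the alphabet a..e. */
    static String toWord(double[] window, int paaSize) {
        double[] reduced = paa(zNormalize(window), paaSize);
        StringBuilder word = new StringBuilder(paaSize);
        for (double v : reduced) {
            int symbol = 0;
            while (symbol < CUTS_5.length && v > CUTS_5[symbol]) symbol++;
            word.append((char) ('a' + symbol));
        }
        return word.toString();
    }

    /** Slides a window over the series and discretizes every position. */
    static List<String> slidingWords(double[] series, int window, int paaSize) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i + window <= series.length; i++) {
            words.add(toWord(Arrays.copyOfRange(series, i, i + window), paaSize));
        }
        return words;
    }

    public static void main(String[] args) {
        // Synthetic sine wave standing in for a digit-derived time series.
        double[] series = new double[400];
        for (int i = 0; i < series.length; i++) series[i] = Math.sin(i / 20.0);
        List<String> words = slidingWords(series, 190, 15);
        System.out.println(words.size() + " words, first: " + words.get(0));
    }
}
```

Each 190-point window becomes a 15-letter word, and the position of the window is what a heatmap like the one above would color.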
The digits in this figure are colored like a heatmap. This visualization highlights the particular features of each digit that SAX-VSM found to be most relevant to its class.
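The notion of a feature being "most relevant to its class" comes from the weighting step of SAX-VSM, which combines per-class bags of SAX words with tf-idf weights and cosine similarity. The sketch below is a rough illustration only, assuming one common log-scaled tf-idf variant and positive word counts. The class name, method names, and toy word counts are hypothetical and are not taken from Jmotif.

```java
import java.util.HashMap;
import java.util.Map;

class SaxVsmSketch {

    /** Turns per-class word counts into per-class tf-idf weight vectors. */
    static Map<String, Map<String, Double>> tfIdf(Map<String, Map<String, Integer>> bags) {
        // Document frequency: in how many class bags each word appears.
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> bag : bags.values()) {
            for (String word : bag.keySet()) df.merge(word, 1, Integer::sum);
        }
        int n = bags.size();
        Map<String, Map<String, Double>> weights = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : bags.entrySet()) {
            Map<String, Double> w = new HashMap<>();
            for (Map.Entry<String, Integer> t : e.getValue().entrySet()) {
                double tf = Math.log(1.0 + t.getValue());
                // A word present in every class gets idf = log(1) = 0.
                double idf = Math.log((double) n / df.get(t.getKey()));
                w.put(t.getKey(), tf * idf);
            }
            weights.put(e.getKey(), w);
        }
        return weights;
    }

    /** Cosine similarity between raw query counts and a class weight vector. */
    static double cosine(Map<String, Integer> query, Map<String, Double> classWeights) {
        double dot = 0, qNorm = 0, cNorm = 0;
        for (Map.Entry<String, Integer> q : query.entrySet()) {
            dot += q.getValue() * classWeights.getOrDefault(q.getKey(), 0.0);
            qNorm += (double) q.getValue() * q.getValue();
        }
        for (double w : classWeights.values()) cNorm += w * w;
        if (qNorm == 0 || cNorm == 0) return 0;
        return dot / Math.sqrt(qNorm * cNorm);
    }

    /** Returns the label of the class whose weight vector is most similar. */
    static String classify(Map<String, Integer> query,
                           Map<String, Map<String, Double>> weights) {
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> e : weights.entrySet()) {
            double sim = cosine(query, e.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy bags of SAX words for two hypothetical classes.
        Map<String, Map<String, Integer>> bags = new HashMap<>();
        bags.put("A", Map.of("abc", 3, "bbc", 1));
        bags.put("B", Map.of("abc", 2, "cca", 4));
        Map<String, Map<String, Double>> weights = tfIdf(bags);
        System.out.println(weights);
        System.out.println(classify(Map.of("cca", 2, "abc", 1), weights));
    }
}
```

Words shared by every class receive zero weight, while words concentrated in a single class receive high weight; it is this per-word weight that a class-relevance coloring would draw on.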