Following the approach to time series pattern discovery previously proposed by Li, Lin, and Oates, which is based on symbolic discretization with SAX and grammatical inference with Sequitur, I have implemented their algorithm in Java and added a few new features. Among them are a highly efficient anomaly discovery technique based on the ``grammar rule density curve'', which reflects the regularities discovered in the input data and allows visual data exploration, and an improved (and exact) variant of the HOT-SAX algorithm.
We have released the GrammarViz 2.0 GUI under the GPL; the code and documentation are available on GitHub.
The background heatmap under this time series reflects the values of the "grammar rule density curve": the density (the number) of grammar rules covering each point, where the grammar encodes the string obtained by discretizing the time series with SAX. The anomalous heartbeat is easy to identify because it is not covered by any grammar rule. Since the Sequitur grammatical inference (GI) algorithm effectively compresses the input string, this means that the anomalous fragment could not be encoded more compactly, i.e. compressed. This highly efficient anomaly discovery technique is intuitively connected to the general notion of Kolmogorov complexity.
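The idea behind the density curve can be illustrated with a minimal sketch. It assumes that every occurrence of every grammar rule has already been mapped back to a half-open interval `[start, end)` of time series points; the class and method names below are hypothetical and are not the GrammarViz API. The sketch counts how many rule occurrences cover each point and then reports the runs of lowest coverage as anomaly candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: names and data layout are assumptions, not GrammarViz code. */
final class RuleDensitySketch {

    /**
     * Counts, for every point of the series, how many rule occurrences cover it.
     * Each occurrence is a pair {start, end} describing the half-open interval [start, end).
     */
    static int[] densityCurve(int seriesLength, int[][] ruleOccurrences) {
        int[] diff = new int[seriesLength + 1]; // difference array: +1 at start, -1 at end
        for (int[] occ : ruleOccurrences) {
            int start = Math.max(0, occ[0]);
            int end = Math.min(seriesLength, occ[1]);
            if (start >= end) {
                continue; // empty or out-of-range interval
            }
            diff[start]++;
            diff[end]--;
        }
        int[] density = new int[seriesLength];
        int running = 0;
        for (int i = 0; i < seriesLength; i++) {
            running += diff[i]; // prefix sum turns the differences into coverage counts
            density[i] = running;
        }
        return density;
    }

    /** Returns maximal runs {start, end} (half-open) where the density equals its minimum. */
    static List<int[]> lowestDensityRuns(int[] density) {
        int min = Integer.MAX_VALUE;
        for (int d : density) {
            min = Math.min(min, d);
        }
        List<int[]> runs = new ArrayList<>();
        int runStart = -1;
        for (int i = 0; i <= density.length; i++) {
            boolean atMin = i < density.length && density[i] == min;
            if (atMin && runStart < 0) {
                runStart = i;
            } else if (!atMin && runStart >= 0) {
                runs.add(new int[] {runStart, i});
                runStart = -1;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // Made-up rule occurrences over a 10-point series.
        int[][] occurrences = {{0, 4}, {2, 5}, {7, 10}};
        int[] density = densityCurve(10, occurrences);
        System.out.println(Arrays.toString(density)); // prints [1, 1, 2, 2, 1, 0, 0, 1, 1, 1]
        for (int[] run : lowestDensityRuns(density)) {
            System.out.println("candidate: [" + run[0] + ", " + run[1] + ")"); // prints candidate: [5, 7)
        }
    }
}
```

In this toy input, points 5 and 6 are covered by no rule at all, which is the situation described above for the anomalous heartbeat. The difference-array approach keeps the computation linear in the series length plus the number of rule occurrences.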
This screenshot shows a recurrent grammar rule discovered by the tool. This rule encodes similar fragments of normal heartbeats and, at the same time, highlights the anomalous one. The GrammarViz 2.0 interactive rule browser aids in the discovery of recurrent and anomalous patterns by enabling interactive exploration of time series patterns.
Jmotif implements the SAX and SAX-VSM algorithms for interpretable time series classification. This approach aids knowledge discovery by enabling comparative studies of time series generated by different processes, or by the same process under different conditions.
Here is an example of SAX-VSM applied to the well-studied MNIST dataset (10 classes of time series representing handwritten digits), illustrating the algorithm's rotational invariance, its robustness, and its capacity to discover and rank characteristic features. I have applied SAX-VSM to a small subset of the most divergent digits from the MNIST training set, with the SAX parameters set to a sliding window of 190, a PAA size of 15, and an alphabet size of 5:
The background heatmap under each digit shows the locations of the patterns (190-point sliding windows), with color indicating their weights. While it highlights the most relevant sliding window positions, this visualization does not account for the patterns' internal structure.
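For readers unfamiliar with the three SAX parameters used here (sliding window, PAA size, alphabet size), the following is a minimal sketch of sliding-window SAX discretization. It is not the jmotif implementation: the class name, method names, and the flatness threshold are assumptions made for illustration. The breakpoints are the standard Gaussian cut points for an alphabet of size 5.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: names and the threshold parameter are assumptions, not jmotif code. */
final class SaxSketch {

    /** Standard Gaussian breakpoints for an alphabet of size 5 (letters a..e). */
    static final double[] CUTS_ALPHABET_5 = {-0.84, -0.25, 0.25, 0.84};

    /** Z-normalizes x; nearly flat input (std below the threshold) is mapped to zeros. */
    static double[] zNormalize(double[] x, double flatThreshold) {
        double mean = 0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double var = 0;
        for (double v : x) {
            var += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(var / x.length);
        double[] out = new double[x.length];
        if (sd < flatThreshold) {
            return out; // avoid amplifying noise in flat subsequences
        }
        for (int i = 0; i < x.length; i++) {
            out[i] = (x[i] - mean) / sd;
        }
        return out;
    }

    /** Piecewise Aggregate Approximation; also exact when x.length is not divisible by segments. */
    static double[] paa(double[] x, int segments) {
        int n = x.length;
        double[] out = new double[segments];
        // Conceptually repeat every point `segments` times, then average blocks of n values.
        for (int i = 0; i < n * segments; i++) {
            out[i / n] += x[i / segments];
        }
        for (int k = 0; k < segments; k++) {
            out[k] /= n;
        }
        return out;
    }

    /** Maps each PAA value to a letter according to the cut points. */
    static String toWord(double[] paaValues, double[] cuts) {
        StringBuilder sb = new StringBuilder();
        for (double v : paaValues) {
            int idx = 0;
            while (idx < cuts.length && v > cuts[idx]) {
                idx++;
            }
            sb.append((char) ('a' + idx));
        }
        return sb.toString();
    }

    /** Produces one SAX word per sliding window position. */
    static List<String> slidingWindowWords(double[] series, int window, int paaSize,
                                           double[] cuts, double flatThreshold) {
        List<String> words = new ArrayList<>();
        for (int start = 0; start + window <= series.length; start++) {
            double[] sub = Arrays.copyOfRange(series, start, start + window);
            words.add(toWord(paa(zNormalize(sub, flatThreshold), paaSize), cuts));
        }
        return words;
    }

    public static void main(String[] args) {
        double[] series = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        // One window spanning the whole toy series, PAA size 5, alphabet size 5.
        System.out.println(slidingWindowWords(series, 10, 5, CUTS_ALPHABET_5, 0.01)); // prints [abcde]
    }
}
```

With the parameters from the MNIST example, the call would be `slidingWindowWords(series, 190, 15, SaxSketch.CUTS_ALPHABET_5, threshold)`, producing one 15-letter word per window position. Since 190 is not divisible by 15, the PAA step above spreads points that straddle a segment boundary proportionally across both segments.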
The digits in this figure are colored like a heatmap. This visualization highlights the particular features that SAX-VSM found most relevant to each digit's class.
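The per-class relevance of a pattern comes from the vector space model part of SAX-VSM, which weights SAX words with tf*idf and compares bags of words by cosine similarity. The sketch below shows one common form of that weighting. The exact formula (natural log of the term frequency, base-10 log of the inverse document frequency) and all names are assumptions made for illustration and may differ from what jmotif does.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: the weighting formula and names are assumptions, not jmotif code. */
final class SaxVsmSketch {

    /**
     * Input: class label -> (SAX word -> occurrence count, assumed positive).
     * Output: class label -> (SAX word -> tf*idf weight).
     */
    static Map<String, Map<String, Double>> tfIdf(Map<String, Map<String, Integer>> classBags) {
        int classCount = classBags.size();
        // Document frequency: in how many class bags does each word appear?
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> bag : classBags.values()) {
            for (String word : bag.keySet()) {
                df.merge(word, 1, Integer::sum);
            }
        }
        Map<String, Map<String, Double>> weights = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> cls : classBags.entrySet()) {
            Map<String, Double> w = new HashMap<>();
            for (Map.Entry<String, Integer> term : cls.getValue().entrySet()) {
                double tf = Math.log(1.0 + term.getValue());
                double idf = Math.log10((double) classCount / df.get(term.getKey()));
                w.put(term.getKey(), tf * idf); // words present in every class get weight 0
            }
            weights.put(cls.getKey(), w);
        }
        return weights;
    }

    /** Cosine similarity between two sparse vectors; 0 if either vector is all zeros. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the label of the class whose weight vector is most similar to the query bag. */
    static String classify(Map<String, Double> queryBag, Map<String, Map<String, Double>> weights) {
        String best = null;
        double bestSimilarity = -1;
        for (Map.Entry<String, Map<String, Double>> cls : weights.entrySet()) {
            double similarity = cosine(queryBag, cls.getValue());
            if (similarity > bestSimilarity) {
                bestSimilarity = similarity;
                best = cls.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up word counts for two classes.
        Map<String, Map<String, Double>> weights = tfIdf(Map.of(
                "A", Map.of("abc", 3, "bbb", 2),
                "B", Map.of("cba", 4, "bbb", 1)));
        System.out.println(classify(Map.of("abc", 1.0, "bbb", 5.0), weights)); // prints A
    }
}
```

In the toy data, the word `bbb` occurs in both classes and therefore receives zero weight, while `abc` is characteristic of class `A`. Words with high weights in a class are the ones a heatmap like the one above would highlight.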