Following the approach to time series pattern discovery previously proposed by Li, Lin, and Oates, which is based on symbolic discretization and grammatical inference, I have implemented their algorithm in Java and added a few new features. Among them is a highly efficient approximate anomaly discovery technique based on the grammar rule density curve. The rule density curve highlights irregularities in the input data, enabling visual data exploration and interactive optimization of the discretization parameters. As we show in our recent work, this approach also enables exact anomaly detection in a very efficient fashion (faster than the current state of the art).
We have released the GrammarViz 2.0 GUI under the GPL; the code and documentation are available at our GitHub repository.
The background heatmap under this time series reflects the values of the "grammar rule density curve", which shows how many rules of the grammar cover each point of the series; the grammar encodes the string obtained by discretizing this time series with SAX. Note how the anomalous heartbeat is clearly identified by a light blue shade.
This screenshot shows a recurrent grammar rule discovered by our tool. The rule encodes similar fragments of normal heartbeats and, at the same time, highlights the anomalous one. The interactive rules browser of GrammarViz 2.0 aids the discovery of recurrent and anomalous patterns by enabling their interactive exploration.
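As a rough illustration of the rule density idea described above, the following sketch counts, for every point of a time series, how many grammar rule occurrences cover it; points with the lowest count are candidates for anomalies. This is a minimal sketch and not the GrammarViz implementation: the class `RuleDensitySketch`, its methods, and the representation of rule occurrences as `{start, end}` index pairs are assumptions made for this example, and the intervals in `main` are invented.

```java
import java.util.Arrays;

public class RuleDensitySketch {

    /**
     * Counts how many rule occurrences cover each point of the series.
     * Each occurrence is a hypothetical {start, end} pair, with start
     * inclusive and end exclusive.
     */
    public static int[] ruleDensity(int seriesLength, int[][] occurrences) {
        // Difference array: +1 where an occurrence starts, -1 where it ends.
        int[] diff = new int[seriesLength + 1];
        for (int[] occ : occurrences) {
            int start = Math.max(0, occ[0]);
            int end = Math.min(seriesLength, occ[1]);
            if (start >= end) {
                continue; // empty or out-of-range occurrence
            }
            diff[start]++;
            diff[end]--;
        }
        // A running sum turns the difference array into per-point coverage.
        int[] density = new int[seriesLength];
        int running = 0;
        for (int i = 0; i < seriesLength; i++) {
            running += diff[i];
            density[i] = running;
        }
        return density;
    }

    /** Returns the index of the first point with minimal density, or -1 if empty. */
    public static int firstMinimum(int[] density) {
        int best = -1;
        for (int i = 0; i < density.length; i++) {
            if (best < 0 || density[i] < density[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented rule occurrences over a series of 10 points.
        int[][] occurrences = {{0, 4}, {2, 6}, {8, 10}};
        int[] density = ruleDensity(10, occurrences);
        System.out.println(Arrays.toString(density)); // [1, 1, 2, 2, 1, 1, 0, 0, 1, 1]
        System.out.println(firstMinimum(density));    // 6
    }
}
```

In this toy input, points 6 and 7 are not covered by any rule, so they would show up as the lowest-density region of the curve.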
The SAX-VSM library implements the Symbolic Aggregate approXimation (SAX) technique for time series discretization and the core components of the Vector Space Model (VSM). Building on these, it implements the SAX-VSM algorithm for time series classification and clustering. In addition, the library provides an optimization scheme for the SAX discretization parameters, addressing the parameter selection problem common to SAX-based techniques.
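To make the discretization step concrete, here is a minimal sketch of SAX with a sliding window: each window is z-normalized, reduced with Piecewise Aggregate Approximation (PAA), and mapped to letters using breakpoints that split the standard normal distribution into equiprobable regions. It is not the library's API: the class `SaxSketch`, its method names, the fixed alphabet of four letters, the use of the population standard deviation, and the requirement that the window length be divisible by the number of segments are all simplifying assumptions for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SaxSketch {

    // Breakpoints splitting N(0,1) into 4 equiprobable regions (letters a..d).
    private static final double[] CUTS = {-0.6745, 0.0, 0.6745};

    /** Z-normalizes a series; a (nearly) flat series becomes all zeros. */
    public static double[] zNormalize(double[] series) {
        double mean = Arrays.stream(series).average().orElse(0.0);
        double sumSq = 0.0;
        for (double v : series) {
            sumSq += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(sumSq / series.length); // population std. deviation
        double[] out = new double[series.length];
        if (sd < 1e-8) {
            return out; // avoid dividing by (almost) zero
        }
        for (int i = 0; i < series.length; i++) {
            out[i] = (series[i] - mean) / sd;
        }
        return out;
    }

    /** PAA: averages equal-sized consecutive segments of the series. */
    public static double[] paa(double[] series, int segments) {
        if (segments <= 0 || series.length % segments != 0) {
            throw new IllegalArgumentException("length must be divisible by segments");
        }
        int size = series.length / segments;
        double[] out = new double[segments];
        for (int s = 0; s < segments; s++) {
            double sum = 0.0;
            for (int i = s * size; i < (s + 1) * size; i++) {
                sum += series[i];
            }
            out[s] = sum / size;
        }
        return out;
    }

    /** Converts one window into a SAX word over the alphabet a..d. */
    public static String toSaxWord(double[] window, int segments) {
        StringBuilder word = new StringBuilder();
        for (double v : paa(zNormalize(window), segments)) {
            int letter = 0;
            while (letter < CUTS.length && v >= CUTS[letter]) {
                letter++;
            }
            word.append((char) ('a' + letter));
        }
        return word.toString();
    }

    /** Slides a window over the series and discretizes every position. */
    public static List<String> slidingWindowWords(double[] series, int window, int segments) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i + window <= series.length; i++) {
            words.add(toSaxWord(Arrays.copyOfRange(series, i, i + window), segments));
        }
        return words;
    }

    public static void main(String[] args) {
        double[] rising = {1, 1, 2, 2, 3, 3, 4, 4};
        System.out.println(toSaxWord(rising, 4)); // abcd
    }
}
```

The words produced by the sliding window are what a bag-of-words style model such as VSM would then count and weight; that part is not shown here.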
Below is an example of applying SAX-VSM to a subset of the well-studied MNIST dataset (10 classes of time series representing handwritten digits). The example illustrates the algorithm's rotational invariance, its robustness, and its capacity to discover and rank multiple characteristic features. A sliding window of 190 points was used in this example.
The background heatmap under each digit shows the locations of the patterns (sliding window of 190 points) and their weights, encoded by color. While it highlights the most relevant sliding window positions, this visualization does not account for the internal structure of the patterns.