Online Grammar Compression for Frequent Pattern Discovery

Shouhei Fukunaga, Yoshimasa Takabatake, Tomohiro I, Hiroshi Sakamoto
Proceedings of The 13th International Conference on Grammatical Inference, PMLR 57:93-104, 2017.

Abstract

Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG that deterministically derives a single string. An efficient grammar compression builds a smaller CFG by finding and removing duplicated patterns. This process is essentially frequent pattern discovery by grammatical inference. Although any frequent pattern can be found in linear time using a preprocessed string, longer patterns require a huge working space, and the whole string must be loaded into memory beforehand. We propose an online algorithm that approximates this problem within compressed space. The main contribution is an improvement of the previously best known approximation ratio $\Omega(1/\lg^2 m)$ to $\Omega(1/(\lg^* N \lg m))$, where $m$ is the length of an optimal pattern in a string of length $N$ and $\lg^*$ is the iterated logarithm base 2. Because $\lg^* N$ grows extremely slowly (it is at most 5 for every $N \le 2^{65536}$), it is practically constant. The experimental results show that our algorithm extracts nearly optimal patterns and uses significantly less memory than the offline algorithm.
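To make the two central ideas of the abstract concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' online algorithm: it swaps in a simple offline RePair-style scheme (repeatedly replace the most frequent adjacent pair with a new nonterminal), which shows how building a grammar compression surfaces repeated patterns. Every nonterminal's expansion occurs at least twice (possibly overlapping) in the input, so the rules double as a list of frequent patterns. The helper `iterated_log2` computes $\lg^* n$, the number of times $\log_2$ must be applied before the value drops to at most 1, to show why the $\lg^* N$ factor is practically constant. All function names and the `X0`, `X1`, ... nonterminal naming are hypothetical.

```python
import math
from collections import Counter


def iterated_log2(n: float) -> int:
    """lg* n: how many times log2 is applied before the value is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


def most_frequent_pair(seq):
    """Return (pair, count) for the most frequent adjacent pair.

    Simplification (hypothetical sketch): overlapping occurrences such as
    the two 'aa' in 'aaa' are both counted.
    """
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return None, 0
    return pairs.most_common(1)[0]


def replace_pair(seq, pair, symbol):
    """Replace non-overlapping occurrences of pair, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair_like(text):
    """Build a straight-line grammar (start sequence + binary rules)."""
    seq = list(text)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    while True:
        pair, count = most_frequent_pair(seq)
        if count < 2:  # no pair repeats: nothing left to factor out
            break
        symbol = f"X{len(rules)}"  # hypothetical nonterminal naming
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Derive the substring that a symbol stands for."""
    if symbol not in rules:
        return symbol  # terminal character
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    seq, rules = repair_like("abababab")
    print(seq, rules)  # ['X1', 'X1'] {'X0': ('a', 'b'), 'X1': ('X0', 'X0')}
    print(iterated_log2(65536))  # 4
```

On `"abababab"` the sketch first factors out `ab` (`X0`), then `abab` (`X1`), leaving the start sequence `X1 X1`; expanding the rules recovers the input exactly. Note that this offline scheme keeps the whole string in memory, which is exactly the cost the paper's online approach is designed to avoid.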

Cite this Paper


BibTeX
@InProceedings{pmlr-v57-fukunaga16,
  title = {Online Grammar Compression for Frequent Pattern Discovery},
  author = {Shouhei Fukunaga and Yoshimasa Takabatake and Tomohiro I and Hiroshi Sakamoto},
  booktitle = {Proceedings of The 13th International Conference on Grammatical Inference},
  pages = {93--104},
  year = {2017},
  editor = {Sicco Verwer and Menno van Zaanen and Rick Smetsers},
  volume = {57},
  series = {Proceedings of Machine Learning Research},
  address = {Delft, The Netherlands},
  month = {05--07 Oct},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v57/fukunaga16.pdf},
  url = {http://proceedings.mlr.press/v57/fukunaga16.html}
}
Endnote
%0 Conference Paper
%T Online Grammar Compression for Frequent Pattern Discovery
%A Shouhei Fukunaga
%A Yoshimasa Takabatake
%A Tomohiro I
%A Hiroshi Sakamoto
%B Proceedings of The 13th International Conference on Grammatical Inference
%C Proceedings of Machine Learning Research
%D 2017
%E Sicco Verwer
%E Menno van Zaanen
%E Rick Smetsers
%F pmlr-v57-fukunaga16
%I PMLR
%J Proceedings of Machine Learning Research
%P 93--104
%U http://proceedings.mlr.press
%V 57
%W PMLR
RIS
TY  - CPAPER
TI  - Online Grammar Compression for Frequent Pattern Discovery
AU  - Shouhei Fukunaga
AU  - Yoshimasa Takabatake
AU  - Tomohiro I
AU  - Hiroshi Sakamoto
BT  - Proceedings of The 13th International Conference on Grammatical Inference
PY  - 2017/01/16
DA  - 2017/01/16
ED  - Sicco Verwer
ED  - Menno van Zaanen
ED  - Rick Smetsers
ID  - pmlr-v57-fukunaga16
PB  - PMLR
SP  - 93
DP  - PMLR
EP  - 104
L1  - http://proceedings.mlr.press/v57/fukunaga16.pdf
UR  - http://proceedings.mlr.press/v57/fukunaga16.html
ER  - 
APA
Fukunaga, S., Takabatake, Y., I, T., & Sakamoto, H. (2017). Online Grammar Compression for Frequent Pattern Discovery. Proceedings of The 13th International Conference on Grammatical Inference, in PMLR 57:93-104.
