Online Grammar Compression for Frequent Pattern Discovery

Shouhei Fukunaga, Yoshimasa Takabatake, Tomohiro I, Hiroshi Sakamoto
Proceedings of The 13th International Conference on Grammatical Inference, PMLR 57:93-104, 2017.

Abstract

Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG that derives the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated patterns and removing them. This process is precisely frequent pattern discovery by grammatical inference. While any frequent pattern can be obtained in linear time using a preprocessed string, a huge working space is required for longer patterns, and the whole string must be loaded into memory beforehand. We propose an online algorithm that approximates this problem within compressed space. The main contribution is an improvement of the previously best known approximation ratio Ω(1/lg^2 m) to Ω(1/(lg* N · lg m)), where m is the length of an optimal pattern in a string of length N and lg* is the iterated logarithm base 2. For sufficiently large N, lg* N is practically constant. The experimental results show that our algorithm extracts nearly optimal patterns and achieves a significant improvement in memory consumption compared to the offline algorithm.
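The core idea the abstract describes — shrinking a grammar by finding duplicated patterns and replacing them with nonterminals — can be illustrated with a toy offline Re-Pair-style sketch. This is only an assumption-laden illustration of grammar compression in general, not the paper's online compressed-space algorithm; all names here are invented for the example.

```python
# Toy offline Re-Pair-style grammar compression (illustrative only;
# the paper's algorithm is online and works within compressed space).
from collections import Counter

def repair(s):
    """Repeatedly replace the most frequent adjacent pair with a fresh
    nonterminal, building the rules of a straight-line grammar (a CFG
    that derives exactly one string, deterministically)."""
    seq = list(s)
    rules = {}          # nonterminal -> (left symbol, right symbol)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:    # no duplicated pattern left to factor out
            break
        nt = f"R{next_id}"
        next_id += 1
        rules[nt] = pair
        # Greedy left-to-right, non-overlapping replacement of the pair.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(sym, rules):
    """Derive the original substring back from a symbol."""
    if sym not in rules:
        return sym
    left, right = rules[sym]
    return expand(left, rules) + expand(right, rules)

start, rules = repair("abababab")
# The grammar is lossless: expanding the start sequence restores the string.
assert "".join(expand(x, rules) for x in start) == "abababab"
```

On "abababab" the sketch first introduces R0 → ab, then R1 → R0 R0, compressing eight symbols down to the sequence R1 R1 plus two rules — the same duplicated-pattern removal, in miniature, that an efficient grammar compressor performs.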

Cite this Paper


BibTeX
@InProceedings{pmlr-v57-fukunaga16,
  title = {Online Grammar Compression for Frequent Pattern Discovery},
  author = {Fukunaga, Shouhei and Takabatake, Yoshimasa and I, Tomohiro and Sakamoto, Hiroshi},
  booktitle = {Proceedings of The 13th International Conference on Grammatical Inference},
  pages = {93--104},
  year = {2017},
  editor = {Verwer, Sicco and Zaanen, Menno van and Smetsers, Rick},
  volume = {57},
  series = {Proceedings of Machine Learning Research},
  address = {Delft, The Netherlands},
  month = {05--07 Oct},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v57/fukunaga16.pdf},
  url = {http://proceedings.mlr.press/v57/fukunaga16.html},
  abstract = {Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG that derives the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated patterns and removing them. This process is precisely frequent pattern discovery by grammatical inference. While any frequent pattern can be obtained in linear time using a preprocessed string, a huge working space is required for longer patterns, and the whole string must be loaded into memory beforehand. We propose an online algorithm that approximates this problem within compressed space. The main contribution is an improvement of the previously best known approximation ratio Ω(\frac{1}{\lg^2 m}) to Ω(\frac{1}{\lg^* N \lg m}), where m is the length of an optimal pattern in a string of length N and \lg^* is the iterated logarithm base 2. For sufficiently large N, \lg^* N is practically constant. The experimental results show that our algorithm extracts nearly optimal patterns and achieves a significant improvement in memory consumption compared to the offline algorithm.}
}
Endnote
%0 Conference Paper
%T Online Grammar Compression for Frequent Pattern Discovery
%A Shouhei Fukunaga
%A Yoshimasa Takabatake
%A Tomohiro I
%A Hiroshi Sakamoto
%B Proceedings of The 13th International Conference on Grammatical Inference
%C Proceedings of Machine Learning Research
%D 2017
%E Sicco Verwer
%E Menno van Zaanen
%E Rick Smetsers
%F pmlr-v57-fukunaga16
%I PMLR
%P 93--104
%U http://proceedings.mlr.press/v57/fukunaga16.html
%V 57
%X Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG that derives the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated patterns and removing them. This process is precisely frequent pattern discovery by grammatical inference. While any frequent pattern can be obtained in linear time using a preprocessed string, a huge working space is required for longer patterns, and the whole string must be loaded into memory beforehand. We propose an online algorithm that approximates this problem within compressed space. The main contribution is an improvement of the previously best known approximation ratio Ω(1/lg^2 m) to Ω(1/(lg* N · lg m)), where m is the length of an optimal pattern in a string of length N and lg* is the iterated logarithm base 2. For sufficiently large N, lg* N is practically constant. The experimental results show that our algorithm extracts nearly optimal patterns and achieves a significant improvement in memory consumption compared to the offline algorithm.
RIS
TY  - CPAPER
TI  - Online Grammar Compression for Frequent Pattern Discovery
AU  - Shouhei Fukunaga
AU  - Yoshimasa Takabatake
AU  - Tomohiro I
AU  - Hiroshi Sakamoto
BT  - Proceedings of The 13th International Conference on Grammatical Inference
DA  - 2017/01/16
ED  - Sicco Verwer
ED  - Menno van Zaanen
ED  - Rick Smetsers
ID  - pmlr-v57-fukunaga16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 57
SP  - 93
EP  - 104
L1  - http://proceedings.mlr.press/v57/fukunaga16.pdf
UR  - http://proceedings.mlr.press/v57/fukunaga16.html
AB  - Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG that derives the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated patterns and removing them. This process is precisely frequent pattern discovery by grammatical inference. While any frequent pattern can be obtained in linear time using a preprocessed string, a huge working space is required for longer patterns, and the whole string must be loaded into memory beforehand. We propose an online algorithm that approximates this problem within compressed space. The main contribution is an improvement of the previously best known approximation ratio Ω(1/lg^2 m) to Ω(1/(lg* N · lg m)), where m is the length of an optimal pattern in a string of length N and lg* is the iterated logarithm base 2. For sufficiently large N, lg* N is practically constant. The experimental results show that our algorithm extracts nearly optimal patterns and achieves a significant improvement in memory consumption compared to the offline algorithm.
ER  -
APA
Fukunaga, S., Takabatake, Y., I, T. & Sakamoto, H. (2017). Online Grammar Compression for Frequent Pattern Discovery. Proceedings of The 13th International Conference on Grammatical Inference, in Proceedings of Machine Learning Research 57:93-104. Available from http://proceedings.mlr.press/v57/fukunaga16.html.