Deep Graph Random Process for Relational-Thinking-Based Speech Recognition

Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4531-4541, 2020.

Abstract

Lying at the core of human intelligence, relational thinking initially relies on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge; these percepts are then coupled and transformed into a recognizable concept or object. Such mental processes are difficult to model in real-world problems such as conversational automatic speech recognition (ASR), because the percepts (if modelled as graphs indicating relationships among utterances) are innumerable and not directly observable. In this paper, we present a Bayesian nonparametric deep learning method called deep graph random process (DGP) that can generate an infinite number of probabilistic graphs representing percepts. We further provide a closed-form solution for coupling and transforming these percept graphs for acoustic modeling. Our approach successfully infers relations among utterances without using any relational data during training. Experimental evaluations on ASR tasks including CHiME-2 and CHiME-5 demonstrate the effectiveness and benefits of our method.
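The abstract casts percepts as probabilistic graphs whose nodes are utterances. Purely as an illustration of that idea, and not the paper's DGP formulation, the sketch below samples relation graphs with independent Bernoulli edges and averages several samples as a stand-in for the coupling and transformation step; the edge probabilities, the averaging, and all names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative only: a "percept graph" over T utterances, modelled here as a
# symmetric binary adjacency matrix with Bernoulli-distributed edges.  The
# edge probabilities `p_edge` stand in for whatever relation strengths a model
# like DGP would infer; their values are placeholders.
T = 4                                   # number of utterances in a conversation
p_edge = rng.uniform(0.1, 0.9, size=(T, T))
p_edge = (p_edge + p_edge.T) / 2        # symmetric relation probabilities
np.fill_diagonal(p_edge, 0.0)           # no self-relations

def sample_percept_graph(p):
    """Draw one binary relation graph from independent Bernoulli edges."""
    g = rng.binomial(1, p)
    g = np.triu(g, 1)                   # keep one triangle, then mirror it
    return g + g.T

# "Coupling" a batch of sampled graphs by averaging them yields a single
# weighted graph that could condition an acoustic model; this averaging is a
# placeholder, not the closed-form solution described in the paper.
graphs = [sample_percept_graph(p_edge) for _ in range(100)]
coupled = np.mean(graphs, axis=0)
print(coupled.round(2))
```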

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-huang20k,
  title     = {Deep Graph Random Process for Relational-Thinking-Based Speech Recognition},
  author    = {Huang, Hengguan and Xue, Fuzhao and Wang, Hao and Wang, Ye},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {4531--4541},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/huang20k/huang20k.pdf},
  url       = {https://proceedings.mlr.press/v119/huang20k.html}
}
Endnote
%0 Conference Paper
%T Deep Graph Random Process for Relational-Thinking-Based Speech Recognition
%A Hengguan Huang
%A Fuzhao Xue
%A Hao Wang
%A Ye Wang
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-huang20k
%I PMLR
%P 4531--4541
%U https://proceedings.mlr.press/v119/huang20k.html
%V 119
APA
Huang, H., Xue, F., Wang, H., & Wang, Y. (2020). Deep Graph Random Process for Relational-Thinking-Based Speech Recognition. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:4531-4541. Available from https://proceedings.mlr.press/v119/huang20k.html.