Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:841-849, 2012.
Abstract
Most speech analysis/synthesis systems are based on the basic physical model of speech production: the acoustic tube model. Current speech analysis methods have two main drawbacks. First, a common design paradigm is to build a special-purpose signal-processing front end followed, when appropriate, by a back end based on probabilistic models. A difficulty is that most features are nonlinear operators of the speech waveform, whose statistical behavior is hard to model. Second, different speech analysis tasks are carried out separately. These practices are admittedly useful but not optimal, because they make incomplete use of the available information. These observations motivate us to model the spectrogram directly and to integrate the three fundamental speech parameters: pitch, energy, and spectral envelope. We devise such a model, called the probabilistic acoustic tube (PAT) model. The integration is performed in a principled manner with explicit physical meaning. We demonstrate the capability of PAT on a number of speech analysis/synthesis tasks, such as pitch tracking under both clean and additive-noise conditions, speech synthesis, and phoneme clustering.
@InProceedings{pmlr-v22-ou12,
title = {Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis},
author = {Zhijian Ou and Yang Zhang},
booktitle = {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
pages = {841--849},
year = {2012},
editor = {Neil D. Lawrence and Mark Girolami},
volume = {22},
series = {Proceedings of Machine Learning Research},
address = {La Palma, Canary Islands},
month = {21--23 Apr},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v22/ou12/ou12.pdf},
url = {http://proceedings.mlr.press/v22/ou12.html},
abstract = {Most speech analysis/synthesis systems are based on the basic physical model of speech production - the acoustic tube model. There are two main drawbacks with current speech analysis methods. First, a common design paradigm seems to build a special-purpose signal-processing front-end followed by (when appropriate) a back-end based on probabilistic models. A difficulty is that most features are nonlinear operators of the speech waveform, whose statistical behavior is hard to be modeled. Second, different tasks of speech analysis are carried out separately. These practices are admittedly useful but not optimal due to the incomplete use of available information. These examinations motivate us to directly model the spectrogram and to integrate together the three fundamental speech parameters - the pitch, energy and spectral envelope. We successfully devise such a model called probabilistic acoustic tube (PAT) model. The integration is performed in a principled manner with explicit physical meaning. We demonstrate the capability of PAT for a number of speech analysis/synthesis tasks, such as pitch tracking under both clean and additive noise conditions, speech synthesis, and phoneme clustering.}
}
%0 Conference Paper
%T Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis
%A Zhijian Ou
%A Yang Zhang
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-ou12
%I PMLR
%J Proceedings of Machine Learning Research
%P 841--849
%U http://proceedings.mlr.press
%V 22
%W PMLR
%X Most speech analysis/synthesis systems are based on the basic physical model of speech production - the acoustic tube model. There are two main drawbacks with current speech analysis methods. First, a common design paradigm seems to build a special-purpose signal-processing front-end followed by (when appropriate) a back-end based on probabilistic models. A difficulty is that most features are nonlinear operators of the speech waveform, whose statistical behavior is hard to be modeled. Second, different tasks of speech analysis are carried out separately. These practices are admittedly useful but not optimal due to the incomplete use of available information. These examinations motivate us to directly model the spectrogram and to integrate together the three fundamental speech parameters - the pitch, energy and spectral envelope. We successfully devise such a model called probabilistic acoustic tube (PAT) model. The integration is performed in a principled manner with explicit physical meaning. We demonstrate the capability of PAT for a number of speech analysis/synthesis tasks, such as pitch tracking under both clean and additive noise conditions, speech synthesis, and phoneme clustering.
TY - CPAPER
TI - Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis
AU - Zhijian Ou
AU - Yang Zhang
BT - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
PY - 2012/03/21
DA - 2012/03/21
ED - Neil D. Lawrence
ED - Mark Girolami
ID - pmlr-v22-ou12
PB - PMLR
SP - 841
DP - PMLR
EP - 849
L1 - http://proceedings.mlr.press/v22/ou12/ou12.pdf
UR - http://proceedings.mlr.press/v22/ou12.html
AB - Most speech analysis/synthesis systems are based on the basic physical model of speech production - the acoustic tube model. There are two main drawbacks with current speech analysis methods. First, a common design paradigm seems to build a special-purpose signal-processing front-end followed by (when appropriate) a back-end based on probabilistic models. A difficulty is that most features are nonlinear operators of the speech waveform, whose statistical behavior is hard to be modeled. Second, different tasks of speech analysis are carried out separately. These practices are admittedly useful but not optimal due to the incomplete use of available information. These examinations motivate us to directly model the spectrogram and to integrate together the three fundamental speech parameters - the pitch, energy and spectral envelope. We successfully devise such a model called probabilistic acoustic tube (PAT) model. The integration is performed in a principled manner with explicit physical meaning. We demonstrate the capability of PAT for a number of speech analysis/synthesis tasks, such as pitch tracking under both clean and additive noise conditions, speech synthesis, and phoneme clustering.
ER -
Ou, Z. & Zhang, Y. (2012). Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in PMLR 22:841-849.