Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis

Zhijian Ou; Yang Zhang

Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis

Zhijian Ou, Yang Zhang

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:841-849, 2012.

Abstract

Most speech analysis/synthesis systems are based on the basic physical model of speech production - the acoustic tube model. There are two main drawbacks with current speech analysis methods. First, a common design paradigm seems to build a special-purpose signal-processing front-end followed by (when appropriate) a back-end based on probabilistic models. A difficulty is that most features are nonlinear operators of the speech waveform, whose statistical behavior is hard to be modeled. Second, different tasks of speech analysis are carried out separately. These practices are admittedly useful but not optimal due to the incomplete use of available information. These examinations motivate us to directly model the spectrogram and to integrate together the three fundamental speech parameters - the pitch, energy and spectral envelope. We successfully devise such a model called probabilistic acoustic tube (PAT) model. The integration is performed in a principled manner with explicit physical meaning. We demonstrate the capability of PAT for a number of speech analysis/synthesis tasks, such as pitch tracking under both clean and additive noise conditions, speech synthesis, and phoneme clustering.

Cite this Paper

BibTeX


@InProceedings{pmlr-v22-ou12,
  title = 	 {Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis},
  author = 	 {Ou, Zhijian and Zhang, Yang},
  booktitle = 	 {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {841--849},
  year = 	 {2012},
  editor = 	 {Lawrence, Neil D. and Girolami, Mark},
  volume = 	 {22},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {La Palma, Canary Islands},
  month = 	 {21--23 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v22/ou12/ou12.pdf},
  url = 	 {https://proceedings.mlr.press/v22/ou12.html},
  abstract = 	 {Most speech analysis/synthesis systems are based on the basic physical model of speech production - the acoustic tube model. There are two main drawbacks with current speech analysis methods. First, a common design paradigm seems to build a special-purpose signal-processing front-end followed by (when appropriate) a back-end based on probabilistic models. A difficulty is that most features are nonlinear operators of the speech waveform, whose statistical behavior is hard to be modeled. Second, different tasks of speech analysis are carried out separately. These practices are admittedly useful but not optimal due to the incomplete use of available information. These examinations motivate us to directly model the spectrogram and to integrate together the three fundamental speech parameters - the pitch, energy and spectral envelope. We successfully devise such a model called probabilistic acoustic tube (PAT) model. The integration is performed in a principled manner with explicit physical meaning. We demonstrate the capability of PAT for a number of speech analysis/synthesis tasks, such as pitch tracking under both clean and additive noise conditions, speech synthesis, and phoneme clustering.}
}

Endnote

%0 Conference Paper
%T Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis
%A Zhijian Ou
%A Yang Zhang
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami	
%F pmlr-v22-ou12
%I PMLR
%P 841--849
%U https://proceedings.mlr.press/v22/ou12.html
%V 22
%X Most speech analysis/synthesis systems are based on the basic physical model of speech production - the acoustic tube model. There are two main drawbacks with current speech analysis methods. First, a common design paradigm seems to build a special-purpose signal-processing front-end followed by (when appropriate) a back-end based on probabilistic models. A difficulty is that most features are nonlinear operators of the speech waveform, whose statistical behavior is hard to be modeled. Second, different tasks of speech analysis are carried out separately. These practices are admittedly useful but not optimal due to the incomplete use of available information. These examinations motivate us to directly model the spectrogram and to integrate together the three fundamental speech parameters - the pitch, energy and spectral envelope. We successfully devise such a model called probabilistic acoustic tube (PAT) model. The integration is performed in a principled manner with explicit physical meaning. We demonstrate the capability of PAT for a number of speech analysis/synthesis tasks, such as pitch tracking under both clean and additive noise conditions, speech synthesis, and phoneme clustering.

RIS


TY  - CPAPER
TI  - Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis
AU  - Zhijian Ou
AU  - Yang Zhang
BT  - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
DA  - 2012/03/21
ED  - Neil D. Lawrence
ED  - Mark Girolami	
ID  - pmlr-v22-ou12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 22
SP  - 841
EP  - 849
L1  - http://proceedings.mlr.press/v22/ou12/ou12.pdf
UR  - https://proceedings.mlr.press/v22/ou12.html
AB  - Most speech analysis/synthesis systems are based on the basic physical model of speech production - the acoustic tube model. There are two main drawbacks with current speech analysis methods. First, a common design paradigm seems to build a special-purpose signal-processing front-end followed by (when appropriate) a back-end based on probabilistic models. A difficulty is that most features are nonlinear operators of the speech waveform, whose statistical behavior is hard to be modeled. Second, different tasks of speech analysis are carried out separately. These practices are admittedly useful but not optimal due to the incomplete use of available information. These examinations motivate us to directly model the spectrogram and to integrate together the three fundamental speech parameters - the pitch, energy and spectral envelope. We successfully devise such a model called probabilistic acoustic tube (PAT) model. The integration is performed in a principled manner with explicit physical meaning. We demonstrate the capability of PAT for a number of speech analysis/synthesis tasks, such as pitch tracking under both clean and additive noise conditions, speech synthesis, and phoneme clustering.
ER  -

APA


Ou, Z. & Zhang, Y.. (2012). Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 22:841-849 Available from https://proceedings.mlr.press/v22/ou12.html.

Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis

Abstract

Cite this Paper

Related Material