Employing The Complete Face in AVSR to Recover from Facial Occlusions

Benjamin X. Hall; John Shawe-Taylor; Alan Johnston

Proceedings of the Second Workshop on Applications of Pattern Analysis, PMLR 17:33-40, 2011.

Abstract

Existing Audio-Visual Speech Recognition (AVSR) systems focus visually on a small region of the face, centred on the immediate mouth area. This is poor design for a variety of reasons. In real-world situations, any occlusion of this small area renders all visual advantage null and void. Moreover, it is well known that humans use the complete face to speechread. We demonstrate a new application of a novel visual algorithm, the Multi-Channel Gradient Model (MCGM), that deploys information from the complete face to perform AVSR. Our MCGM approach performs close to Discrete Cosine Transforms (DCTs) when a small region of interest around the lips is used, but in the case of an occluded face it achieves results that match nearly 70% of the performance that DCTs achieve in their best case, the lip-centric approach.

Cite this Paper

BibTeX

@InProceedings{pmlr-v17-hall11a,
  title = 	 {Employing The Complete Face in AVSR to Recover from Facial Occlusions},
  author = 	 {Hall, Benjamin X. and Shawe-Taylor, John and Johnston, Alan},
  booktitle = 	 {Proceedings of the Second Workshop on Applications of Pattern Analysis},
  pages = 	 {33--40},
  year = 	 {2011},
  editor = 	 {Diethe, Tom and Balcazar, Jose and Shawe-Taylor, John and Tirnauca, Cristina},
  volume = 	 {17},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {CIEM, Castro Urdiales, Spain},
  month = 	 {19--21 Oct},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v17/hall11a/hall11a.pdf},
  url = 	 {https://proceedings.mlr.press/v17/hall11a.html},
  abstract = 	 {Existing Audio-Visual Speech Recognition (AVSR) systems focus visually on a small region of the face, centred on the immediate mouth area. This is poor design for a variety of reasons. In real-world situations, any occlusion of this small area renders all visual advantage null and void. Moreover, it is well known that humans use the complete face to speechread. We demonstrate a new application of a novel visual algorithm, the Multi-Channel Gradient Model (MCGM), that deploys information from the complete face to perform AVSR. Our MCGM approach performs close to Discrete Cosine Transforms (DCTs) when a small region of interest around the lips is used, but in the case of an occluded face it achieves results that match nearly 70\% of the performance that DCTs achieve in their best case, the lip-centric approach.}
}

Endnote

%0 Conference Paper
%T Employing The Complete Face in AVSR to Recover from Facial Occlusions
%A Benjamin X. Hall
%A John Shawe-Taylor
%A Alan Johnston
%B Proceedings of the Second Workshop on Applications of Pattern Analysis
%C Proceedings of Machine Learning Research
%D 2011
%E Tom Diethe
%E Jose Balcazar
%E John Shawe-Taylor
%E Cristina Tirnauca
%F pmlr-v17-hall11a
%I PMLR
%P 33--40
%U https://proceedings.mlr.press/v17/hall11a.html
%V 17
%X Existing Audio-Visual Speech Recognition (AVSR) systems focus visually on a small region of the face, centred on the immediate mouth area. This is poor design for a variety of reasons. In real-world situations, any occlusion of this small area renders all visual advantage null and void. Moreover, it is well known that humans use the complete face to speechread. We demonstrate a new application of a novel visual algorithm, the Multi-Channel Gradient Model (MCGM), that deploys information from the complete face to perform AVSR. Our MCGM approach performs close to Discrete Cosine Transforms (DCTs) when a small region of interest around the lips is used, but in the case of an occluded face it achieves results that match nearly 70% of the performance that DCTs achieve in their best case, the lip-centric approach.

RIS

TY  - CPAPER
TI  - Employing The Complete Face in AVSR to Recover from Facial Occlusions
AU  - Benjamin X. Hall
AU  - John Shawe-Taylor
AU  - Alan Johnston
BT  - Proceedings of the Second Workshop on Applications of Pattern Analysis
DA  - 2011/10/21
ED  - Tom Diethe
ED  - Jose Balcazar
ED  - John Shawe-Taylor
ED  - Cristina Tirnauca
ID  - pmlr-v17-hall11a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 17
SP  - 33
EP  - 40
L1  - http://proceedings.mlr.press/v17/hall11a/hall11a.pdf
UR  - https://proceedings.mlr.press/v17/hall11a.html
AB  - Existing Audio-Visual Speech Recognition (AVSR) systems focus visually on a small region of the face, centred on the immediate mouth area. This is poor design for a variety of reasons. In real-world situations, any occlusion of this small area renders all visual advantage null and void. Moreover, it is well known that humans use the complete face to speechread. We demonstrate a new application of a novel visual algorithm, the Multi-Channel Gradient Model (MCGM), that deploys information from the complete face to perform AVSR. Our MCGM approach performs close to Discrete Cosine Transforms (DCTs) when a small region of interest around the lips is used, but in the case of an occluded face it achieves results that match nearly 70% of the performance that DCTs achieve in their best case, the lip-centric approach.
ER  -

APA

Hall, B.X., Shawe-Taylor, J. & Johnston, A. (2011). Employing The Complete Face in AVSR to Recover from Facial Occlusions. Proceedings of the Second Workshop on Applications of Pattern Analysis, in Proceedings of Machine Learning Research 17:33-40. Available from https://proceedings.mlr.press/v17/hall11a.html.
