Event-Based Binary Neural Networks for Efficient and Accurate Lip Reading

Xueyi Zhang, Jialu Sun, Peiyin Zhu, Bowen Wang, Mingrui Lao, Yanming Guo
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:212-221, 2025.

Abstract

Event cameras provide exceptional temporal resolution and consume minimal power, making them highly suitable for lip reading tasks. However, traditional methods struggle with the high computational costs of processing asynchronous event streams. We propose STCNet, a spatio-temporal convolutional network optimized for event-driven lip reading, and its binary counterpart B-STCNet, which results in a substantial reduction in computational and memory resource requirements. B-STCNet introduces Kernel-Specific Scaling Factors to bridge the performance gap induced by binarization and adopts quantization-aware training to enhance model stability. Evaluated on the DVS-Lip dataset, B-STCNet achieves state-of-the-art accuracy with over 90% reduction in parameters and 50% fewer FLOPs, demonstrating its potential for deployment on resource-constrained edge devices.

Cite this Paper


BibTeX
@InProceedings{pmlr-v278-zhang25c,
  title = {Event-Based Binary Neural Networks for Efficient and Accurate Lip Reading},
  author = {Zhang, Xueyi and Sun, Jialu and Zhu, Peiyin and Wang, Bowen and Lao, Mingrui and Guo, Yanming},
  booktitle = {Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing},
  pages = {212--221},
  year = {2025},
  editor = {Zeng, Nianyin and Pachori, Ram Bilas and Wang, Dongshu},
  volume = {278},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Apr},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v278/main/assets/zhang25c/zhang25c.pdf},
  url = {https://proceedings.mlr.press/v278/zhang25c.html},
  abstract = {Event cameras provide exceptional temporal resolution and consume minimal power, making them highly suitable for lip reading tasks. However, traditional methods struggle with the high computational costs of processing asynchronous event streams. We propose STCNet, a spatio-temporal convolutional network optimized for event-driven lip reading, and its binary counterpart B-STCNet, which results in a substantial reduction in computational and memory resource requirements. B-STCNet introduces Kernel-Specific Scaling Factors to bridge the performance gap induced by binarization and adopts quantization-aware training to enhance model stability. Evaluated on the DVS-Lip dataset, B-STCNet achieves state-of-the-art accuracy with over 90\% reduction in parameters and 50\% fewer FLOPs, demonstrating its potential for deployment on resource-constrained edge devices.}
}
Endnote
%0 Conference Paper
%T Event-Based Binary Neural Networks for Efficient and Accurate Lip Reading
%A Xueyi Zhang
%A Jialu Sun
%A Peiyin Zhu
%A Bowen Wang
%A Mingrui Lao
%A Yanming Guo
%B Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing
%C Proceedings of Machine Learning Research
%D 2025
%E Nianyin Zeng
%E Ram Bilas Pachori
%E Dongshu Wang
%F pmlr-v278-zhang25c
%I PMLR
%P 212--221
%U https://proceedings.mlr.press/v278/zhang25c.html
%V 278
%X Event cameras provide exceptional temporal resolution and consume minimal power, making them highly suitable for lip reading tasks. However, traditional methods struggle with the high computational costs of processing asynchronous event streams. We propose STCNet, a spatio-temporal convolutional network optimized for event-driven lip reading, and its binary counterpart B-STCNet, which results in a substantial reduction in computational and memory resource requirements. B-STCNet introduces Kernel-Specific Scaling Factors to bridge the performance gap induced by binarization and adopts quantization-aware training to enhance model stability. Evaluated on the DVS-Lip dataset, B-STCNet achieves state-of-the-art accuracy with over 90% reduction in parameters and 50% fewer FLOPs, demonstrating its potential for deployment on resource-constrained edge devices.
APA
Zhang, X., Sun, J., Zhu, P., Wang, B., Lao, M., & Guo, Y. (2025). Event-Based Binary Neural Networks for Efficient and Accurate Lip Reading. Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, in Proceedings of Machine Learning Research 278:212-221. Available from https://proceedings.mlr.press/v278/zhang25c.html.
