Event-Based Binary Neural Networks for Efficient and Accurate Lip Reading
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:212-221, 2025.
Abstract
Event cameras provide exceptional temporal resolution and consume minimal power, making them highly suitable for lip reading tasks. However, traditional methods struggle with the high computational cost of processing asynchronous event streams. We propose STCNet, a spatio-temporal convolutional network optimized for event-driven lip reading, and its binary counterpart, B-STCNet, which substantially reduces computational and memory requirements. B-STCNet introduces Kernel-Specific Scaling Factors to bridge the performance gap induced by binarization and adopts quantization-aware training to enhance model stability. Evaluated on the DVS-Lip dataset, B-STCNet achieves state-of-the-art accuracy with over 90% fewer parameters and 50% fewer FLOPs, demonstrating its potential for deployment on resource-constrained edge devices.
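The abstract does not define Kernel-Specific Scaling Factors, so the following is only a rough sketch of the general idea of pairing binarized weights with one scaling factor per kernel. It assumes the common choice, used in XNOR-Net-style binarization, of setting each factor to the mean absolute value of that kernel's weights; the paper's actual formulation may differ. All function names and the example weights are hypothetical.

```python
# Illustrative sketch only: per-kernel scaling in the style of XNOR-Net.
# This is an assumption, NOT the paper's exact Kernel-Specific Scaling Factors.

def sign(x):
    """Map a real weight to +1 or -1 (zero is mapped to +1)."""
    return 1.0 if x >= 0 else -1.0


def binarize_kernel(kernel):
    """Binarize one flattened kernel.

    Returns (alpha, signs), where alpha is the mean absolute value of the
    kernel's weights, so that alpha * signs approximates the original kernel.
    """
    alpha = sum(abs(w) for w in kernel) / len(kernel)
    signs = [sign(w) for w in kernel]
    return alpha, signs


def binarize_layer(kernels):
    """Binarize every kernel of a layer, with one scaling factor per kernel."""
    return [binarize_kernel(k) for k in kernels]


def reconstruction_error(kernel, alpha, signs):
    """Squared error between a kernel and its scaled binary approximation."""
    return sum((w - alpha * s) ** 2 for w, s in zip(kernel, signs))


# Hypothetical layer: two flattened kernels with very different magnitudes.
layer = [
    [0.5, -0.25, 0.75, -0.5],
    [0.02, -0.01, 0.03, -0.02],
]
binary_layer = binarize_layer(layer)

# For comparison: a single scaling factor shared by the whole layer.
all_weights = [w for k in layer for w in k]
shared_alpha = sum(abs(w) for w in all_weights) / len(all_weights)

per_kernel_error = sum(
    reconstruction_error(k, a, s) for k, (a, s) in zip(layer, binary_layer)
)
shared_error = sum(
    reconstruction_error(k, shared_alpha, s)
    for k, (_, s) in zip(layer, binary_layer)
)
```

In this toy layer the second kernel's weights are far smaller than the first's, so a single shared factor fits it poorly, while a per-kernel factor tracks each kernel's own magnitude. Only the signs and one scalar per kernel need to be stored, which is where the memory saving of a binary network comes from.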