Research on gesture recognition based on YOLOv8

Yang Yang
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:189-196, 2025.

Abstract

Recognizing gestures quickly and accurately has long been a research topic of wide interest. However, existing gesture recognition algorithms still face two challenges. First, the computational cost and parameter count of deep learning models for gesture recognition are often large, making them difficult to deploy on resource-limited embedded devices. Second, deep learning models for dynamic gesture recognition remain limited in their ability to extract positional and spatial features. To address these problems, this paper proposes a gesture recognition algorithm based on an attention mechanism. First, the lightweight You Only Look Once (YOLO) v8n object detection algorithm was selected to reduce parameters and computation. Then, a Multi-Head Self-Attention (MHSA) module was integrated into the YOLOv8n network to enhance feature extraction along the positional and spatial dimensions. Experimental results demonstrated that the proposed algorithm achieved 99.2% accuracy, surpassing the original algorithm by 1.1%, and reached a detection speed of 233 FPS on an Nvidia RTX 3070 GPU.
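The paper's implementation is not reproduced on this page, but the core mechanism it names, multi-head self-attention over a flattened feature map, can be illustrated with a minimal NumPy sketch. The function name, the projection matrices `w_q`/`w_k`/`w_v`/`w_o`, and the toy dimensions below are illustrative assumptions, not the authors' code or the actual YOLOv8n integration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Generic multi-head self-attention (a sketch, not the paper's module).

    x: (seq_len, d_model) -- e.g. an H*W feature map flattened into a sequence.
    w_q, w_k, w_v, w_o: (d_model, d_model) learned projection matrices.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product attention, computed independently per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                        # (heads, seq, d_head)

    # Merge heads back together and apply the output projection
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o
```

In the setting the abstract describes, a block like this would sit inside the YOLOv8n network so that every spatial position attends to every other, which is what gives the attention mechanism its position- and spatial-feature extraction capability.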

Cite this Paper


BibTeX
@InProceedings{pmlr-v278-yang25a,
  title     = {Research on gesture recognition based on YOLOv8},
  author    = {Yang, Yang},
  booktitle = {Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing},
  pages     = {189--196},
  year      = {2025},
  editor    = {Zeng, Nianyin and Pachori, Ram Bilas and Wang, Dongshu},
  volume    = {278},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v278/main/assets/yang25a/yang25a.pdf},
  url       = {https://proceedings.mlr.press/v278/yang25a.html},
  abstract  = {Recognizing gestures quickly and accurately has long been a research topic of wide interest. However, existing gesture recognition algorithms still face two challenges. First, the computational cost and parameter count of deep learning models for gesture recognition are often large, making them difficult to deploy on resource-limited embedded devices. Second, deep learning models for dynamic gesture recognition remain limited in their ability to extract positional and spatial features. To address these problems, this paper proposes a gesture recognition algorithm based on an attention mechanism. First, the lightweight You Only Look Once (YOLO) v8n object detection algorithm was selected to reduce parameters and computation. Then, a Multi-Head Self-Attention (MHSA) module was integrated into the YOLOv8n network to enhance feature extraction along the positional and spatial dimensions. Experimental results demonstrated that the proposed algorithm achieved 99.2% accuracy, surpassing the original algorithm by 1.1%, and reached a detection speed of 233 FPS on an Nvidia RTX 3070 GPU.}
}
Endnote
%0 Conference Paper
%T Research on gesture recognition based on YOLOv8
%A Yang Yang
%B Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing
%C Proceedings of Machine Learning Research
%D 2025
%E Nianyin Zeng
%E Ram Bilas Pachori
%E Dongshu Wang
%F pmlr-v278-yang25a
%I PMLR
%P 189--196
%U https://proceedings.mlr.press/v278/yang25a.html
%V 278
%X Recognizing gestures quickly and accurately has long been a research topic of wide interest. However, existing gesture recognition algorithms still face two challenges. First, the computational cost and parameter count of deep learning models for gesture recognition are often large, making them difficult to deploy on resource-limited embedded devices. Second, deep learning models for dynamic gesture recognition remain limited in their ability to extract positional and spatial features. To address these problems, this paper proposes a gesture recognition algorithm based on an attention mechanism. First, the lightweight You Only Look Once (YOLO) v8n object detection algorithm was selected to reduce parameters and computation. Then, a Multi-Head Self-Attention (MHSA) module was integrated into the YOLOv8n network to enhance feature extraction along the positional and spatial dimensions. Experimental results demonstrated that the proposed algorithm achieved 99.2% accuracy, surpassing the original algorithm by 1.1%, and reached a detection speed of 233 FPS on an Nvidia RTX 3070 GPU.
APA
Yang, Y. (2025). Research on gesture recognition based on YOLOv8. Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, in Proceedings of Machine Learning Research 278:189-196. Available from https://proceedings.mlr.press/v278/yang25a.html.