Semi-supervised Acoustic Scene Classification under Spatial-Temporal Variability with a CRNN-based Model

Haowen Li, Mou Wang, Zhengding Luo, Ee-Leng Tan, Ziyi Yang, Woon-Seng Gan
Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:38-47, 2026.

Abstract

In this work, we present MobileASCNet, a lightweight CRNN-based model designed for acoustic scene classification (ASC) under spatial-temporal variability, as defined in the APSIPA ASC 2025 Grand Challenge. The model combines depthwise separable convolutions and ResNet-inspired residual blocks for efficient spatial feature extraction, and employs a gated recurrent unit (GRU) branch to capture temporal dependencies. City and time embeddings are fused to enhance context-awareness. We conduct extensive comparisons under different training strategies, including training from scratch, pretraining with fine-tuning, and feature freezing. Without relying on knowledge distillation, MobileASCNet achieves a competitive classification accuracy on the development set, with low model complexity.

Cite this Paper

BibTeX
@InProceedings{pmlr-v312-li26a,
  title     = {Semi-supervised Acoustic Scene Classification under Spatial-Temporal Variability with a {CRNN}-based Model},
  author    = {Li, Haowen and Wang, Mou and Luo, Zhengding and Tan, Ee-Leng and Yang, Ziyi and Gan, Woon-Seng},
  booktitle = {Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI)},
  pages     = {38--47},
  year      = {2026},
  editor    = {Komatsu, Tatsuya and Imoto, Keisuke and Gao, Xiaoxue and Ono, Nobutaka and Chen, Nancy F.},
  volume    = {312},
  series    = {Proceedings of Machine Learning Research},
  month     = {26 Jan},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v312/main/assets/li26a/li26a.pdf},
  url       = {https://proceedings.mlr.press/v312/li26a.html},
  abstract  = {In this work, we present MobileASCNet, a lightweight CRNN-based model designed for acoustic scene classification (ASC) under spatial-temporal variability, as defined in the APSIPA ASC 2025 Grand Challenge. The model combines depthwise separable convolutions and ResNet-inspired residual blocks for efficient spatial feature extraction, and employs a gated recurrent unit (GRU) branch to capture temporal dependencies. City and time embeddings are fused to enhance context-awareness. We conduct extensive comparisons under different training strategies, including training from scratch, pretraining with fine-tuning, and feature freezing. Without relying on knowledge distillation, MobileASCNet achieves a competitive classification accuracy on the development set, with low model complexity.}
}
Endnote
%0 Conference Paper
%T Semi-supervised Acoustic Scene Classification under Spatial-Temporal Variability with a CRNN-based Model
%A Haowen Li
%A Mou Wang
%A Zhengding Luo
%A Ee-Leng Tan
%A Ziyi Yang
%A Woon-Seng Gan
%B Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI)
%C Proceedings of Machine Learning Research
%D 2026
%E Tatsuya Komatsu
%E Keisuke Imoto
%E Xiaoxue Gao
%E Nobutaka Ono
%E Nancy F. Chen
%F pmlr-v312-li26a
%I PMLR
%P 38--47
%U https://proceedings.mlr.press/v312/li26a.html
%V 312
%X In this work, we present MobileASCNet, a lightweight CRNN-based model designed for acoustic scene classification (ASC) under spatial-temporal variability, as defined in the APSIPA ASC 2025 Grand Challenge. The model combines depthwise separable convolutions and ResNet-inspired residual blocks for efficient spatial feature extraction, and employs a gated recurrent unit (GRU) branch to capture temporal dependencies. City and time embeddings are fused to enhance context-awareness. We conduct extensive comparisons under different training strategies, including training from scratch, pretraining with fine-tuning, and feature freezing. Without relying on knowledge distillation, MobileASCNet achieves a competitive classification accuracy on the development set, with low model complexity.
APA
Li, H., Wang, M., Luo, Z., Tan, E., Yang, Z. & Gan, W. (2026). Semi-supervised Acoustic Scene Classification under Spatial-Temporal Variability with a CRNN-based Model. Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), in Proceedings of Machine Learning Research 312:38-47. Available from https://proceedings.mlr.press/v312/li26a.html.