Semi-supervised Acoustic Scene Classification under Spatial-Temporal Variability with a CRNN-based Model
Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:38-47, 2026.
Abstract
In this work, we present MobileASCNet, a lightweight CRNN-based model designed for acoustic scene classification (ASC) under spatial-temporal variability, as defined in the APSIPA ASC 2025 Grand Challenge. The model combines depthwise separable convolutions with ResNet-inspired residual blocks for efficient spatial feature extraction, and employs a gated recurrent unit (GRU) branch to capture temporal dependencies. City and time embeddings are fused into the representation to enhance context-awareness. We conduct extensive comparisons under different training strategies, including training from scratch, pretraining with fine-tuning, and feature freezing. Without relying on knowledge distillation, MobileASCNet achieves competitive classification accuracy on the development set while maintaining low model complexity.
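The "lightweight" property rests largely on depthwise separable convolutions, which factor a standard convolution into a per-channel depthwise filter followed by a 1x1 pointwise mixing step. A minimal sketch of the parameter saving, using illustrative channel sizes that are not taken from the paper:

```python
def conv2d_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel)
    followed by a 1x1 pointwise conv that mixes channels."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Illustrative layer sizes (hypothetical, not from the paper)
c_in, c_out, k = 64, 128, 3
std = conv2d_params(c_in, c_out, k)               # 73728
sep = depthwise_separable_params(c_in, c_out, k)  # 8768
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For this layer the factorization cuts the weight count by roughly 8x, which is why such blocks are a common choice when model complexity is a design constraint.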