Proceedings of Machine Learning Research

Volume 312: Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), 26 January 2026, Singapore EXPO, Singapore, Singapore

Editors: Tatsuya Komatsu, Keisuke Imoto, Xiaoxue Gao, Nobutaka Ono, Nancy F. Chen

Lina-Speech: Gated Linear Attention and Initial-State Tuning for Multi-Sample Prompting Text-To-Speech Synthesis

Théodor Lemerle, Téo Guichoux, Axel Roebel, Nicolas Obin; Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:1-20

AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio Embedding Sequences

Minoru Kishi, Ryosuke Sakai, Shinnosuke Takamichi, Yusuke Kanamori, Yuki Okamoto; Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:21-37

Semi-supervised Acoustic Scene Classification under Spatial-Temporal Variability with a CRNN-based Model

Haowen Li, Mou Wang, Zhengding Luo, Ee-Leng Tan, Ziyi Yang, Woon-Seng Gan; Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:38-47

Online Independent Low-Rank Matrix Analysis as a Lightweight and Trainable Model for Real-Time Multichannel Music Source Separation

Taishi Nakashima, Nobutaka Ono; Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:48-60

Train multi-modal LLM to understand diverse speech paralinguistics by distilling from teacher with meta-information prompt

Jeremy Wong, Muhammad Huzaifah, Hardik Sailor, Shuo Sun, Kye Min Tan, Bin Wang, Qiongqiong Wang, Wenyu Zhang, Xunlong Zou, Nancy F. Chen, Ai Ti Aw; Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:61-77

Latent-RQ: Enhancing Speech Pre-training with Latent Representations and Random Quantization

Muhammad Huzaifah, Hardik Sailor, Jeremy Wong, Nancy F. Chen, Ai Ti Aw; Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:78-93

Can You Hear Naples? Building and Benchmarking a Neapolitan Speech Corpus

Michael Cacioli, Liam Eggleston, Jatin Sarabu, Kevin Zhu; Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:94-112

AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval

Jingru Lin, Chen Zhang, Tianrui Wang, Haizhou Li; Proceedings of the AAAI 2026 Workshop on Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (Audio-AAAI), PMLR 312:113-125