Volume 262: NeurIPS Efficient Natural Language and Speech Processing Workshop, 14 December 2024, Vancouver, British Columbia, Canada

Editors: Mehdi Rezagholizadeh, Peyman Passban, Soheila Samiee, Vahid Partovi Nia, Yu Cheng, Yue Deng, Qun Liu, Boxing Chen

Training

Scaling Smart: Accelerating Large Language Model Pre-Training with Small Model Initialization

Mohammad Samragh, Seyed Iman Mirzadeh, Keivan Alizadeh-Vahid, Fartash Faghri, Minsik Cho, Moin Nabi, Devang Naik, Mehrdad Farajtabar; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:1-13

Computational Bottlenecks of Training Small-scale Large Language Models

Saleh Ashkboos, Seyed Iman Mirzadeh, Keivan Alizadeh-Vahid, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar, Fartash Faghri; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:14-21

QuAILoRA: Quantization-Aware Initialization for LoRA

Neal G Lawton, Aishwarya Padmakumar, Judith Gaspers, Jack FitzGerald, Anoop Kumar, Greg Ver Steeg, Aram Galstyan; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:22-33

SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings

Mohammad Ali Sadraei Javaheri, Ehsaneddin Asgari, Alice C. McHardy, Hamid R. Rabiee; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:34-46

RGP: Achieving Memory-Efficient Model Fine-tuning Via Randomized Gradient Projection

Ali Saheb Pasand, Pouya Bashivan; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:47-54

Efficient Alignment of Large Language Models via Data Sampling

Amrit Khera, Rajat Ghosh, Debojyoti Dutta; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:55-72

KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation

Rambod Azimi, Rishav Rishav, Marek Teichmann, Samira Ebrahimi Kahou; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:73-80

Model Design & Architecture

Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts

Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Therien, Sambit Sahu, Stephen Rawls, Supriyo Chakraborty, Tom Goldstein; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:81-101

VL-Mamba: Exploring State Space Models for Multimodal Learning

Yanyuan Qiao, Zheng Yu, Zijia Zhao, Sihan Chen, Mingzhen Sun, Longteng Guo, Qi Wu, Jing Liu; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:102-113

MisD-MoE: A Multimodal Misinformation Detection Framework with Adaptive Feature Selection

Moyang Liu, Kaiying Yan, Yukun Liu, Ruibo Fu, Zhengqi Wen, Xuefei Liu, Chenxing Li; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:114-122

Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

Vicky Zayats, Peter Chen, Melissa Ferrari, Dirk Padfield; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:123-135

Is 3D Convolution with 5D Tensors Really Necessary for Video Analysis?

Habib Hajimolahoseini, Walid Ahmed, Shuangyue Wen, Yang Liu; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:136-144

Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts

Youngseog Chung, Dhruv Malik, Jeff Schneider, Yuanzhi Li, Aarti Singh; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:145-164

Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning

Soumajyoti Sarkar, Leonard Lausen, Volkan Cevher, Thomas Brox, Sheng Zha, George Karypis; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:165-181

StructMoE: Structured Mixture of Experts Using Low Rank Experts

Zain Sarwar, Ashwinee Panda, Benjamin Thérien, Stephen Rawls, Anirban Das, Kartik Balasubramaniam, Berkcan Kapusuzoglu, Shixiong Zhang, Sambit Sahu, Milind Naphade, Supriyo Chakraborty; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:182-193

Sparse Upcycling: Inference Inefficient Finetuning

Sasha Doubov, Nikhil Sardana, Vitaliy Chiley; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:194-205

Model Efficiency & Compression

Post-Training Statistical Calibration for Higher Activation Sparsity

Vui Seng Chua, Yujie Pan, Nilesh Jain; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:206-221

Accelerating the Low-Rank Decomposed Models

Habib Hajimolahoseini, Walid Ahmed, Shuangyue Wen, Yang Liu; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:222-231

The EarlyBird Gets the WORM: Heuristically Accelerating EarlyBird Convergence

Adithya G Vasudev; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:232-240

Post Training Quantization of Large Language Models with Microscaling Formats

Sayeh Sharify, Utkarsh Saxena, Zifei Xu, Wanzin Yazar, Ilya Soloveychik, Xin Wang; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:241-258

EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models

Hossein Rajabzadeh, Aref Jafari, Aman Sharma, Benyamin Jami, Hyock Ju Hj Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:259-269

Scaling laws for post-training quantized large language models

Zifei Xu, Alexander Y Lan, Wanzin Yazar, Tristan Webb, Sayeh Sharify, Xin Wang; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:270-285

Partially Shared Query-Key for Lightweight Language Models

Kai Yang, Vahid Partovi Nia, Boxing Chen, Masoud Asgharian; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:286-291

Inference

Snakes and Ladders: Accelerating SSM Inference with Speculative Decoding

Yangchao Wu, Yonatan Dukler, Matthew Trager, Alessandro Achille, Wei Xia, Stefano Soatto; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:292-304

GEAR: An Efficient Error Reduction Framework for KV Cache Compression in LLM Inference

Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:305-321

The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation

Lawrence Stewart, Matthew Trager, Sujan Gonugondla, Stefano Soatto; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:322-335

Distributed Speculative Inference of Large Language Models is Provably Faster

Nadav Timor, Jonathan Mamou, Oren Pereg, Moshe Berchansky, Daniel Korat, Moshe Wasserblat, Tomer Galanti, Michal Gordon, David Harel; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:336-354

AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability

Sudhanshu Agrawal, Wonseok Jeon, Mingu Lee; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:355-369

Inference-Friendly Models With MixAttention

Shashank Rajput, Ying Sheng, Sean Owen, Vitaliy Chiley; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:370-381

Improving Multi-candidate Speculative Decoding

XiaoFan Lu, Yixiao Zeng, Marco Levorato, FeiYang Ma, ZiXu Yu; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:382-394

Speculative Streaming: Fast LLM Inference without Auxiliary Models

Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:395-413

Hysteresis Activation Function for Efficient Inference

Moshe Kimhi, Idan Kashani, Chaim Baskin, Avi Mendelson; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:414-422

Efficiently Dispatching Flash Attention For Partially Filled Attention Masks

Agniv Sharma, Jonas A. Geiping; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:423-442

Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models

Keivan Alizadeh-Vahid, Seyed Iman Mirzadeh, Hooman Shahrkokhi, Dmitry Belenko, Frank Sun, Minsik Cho, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:443-455

Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models

Jonathan Mamou, Oren Pereg, Daniel Korat, Moshe Berchansky, Nadav Timor, Moshe Wasserblat, Roy Schwartz; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:456-467

CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios

Luning Wang, Shiyao Li, Xuefei Ning, Zhihang Yuan, Shengen Yan, Guohao Dai, Yu Wang; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:468-484

Residual vector quantization for KV cache compression in large language model

Ankur Kumar; Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:485-490

Benchmark & Evaluation

Applications