Volume 262: NeurIPS Efficient Natural Language and Speech Processing Workshop, 14 December 2024, Vancouver, British Columbia, Canada
Editors: Mehdi Rezagholizadeh, Peyman Passban, Soheila Samiee, Vahid Partovi Nia, Yu Cheng, Yue Deng, Qun Liu, Boxing Chen
Training
Scaling Smart: Accelerating Large Language Model Pre-Training with Small Model Initialization
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:1-13
[abs][Download PDF]
Computational Bottlenecks of Training Small-scale Large Language Models
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:14-21
[abs][Download PDF]
QuAILoRA: Quantization-Aware Initialization for LoRA
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:22-33
[abs][Download PDF]
SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:34-46
[abs][Download PDF]
RGP: Achieving Memory-Efficient Model Fine-tuning Via Randomized Gradient Projection
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:47-54
[abs][Download PDF]
Efficient Alignment of Large Language Models via Data Sampling
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:55-72
[abs][Download PDF]
KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:73-80
[abs][Download PDF]
Model Design & Architecture
Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:81-101
[abs][Download PDF]
VL-Mamba: Exploring State Space Models for Multimodal Learning
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:102-113
[abs][Download PDF]
MisD-MoE: A Multimodal Misinformation Detection Framework with Adaptive Feature Selection
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:114-122
[abs][Download PDF]
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:123-135
[abs][Download PDF]
Is 3D Convolution with 5D Tensors Really Necessary for Video Analysis?
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:136-144
[abs][Download PDF]
Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:145-164
[abs][Download PDF]
Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:165-181
[abs][Download PDF]
StructMoE: Structured Mixture of Experts Using Low Rank Experts
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:182-193
[abs][Download PDF]
Sparse Upcycling: Inference Inefficient Finetuning
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:194-205
[abs][Download PDF]
Model Efficiency & Compression
Post-Training Statistical Calibration for Higher Activation Sparsity
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:206-221
[abs][Download PDF]
Accelerating the Low-Rank Decomposed Models
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:222-231
[abs][Download PDF]
The EarlyBird Gets the WORM: Heuristically Accelerating EarlyBird Convergence
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:232-240
[abs][Download PDF]
Post Training Quantization of Large Language Models with Microscaling Formats
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:241-258
[abs][Download PDF]
EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:259-269
[abs][Download PDF]
Scaling laws for post-training quantized large language models
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:270-285
[abs][Download PDF]
Partially Shared Query-Key for Lightweight Language Models
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:286-291
[abs][Download PDF]
Inference
Snakes and Ladders: Accelerating SSM Inference with Speculative Decoding
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:292-304
[abs][Download PDF]
GEAR: An Efficient Error Reduction Framework for KV Cache Compression in LLM Inference
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:305-321
[abs][Download PDF]
The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:322-335
[abs][Download PDF]
Distributed Speculative Inference of Large Language Models is Provably Faster
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:336-354
[abs][Download PDF]
AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:355-369
[abs][Download PDF]
Inference-Friendly Models With MixAttention
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:370-381
[abs][Download PDF]
Improving Multi-candidate Speculative Decoding
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:382-394
[abs][Download PDF]
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:395-413
[abs][Download PDF]
Hysteresis Activation Function for Efficient Inference
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:414-422
[abs][Download PDF]
Efficiently Dispatching Flash Attention For Partially Filled Attention Masks
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:423-442
[abs][Download PDF]
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:443-455
[abs][Download PDF]
Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:456-467
[abs][Download PDF]
CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:468-484
[abs][Download PDF]
Residual vector quantization for KV cache compression in large language model
Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop, PMLR 262:485-490
[abs][Download PDF]
Benchmark & Evaluation
Applications