To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO

Zi-Hao Qiu, Siqi Guo, Mao Xu, Tuo Zhao, Lijun Zhang, Tianbao Yang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:41604-41643, 2024.

Abstract

The temperature parameter plays a profound role during training and/or inference with large foundation models (LFMs) such as large language models (LLMs) and CLIP models. In particular, it adjusts the logits in the softmax function in LLMs, which is crucial for next-token generation, and it scales the similarities in the contrastive loss for training CLIP models. A significant question remains: “Is it viable to learn a neural network to predict a personalized temperature for any input data to enhance LFMs?” In this paper, we present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs. Our solution is composed of a novel learning framework with robust losses underpinned by constrained distributionally robust optimization (DRO), and a properly designed, theoretically inspired TempNet. TempNet can be trained together with a large foundation model from scratch or learned separately given a pretrained foundation model. It is not only useful for predicting personalized temperatures to promote the training of LFMs but is also generalizable and transferable to new tasks. Our experiments on LLMs and CLIP models demonstrate that TempNet greatly improves the performance of existing solutions or models.
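
For intuition, here is a minimal sketch of the core idea: a small network maps an input representation to a positive, per-example temperature that rescales logits before the softmax. The architecture below (layer sizes, the softplus parameterization, the tau_min floor, and the embed_dim/vocab_size values) is an illustrative assumption, not the paper's actual TempNet design or its DRO-based training objective.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TempNet(nn.Module):
    # A small MLP mapping an input embedding to a positive scalar
    # temperature. Hypothetical architecture for illustration only.
    def __init__(self, embed_dim, hidden_dim=64, tau_min=0.01):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )
        self.tau_min = tau_min  # floor keeps the temperature strictly positive

    def forward(self, h):
        # softplus ensures a positive output; the floor adds numerical stability
        return F.softplus(self.mlp(h)).squeeze(-1) + self.tau_min

# Usage: divide logits by a personalized temperature before the softmax,
# as in next-token prediction for an LLM.
embed_dim, vocab_size = 128, 1000    # assumed sizes for this sketch
tempnet = TempNet(embed_dim)
h = torch.randn(4, embed_dim)        # per-example input representations
logits = torch.randn(4, vocab_size)  # e.g., next-token logits
tau = tempnet(h)                     # shape (4,): one temperature per example
probs = F.softmax(logits / tau.unsqueeze(-1), dim=-1)

A lower predicted temperature sharpens the output distribution for that example, while a higher one flattens it; per the abstract, such a network can be trained jointly with the foundation model from scratch or learned separately on top of a pretrained one.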

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-qiu24c,
  title = {To Cool or not to Cool? {T}emperature Network Meets Large Foundation Models via {DRO}},
  author = {Qiu, Zi-Hao and Guo, Siqi and Xu, Mao and Zhao, Tuo and Zhang, Lijun and Yang, Tianbao},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {41604--41643},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/qiu24c/qiu24c.pdf},
  url = {https://proceedings.mlr.press/v235/qiu24c.html},
  abstract = {The temperature parameter plays a profound role during training and/or inference with large foundation models (LFMs) such as large language models (LLMs) and CLIP models. In particular, it adjusts the logits in the softmax function in LLMs, which is crucial for next-token generation, and it scales the similarities in the contrastive loss for training CLIP models. A significant question remains: “Is it viable to learn a neural network to predict a personalized temperature for any input data to enhance LFMs?” In this paper, we present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs. Our solution is composed of a novel learning framework with robust losses underpinned by constrained distributionally robust optimization (DRO), and a properly designed, theoretically inspired TempNet. TempNet can be trained together with a large foundation model from scratch or learned separately given a pretrained foundation model. It is not only useful for predicting personalized temperatures to promote the training of LFMs but is also generalizable and transferable to new tasks. Our experiments on LLMs and CLIP models demonstrate that TempNet greatly improves the performance of existing solutions or models.}
}
Endnote
%0 Conference Paper
%T To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO
%A Zi-Hao Qiu
%A Siqi Guo
%A Mao Xu
%A Tuo Zhao
%A Lijun Zhang
%A Tianbao Yang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-qiu24c
%I PMLR
%P 41604--41643
%U https://proceedings.mlr.press/v235/qiu24c.html
%V 235
%X The temperature parameter plays a profound role during training and/or inference with large foundation models (LFMs) such as large language models (LLMs) and CLIP models. In particular, it adjusts the logits in the softmax function in LLMs, which is crucial for next-token generation, and it scales the similarities in the contrastive loss for training CLIP models. A significant question remains: “Is it viable to learn a neural network to predict a personalized temperature for any input data to enhance LFMs?” In this paper, we present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs. Our solution is composed of a novel learning framework with robust losses underpinned by constrained distributionally robust optimization (DRO), and a properly designed, theoretically inspired TempNet. TempNet can be trained together with a large foundation model from scratch or learned separately given a pretrained foundation model. It is not only useful for predicting personalized temperatures to promote the training of LFMs but is also generalizable and transferable to new tasks. Our experiments on LLMs and CLIP models demonstrate that TempNet greatly improves the performance of existing solutions or models.
APA
Qiu, Z., Guo, S., Xu, M., Zhao, T., Zhang, L. & Yang, T. (2024). To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:41604-41643. Available from https://proceedings.mlr.press/v235/qiu24c.html.
