Coder Reviewer Reranking for Code Generation

Tianyi Zhang, Tao Yu, Tatsunori Hashimoto, Mike Lewis, Wen-Tau Yih, Daniel Fried, Sida Wang
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:41832-41846, 2023.

Abstract

Sampling diverse programs from a code language model and reranking with model likelihood is a popular method for code generation, but it is prone to preferring degenerate solutions. Inspired by collaborative programming, we propose Coder-Reviewer reranking. We augment Coder language models from past work, which generate programs given language instructions, with Reviewer models, which evaluate the likelihood of the instruction given the generated programs. We perform an extensive study across six datasets with eight models from three model families. Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvements (up to a 17% absolute accuracy gain) over reranking with the Coder model only. When combined with executability filtering, Coder-Reviewer reranking can often outperform the minimum Bayes risk method. Coder-Reviewer reranking is easy to implement by prompting, generalizes to different programming languages, and works well with off-the-shelf hyperparameters.
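Concretely, for an instruction x and a sampled program y, the Coder provides log p(y | x) and the Reviewer provides log p(x | y); candidates are reranked by the sum of the two scores. Below is a minimal sketch of this selection rule, not the authors' released code: coder_logp and reviewer_logp are hypothetical callables wrapping a language model prompted in each direction, and the executability filter is approximated here by a simple Python syntax check.

from typing import Callable, List

def runs_without_error(program: str) -> bool:
    """Crude executability filter: keep candidates that at least parse as Python."""
    try:
        compile(program, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def coder_reviewer_rerank(
    instruction: str,
    candidates: List[str],
    coder_logp: Callable[[str, str], float],     # hypothetical: log p(program | instruction)
    reviewer_logp: Callable[[str, str], float],  # hypothetical: log p(instruction | program)
) -> str:
    """Return the candidate maximizing log p(y|x) + log p(x|y)."""
    # Fall back to all candidates if nothing passes the filter.
    executable = [c for c in candidates if runs_without_error(c)] or candidates
    return max(
        executable,
        key=lambda c: coder_logp(instruction, c) + reviewer_logp(c, instruction),
    )

Per the abstract, the Reviewer is implemented by prompting, so the same underlying model can serve both roles: the Coder scores the program given the instruction, and the Reviewer scores the instruction given the program.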

Cite this Paper

BibTeX
@InProceedings{pmlr-v202-zhang23av,
  title     = {Coder Reviewer Reranking for Code Generation},
  author    = {Zhang, Tianyi and Yu, Tao and Hashimoto, Tatsunori and Lewis, Mike and Yih, Wen-Tau and Fried, Daniel and Wang, Sida},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {41832--41846},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/zhang23av/zhang23av.pdf},
  url       = {https://proceedings.mlr.press/v202/zhang23av.html}
}
Endnote
%0 Conference Paper
%T Coder Reviewer Reranking for Code Generation
%A Tianyi Zhang
%A Tao Yu
%A Tatsunori Hashimoto
%A Mike Lewis
%A Wen-Tau Yih
%A Daniel Fried
%A Sida Wang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-zhang23av
%I PMLR
%P 41832--41846
%U https://proceedings.mlr.press/v202/zhang23av.html
%V 202
APA
Zhang, T., Yu, T., Hashimoto, T., Lewis, M., Yih, W., Fried, D., & Wang, S. (2023). Coder Reviewer Reranking for Code Generation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:41832-41846. Available from https://proceedings.mlr.press/v202/zhang23av.html.