Balanced Mixture of Supernets for Learning the CNN Pooling Architecture

Mehraveh Javan Roshtkhari, Matthew Toews, Marco Pedersoli
Proceedings of the Second International Conference on Automated Machine Learning, PMLR 224:8/1-23, 2023.

Abstract

Downsampling layers, including pooling and strided convolutions, are crucial components of the convolutional neural network architecture that determine both the granularity/scale of image feature analysis and the receptive field size of a given layer. To fully understand this problem, we analyse the performance of models trained independently with each pooling configuration on CIFAR10, using a ResNet20 network, and show that the position of the downsampling layers can strongly influence the performance of a network and that predefined downsampling configurations are not optimal. Network Architecture Search (NAS) might be used to optimize downsampling configurations as a hyperparameter. However, we find that common one-shot NAS based on a single SuperNet does not work for this problem. We argue that this is because a SuperNet trained for finding the optimal pooling configuration fully shares its parameters among all pooling configurations. This makes its training hard, because learning some configurations can harm the performance of others. Therefore, we propose a balanced mixture of SuperNets that automatically associates pooling configurations with different weight models and helps to reduce the weight sharing and mutual influence of pooling configurations on the SuperNet parameters. We evaluate our proposed approach on CIFAR10, CIFAR100, and Food101, and show that in all cases our model outperforms other approaches and improves over the default pooling configurations.
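
The sketch below is a minimal, illustrative rendering of the idea described in the abstract, not the authors' implementation: several candidate pooling configurations (which layers downsample) are each routed to one of K SuperNets, so that conflicting configurations do not all update the same shared weights. The tiny CNN, the greedy capacity-constrained assignment, and all names are assumptions made for illustration only.

import itertools
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyConvNet(nn.Module):
    """A small CNN whose downsampling positions are chosen at forward time."""

    def __init__(self, num_layers=4, width=32, num_classes=10):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(3 if i == 0 else width, width, 3, padding=1) for i in range(num_layers)]
        )
        self.head = nn.Linear(width, num_classes)

    def forward(self, x, pooling_config):
        # pooling_config: tuple of 0/1 flags, one per layer (1 = downsample after that layer)
        for conv, do_pool in zip(self.convs, pooling_config):
            x = F.relu(conv(x))
            if do_pool:
                x = F.max_pool2d(x, 2)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.head(x)


def balanced_assignment(losses, num_supernets):
    """Greedy balanced assignment (an illustrative stand-in, not the paper's algorithm):
    each configuration is routed to the SuperNet where its loss is lowest, subject to a
    per-SuperNet capacity so configurations are spread evenly across SuperNets."""
    capacity = -(-len(losses) // num_supernets)  # ceil(num_configs / num_supernets)
    counts = [0] * num_supernets
    assignment = {}
    for cfg in sorted(losses, key=lambda c: min(losses[c])):  # most confident configs first
        for k in sorted(range(num_supernets), key=lambda j: losses[cfg][j]):
            if counts[k] < capacity:
                assignment[cfg] = k
                counts[k] += 1
                break
    return assignment


# --- toy usage -------------------------------------------------------------
num_layers, num_pools, num_supernets = 4, 2, 3
configs = [c for c in itertools.product([0, 1], repeat=num_layers) if sum(c) == num_pools]
supernets = [TinyConvNet(num_layers) for _ in range(num_supernets)]
optimizers = [torch.optim.SGD(m.parameters(), lr=0.05, momentum=0.9) for m in supernets]

x = torch.randn(8, 3, 32, 32)          # stand-in for a CIFAR10 batch
y = torch.randint(0, 10, (8,))

# Estimate how well each SuperNet currently handles each pooling configuration.
with torch.no_grad():
    losses = {cfg: [F.cross_entropy(m(x, cfg), y).item() for m in supernets] for cfg in configs}
assignment = balanced_assignment(losses, num_supernets)

# Each sampled configuration only updates the SuperNet it is assigned to, which
# limits the interference between configurations that a single shared SuperNet suffers from.
cfg = random.choice(configs)
k = assignment[cfg]
optimizers[k].zero_grad()
loss = F.cross_entropy(supernets[k](x, cfg), y)
loss.backward()
optimizers[k].step()
print(f"config {cfg} -> supernet {k}, loss {loss.item():.3f}")

In practice the assignment would be re-estimated periodically during training on held-out batches; the single forward pass above is only meant to show how routing reduces weight sharing between pooling configurations.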

Cite this Paper


BibTeX
@InProceedings{pmlr-v224-roshtkhari23a,
  title     = {Balanced Mixture of Supernets for Learning the CNN Pooling Architecture},
  author    = {Roshtkhari, Mehraveh Javan and Toews, Matthew and Pedersoli, Marco},
  booktitle = {Proceedings of the Second International Conference on Automated Machine Learning},
  pages     = {8/1--23},
  year      = {2023},
  editor    = {Faust, Aleksandra and Garnett, Roman and White, Colin and Hutter, Frank and Gardner, Jacob R.},
  volume    = {224},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v224/roshtkhari23a/roshtkhari23a.pdf},
  url       = {https://proceedings.mlr.press/v224/roshtkhari23a.html},
  abstract  = {Downsampling layers, including pooling and strided convolutions, are crucial components of the convolutional neural network architecture that determine both the granularity/scale of image feature analysis and the receptive field size of a given layer. To fully understand this problem, we analyse the performance of models trained independently with each pooling configuration on CIFAR10, using a ResNet20 network, and show that the position of the downsampling layers can strongly influence the performance of a network and that predefined downsampling configurations are not optimal. Network Architecture Search (NAS) might be used to optimize downsampling configurations as a hyperparameter. However, we find that common one-shot NAS based on a single SuperNet does not work for this problem. We argue that this is because a SuperNet trained for finding the optimal pooling configuration fully shares its parameters among all pooling configurations. This makes its training hard, because learning some configurations can harm the performance of others. Therefore, we propose a balanced mixture of SuperNets that automatically associates pooling configurations with different weight models and helps to reduce the weight sharing and mutual influence of pooling configurations on the SuperNet parameters. We evaluate our proposed approach on CIFAR10, CIFAR100, and Food101, and show that in all cases our model outperforms other approaches and improves over the default pooling configurations.}
}
Endnote
%0 Conference Paper
%T Balanced Mixture of Supernets for Learning the CNN Pooling Architecture
%A Mehraveh Javan Roshtkhari
%A Matthew Toews
%A Marco Pedersoli
%B Proceedings of the Second International Conference on Automated Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Aleksandra Faust
%E Roman Garnett
%E Colin White
%E Frank Hutter
%E Jacob R. Gardner
%F pmlr-v224-roshtkhari23a
%I PMLR
%P 8/1--23
%U https://proceedings.mlr.press/v224/roshtkhari23a.html
%V 224
%X Downsampling layers, including pooling and strided convolutions, are crucial components of the convolutional neural network architecture that determine both the granularity/scale of image feature analysis and the receptive field size of a given layer. To fully understand this problem, we analyse the performance of models trained independently with each pooling configuration on CIFAR10, using a ResNet20 network, and show that the position of the downsampling layers can strongly influence the performance of a network and that predefined downsampling configurations are not optimal. Network Architecture Search (NAS) might be used to optimize downsampling configurations as a hyperparameter. However, we find that common one-shot NAS based on a single SuperNet does not work for this problem. We argue that this is because a SuperNet trained for finding the optimal pooling configuration fully shares its parameters among all pooling configurations. This makes its training hard, because learning some configurations can harm the performance of others. Therefore, we propose a balanced mixture of SuperNets that automatically associates pooling configurations with different weight models and helps to reduce the weight sharing and mutual influence of pooling configurations on the SuperNet parameters. We evaluate our proposed approach on CIFAR10, CIFAR100, and Food101, and show that in all cases our model outperforms other approaches and improves over the default pooling configurations.
APA
Roshtkhari, M.J., Toews, M. & Pedersoli, M. (2023). Balanced Mixture of Supernets for Learning the CNN Pooling Architecture. Proceedings of the Second International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 224:8/1-23. Available from https://proceedings.mlr.press/v224/roshtkhari23a.html.