Token-Aware Representation Augmentation for Fine-Grained Semi-Supervised Learning

Hongyang He, Yan Zhong, Xinyuan Song, Daizong Liu, Victor Sanchez
Conference on Parsimony and Learning, PMLR 328:516-528, 2026.

Abstract

FixMatch is a widely adopted semi-supervised learning (SSL) framework that relies on consistency regularization between weakly and strongly augmented versions of unlabeled data. In the case of image classification, its reliance on indiscriminate image-level augmentations often leads to overfitting on early confident predictions while neglecting semantically rich but underexplored features. In this work, we introduce Token-Aware FixMatch (TA-FixMatch), a novel SSL framework that operates at the token representation level to enhance feature diversity and generalization. Specifically, we propose a token-aware masking strategy that identifies and softly suppresses the most influential tokens contributing to high-confidence predictions; and a structured token-level augmentation pipeline that perturbs, reorganizes, and semantically enriches the remaining tokens. These representation-level augmentations guide the model to attend to alternative evidence and discover complementary features, which is particularly beneficial in fine-grained classification tasks. Extensive experiments on standard (CIFAR-100, STL-10) and fine-grained (CUB-200-2011, NABirds, Stanford Cars) benchmarks demonstrate that TA-FixMatch outperforms existing state-of-the-art SSL methods under low-label regimes.
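The abstract's central mechanism, softly suppressing the tokens most responsible for a high-confidence prediction so the model must attend to alternative evidence, can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, the saliency input (e.g. attention weights toward the class token), and the suppression factor are all hypothetical choices for illustration:

```python
import numpy as np

def token_aware_soft_mask(tokens, saliency, top_frac=0.25, suppress=0.5):
    """Hypothetical sketch of token-aware soft masking.

    tokens:   (N, D) array of token representations
    saliency: (N,) per-token importance scores (assumed given,
              e.g. attention toward the class token)
    top_frac: fraction of tokens treated as "most influential"
    suppress: scale applied to those tokens (soft, not a hard drop)
    """
    n_top = max(1, int(round(top_frac * len(saliency))))
    top_idx = np.argsort(saliency)[-n_top:]   # indices of most salient tokens
    scale = np.ones(len(saliency))
    scale[top_idx] = suppress                 # softly down-weight, keep the rest
    return tokens * scale[:, None]

# Toy usage: 8 tokens of dimension 4 with random saliency scores.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
saliency = rng.random(8)
masked = token_aware_soft_mask(tokens, saliency)
```

Under this reading, the remaining (unsuppressed) tokens would then be fed to the paper's token-level augmentation pipeline; the details of that pipeline are in the linked PDF.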

Cite this Paper


BibTeX
@InProceedings{pmlr-v328-he26b,
  title     = {Token-Aware Representation Augmentation for Fine-Grained Semi-Supervised Learning},
  author    = {He, Hongyang and Zhong, Yan and Song, Xinyuan and Liu, Daizong and Sanchez, Victor},
  booktitle = {Conference on Parsimony and Learning},
  pages     = {516--528},
  year      = {2026},
  editor    = {Burkholz, Rebekka and Liu, Shiwei and Ravishankar, Saiprasad and Redman, William and Huang, Wei and Su, Weijie and Zhu, Zhihui},
  volume    = {328},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v328/main/assets/he26b/he26b.pdf},
  url       = {https://proceedings.mlr.press/v328/he26b.html},
  abstract  = {FixMatch is a widely adopted semi-supervised learning (SSL) framework that relies on consistency regularization between weakly and strongly augmented versions of unlabeled data. In the case of image classification, its reliance on indiscriminate image-level augmentations often leads to overfitting on early confident predictions while neglecting semantically rich but underexplored features. In this work, we introduce Token-Aware FixMatch (TA-FixMatch), a novel SSL framework that operates at the token representation level to enhance feature diversity and generalization. Specifically, we propose a token-aware masking strategy that identifies and softly suppresses the most influential tokens contributing to high-confidence predictions; and a structured token-level augmentation pipeline that perturbs, reorganizes, and semantically enriches the remaining tokens. These representation-level augmentations guide the model to attend to alternative evidence and discover complementary features, which is particularly beneficial in fine-grained classification tasks. Extensive experiments on standard (CIFAR-100, STL-10) and fine-grained (CUB-200-2011, NABirds, Stanford Cars) benchmarks demonstrate that TA-FixMatch outperforms existing state-of-the-art SSL methods under low-label regimes.}
}
Endnote
%0 Conference Paper
%T Token-Aware Representation Augmentation for Fine-Grained Semi-Supervised Learning
%A Hongyang He
%A Yan Zhong
%A Xinyuan Song
%A Daizong Liu
%A Victor Sanchez
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Rebekka Burkholz
%E Shiwei Liu
%E Saiprasad Ravishankar
%E William Redman
%E Wei Huang
%E Weijie Su
%E Zhihui Zhu
%F pmlr-v328-he26b
%I PMLR
%P 516--528
%U https://proceedings.mlr.press/v328/he26b.html
%V 328
%X FixMatch is a widely adopted semi-supervised learning (SSL) framework that relies on consistency regularization between weakly and strongly augmented versions of unlabeled data. In the case of image classification, its reliance on indiscriminate image-level augmentations often leads to overfitting on early confident predictions while neglecting semantically rich but underexplored features. In this work, we introduce Token-Aware FixMatch (TA-FixMatch), a novel SSL framework that operates at the token representation level to enhance feature diversity and generalization. Specifically, we propose a token-aware masking strategy that identifies and softly suppresses the most influential tokens contributing to high-confidence predictions; and a structured token-level augmentation pipeline that perturbs, reorganizes, and semantically enriches the remaining tokens. These representation-level augmentations guide the model to attend to alternative evidence and discover complementary features, which is particularly beneficial in fine-grained classification tasks. Extensive experiments on standard (CIFAR-100, STL-10) and fine-grained (CUB-200-2011, NABirds, Stanford Cars) benchmarks demonstrate that TA-FixMatch outperforms existing state-of-the-art SSL methods under low-label regimes.
APA
He, H., Zhong, Y., Song, X., Liu, D. & Sanchez, V. (2026). Token-Aware Representation Augmentation for Fine-Grained Semi-Supervised Learning. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 328:516-528. Available from https://proceedings.mlr.press/v328/he26b.html.