Small Sample Patents Classification Task Based on Mengzi-BERT-base Single Model
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:111-118, 2025.
Abstract
Small-sample data classification faces challenges including data scarcity, overfitting risk, and difficulty in learning feature representations. To tackle these challenges, this study proposes a transfer learning methodology that leverages knowledge from large-scale datasets or pre-trained models to improve the model's capacity for generalization. Meta-learning further enables rapid adaptation to novel tasks from a limited number of samples by optimizing the learning process itself, while data augmentation increases both the diversity and volume of samples by synthesizing, expanding, or transforming small datasets, strengthening generalization. The paper also presents an active learning method that uses model uncertainty and information gain to automatically select the most valuable samples for labeling, improving training efficiency. This addresses the difficulty of obtaining large-scale annotated data in many practical scenarios, enables efficient classification and analysis with small amounts of annotated data, and serves as a basis for zero-shot learning, offering significant value for knowledge transfer and application. The paper concludes by showing that the proposed approach outperforms existing methods on a benchmark dataset, demonstrating its effectiveness in addressing the challenges of small-sample data classification.
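The abstract does not specify the exact acquisition function, but a common instantiation of uncertainty-driven sample selection is least-confidence sampling: rank unlabeled examples by how unsure the classifier is and send the top-ranked ones for annotation. The sketch below is a hypothetical illustration (the function names, toy probabilities, and the choice of least confidence over information gain are assumptions, not the paper's implementation); the `probs` array stands in for softmax outputs of a fine-tuned classifier such as Mengzi-BERT-base.

```python
import numpy as np

def least_confidence_scores(probs: np.ndarray) -> np.ndarray:
    """Uncertainty = 1 - probability of the predicted (most likely) class."""
    return 1.0 - probs.max(axis=1)

def select_for_labeling(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k most uncertain unlabeled samples."""
    scores = least_confidence_scores(probs)
    return np.argsort(-scores)[:k]

# Toy softmax outputs for 4 unlabeled patents over 3 classes (illustrative only).
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],  # fairly confident
    [0.34, 0.33, 0.33],  # nearly uniform: most uncertain
])
print(select_for_labeling(probs, 2))  # → [3 1]
```

In a full active-learning loop, the selected samples would be labeled by an annotator, added to the training set, and the model fine-tuned again before the next round of selection.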