MENG Xiaochao1, JIANG Gaoxia1, WANG Wenjian2*
(1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China; 2. Key Laboratory of Computation Intelligence and Chinese Information Processing, Ministry of Education (Shanxi University), Taiyuan 030006, Shanxi, China)
Abstract:
In supervised classification, label noise often has a significant impact on model performance. Existing label noise filtering methods generally detect and remove noisy samples based on the model's predictions. When the number of noisy samples is large, removing them compromises the integrity of the original data set and discards sample information. To address this problem, a label noise cleaning method based on active learning is proposed, namely GP_ALNC (active label noise cleaning based on classification with Gaussian process). The method combines a Gaussian process model with active learning to select the most uncertain samples from the existing labeled sample set and submit them to human experts for review. This iterative procedure can clean most of the noisy labels while maintaining the integrity of the original data. For label noise in binary classification tasks, the proposed method is compared with the existing methods ALNR (active label noise removal) and ICCN_SMO (iterative correction of class noise based on SMO) on the MNIST and UCI data sets. The experimental results show that the proposed GP_ALNC can achieve good performance.
Keywords:
label noise; noise cleaning; Gaussian process; active learning
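
The select-and-review loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' GP_ALNC implementation. The names `binary_entropy`, `most_uncertain`, `clean_labels`, `fit_predict`, and `ask_expert` are hypothetical, and the use of predictive entropy as the uncertainty measure is an assumption, since the abstract does not state which measure is used. The probabilistic model is abstracted as a callable that returns P(y = 1) for every sample; in the paper this role is played by a Gaussian process classifier.

```python
import math
from typing import Callable, List, Sequence


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction; largest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def most_uncertain(probs: Sequence[float], k: int) -> List[int]:
    """Indices of the k samples whose predicted probability is most uncertain."""
    ranked = sorted(range(len(probs)),
                    key=lambda i: binary_entropy(probs[i]),
                    reverse=True)
    return ranked[:k]


def clean_labels(labels: Sequence[int],
                 fit_predict: Callable[[List[int]], Sequence[float]],
                 ask_expert: Callable[[int], int],
                 batch_size: int,
                 rounds: int) -> List[int]:
    """Iteratively relabel the most uncertain samples; no sample is removed.

    fit_predict (hypothetical) retrains a probabilistic classifier on the
    current labels and returns P(y = 1) for every sample.
    ask_expert (hypothetical) returns the expert-verified label of sample i.
    """
    cleaned = list(labels)
    reviewed = set()
    for _ in range(rounds):
        probs = fit_predict(cleaned)
        ranked = most_uncertain(probs, len(probs))
        # Skip samples an expert has already checked.
        batch = [i for i in ranked if i not in reviewed][:batch_size]
        if not batch:
            break
        for i in batch:
            cleaned[i] = ask_expert(i)
            reviewed.add(i)
    return cleaned
```

Because reviewed samples are relabeled rather than discarded, the returned list always has the same length as the input, which mirrors the abstract's point about preserving the integrity of the original data.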