Abstract:
Feature selection is an essential step in analyzing gene expression datasets, which have very high dimensionality and a small number of samples. However, existing feature subset selection algorithms share a common deficiency: the selected feature subset depends on the training subset and varies with different training samples. To address this problem, a new ensemble feature selection algorithm based on kernel extreme learning machines is proposed. Five-fold cross validation is used to partition the original dataset. Each training subset is then partitioned again by 5-fold cross validation, feature selection is performed on each sub-training subset, and the union of the five selected feature subsets forms the feature subset corresponding to that training subset. The classification power of the feature subset is evaluated by the performance of the kernel extreme learning machine built on it. The stability of the feature subsets detected by a feature selection algorithm is evaluated by the mean Jaccard coefficient of the five feature subsets obtained from 5-fold cross validation on the original data. The proposed ensemble feature selection algorithm is tested on five gene expression datasets and compared with existing algorithms, including SVM-RFE, LLE Score, ARCO, DRJMIM, Random Forest, and mRMR. All the experimental results show that the proposed ensemble feature selection algorithm not only detects stable feature subsets but also selects feature subsets with high predictive power.
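The nested partitioning and the stability measure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the base selector `make_mean_gap_selector` is a simple stand-in filter chosen only to keep the example self-contained, the mean Jaccard coefficient is assumed to be averaged over all pairs of the five outer subsets, and the kernel extreme learning machine used to evaluate classification power is omitted.

```python
import random
from itertools import combinations


def k_fold_indices(n_items, k=5, seed=0):
    """Shuffle range(n_items) and split it into k disjoint folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_jaccard(subsets):
    """Assumption: stability is the mean over all pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def ensemble_select(train_idx, select_features, k=5, seed=0):
    """Union of the subsets selected on the k inner sub-training subsets."""
    selected = set()
    for held_out in k_fold_indices(len(train_idx), k, seed):
        held = set(held_out)
        sub_train = [train_idx[p] for p in range(len(train_idx)) if p not in held]
        selected |= set(select_features(sub_train))
    return selected


def outer_feature_subsets(n_samples, select_features, k=5, seed=0):
    """One ensemble feature subset per outer training subset."""
    subsets = []
    for test_fold in k_fold_indices(n_samples, k, seed):
        test = set(test_fold)
        train_idx = [i for i in range(n_samples) if i not in test]
        subsets.append(ensemble_select(train_idx, select_features, k, seed))
    return subsets


def make_mean_gap_selector(X, y, m):
    """Hypothetical stand-in selector: top-m features by class-mean gap."""
    def select(sample_idx):
        pos = [X[i] for i in sample_idx if y[i] == 1]
        neg = [X[i] for i in sample_idx if y[i] == 0]
        if not pos or not neg:  # cannot rank without both classes
            return []

        def gap(j):
            mean_pos = sum(row[j] for row in pos) / len(pos)
            mean_neg = sum(row[j] for row in neg) / len(neg)
            return abs(mean_pos - mean_neg)

        return sorted(range(len(X[0])), key=gap, reverse=True)[:m]
    return select
```

Calling `outer_feature_subsets(len(X), make_mean_gap_selector(X, y, m))` yields five feature subsets, and `mean_jaccard` applied to them gives a stability score in [0, 1], where 1 means the same subset was selected on every outer training subset. Any of the compared selectors could be substituted for the stand-in filter, provided it maps a list of sample indices to a list of feature indices.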