These include an enhanced CHI square with mutual information (MI) algorithm, which simultaneously introduces word frequency and term adjustment (CHMI); a term frequency–CHI square (TF–CHI) algorithm, which enhances weight calculation; and a term frequency–inverse document frequency (TF–IDF) algorithm enhanced with the extreme gradient boosting (XGBoost) algorithm, which improves the algorithm’s ability of word filtering (TF–XGBoost).This study randomly chooses 3000 texts from six different categories of the Sogou news corpus to obtain the confusion matrix and evaluate the performance of the new algorithms Rug with precision and the
The experimental results demonstrate that the feature selection algorithms proposed in this paper improve performance across various news corpora, although the best feature selection schemes for each type of corpus are different.Further studies of the application of the improved feature selection methods in other languages and the improvement in classifiers are suggested.