PMID- 33267482
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20240329
IS  - 1099-4300 (Electronic)
IS  - 1099-4300 (Linking)
VI  - 21
IP  - 8
DP  - 2019 Aug 7
TI  - Beta Distribution-Based Cross-Entropy for Feature Selection.
LID - 10.3390/e21080769 [doi]
LID - 769
AB  - Analysis of high-dimensional data is a challenge in machine learning and data mining. Feature selection plays an important role in dealing with high-dimensional data for improvement of predictive accuracy, as well as better interpretation of the data. Frequently used evaluation functions for feature selection include resampling methods such as cross-validation, which show an advantage in predictive accuracy. However, these conventional methods are not only computationally expensive, but also tend to be over-optimistic. We propose a novel cross-entropy which is based on beta distribution for feature selection. In beta distribution-based cross-entropy (BetaDCE) for feature selection, the probability density is estimated by beta distribution and the cross-entropy is computed by the expected value of beta distribution, so that the generalization ability can be estimated more precisely than conventional methods where the probability density is learnt from data. Analysis of the generalization ability of BetaDCE revealed that it was a trade-off between bias and variance. The robustness of BetaDCE was demonstrated by experiments on three types of data. In the exclusive or-like (XOR-like) dataset, the false discovery rate of BetaDCE was significantly smaller than that of other methods. For the leukemia dataset, the area under the curve (AUC) of BetaDCE on the test set was 0.93 with only four selected features, which indicated that BetaDCE not only detected the irrelevant and redundant features precisely, but also more accurately predicted the class labels with a smaller number of features than the original method, whose AUC was 0.83 with 50 features. In the metabonomic dataset, the overall AUC of prediction with features selected by BetaDCE was significantly larger than that by the original reported method. Therefore, BetaDCE can be used as a general and efficient framework for feature selection.
FAU - Dai, Weixing
AU  - Dai W
AUID- ORCID: 0000-0002-5395-8568
AD  - School of Life Science and State Key Laboratory of Agrobiotechnology, G94, Science Center South Block, The Chinese University of Hong Kong, Shatin 999077, Hong Kong, China.
FAU - Guo, Dianjing
AU  - Guo D
AD  - School of Life Science and State Key Laboratory of Agrobiotechnology, G94, Science Center South Block, The Chinese University of Hong Kong, Shatin 999077, Hong Kong, China.
LA  - eng
GR  - 8300052/Innovation Technology Fund of Innovation Technology Commission/
PT  - Journal Article
DEP - 20190807
PL  - Switzerland
TA  - Entropy (Basel)
JT  - Entropy (Basel, Switzerland)
JID - 101243874
PMC - PMC7515297
OTO - NOTNLM
OT  - beta distribution
OT  - cross-entropy
OT  - data mining
OT  - feature selection
OT  - machine learning
COIS- The authors declare no conflict of interest.
EDAT- 2019/08/07 00:00
MHDA- 2019/08/07 00:01
PMCR- 2019/08/07
CRDT- 2020/12/03 01:09
PHST- 2019/06/15 00:00 [received]
PHST- 2019/07/30 00:00 [revised]
PHST- 2019/08/05 00:00 [accepted]
PHST- 2020/12/03 01:09 [entrez]
PHST- 2019/08/07 00:00 [pubmed]
PHST- 2019/08/07 00:01 [medline]
PHST- 2019/08/07 00:00 [pmc-release]
AID - e21080769 [pii]
AID - entropy-21-00769 [pii]
AID - 10.3390/e21080769 [doi]
PST - epublish
SO  - Entropy (Basel). 2019 Aug 7;21(8):769. doi: 10.3390/e21080769.