
Research on a Noise-Tolerant Feature Subset Selection Algorithm
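Cite this article: WANG Xing Qi, KONG Fan Sheng. Research on a noise-tolerant feature subset selection algorithm[J]. Journal of Computer Research and Development, 2002, 39(12): 1637-1644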
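Authors: WANG Xing Qi, KONG Fan Sheng
Affiliation: Institute of Artificial Intelligence, Zhejiang University, Hangzhou 310027
Funding: Supported by the National Key Basic Research Development Program of China (G1998010200)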
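Abstract: Feature subset selection has long been an important research topic in artificial intelligence, and in recent years feature subset selection algorithms have become a research focus in machine learning and data mining. This paper proposes a new feature subset selection algorithm, the noise-tolerant feature subset selection algorithm (NFS). NFS introduces clustering into the handling of noise and applies the Gini index and the Mexican hat function to feature selection, so that feature subsets can be selected from noisy datasets. Experimental results on real-world domains show that NFS tolerates noise well, selects highly representative features, and runs fast, and can therefore be applied effectively in practice.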

Keywords: feature subset selection algorithm; heuristic algorithm; noise; artificial intelligence; machine learning; data mining
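
The abstract names the Gini index and the Mexican hat function as the feature-selection criteria but gives neither the formulas NFS uses nor how the two are combined. The sketch below assumes the standard textbook forms: the Gini impurity 1 - sum_k p_k^2 of the class labels, weighted over the partition that a nominal feature induces (lower means the feature separates the classes better), and the unnormalized Mexican hat (Ricker) function (1 - (t/sigma)^2) * exp(-t^2 / (2 sigma^2)). The function names and the parameter `sigma` are hypothetical, and this is not the authors' NFS procedure.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def feature_gini(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Weighted Gini impurity of the labels after partitioning the instances
    by one nominal feature; lower means the feature separates classes better."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    return sum(len(g) / n * gini_impurity(g) for g in groups.values())


def rank_features(rows: Sequence[Sequence[Hashable]],
                  labels: Sequence[Hashable]) -> List[int]:
    """Feature indices ordered from lowest (best) to highest weighted Gini."""
    n_features = len(rows[0])
    scores = [(feature_gini([r[j] for r in rows], labels), j)
              for j in range(n_features)]
    return [j for _, j in sorted(scores)]


def mexican_hat(t: float, sigma: float = 1.0) -> float:
    """Unnormalized Mexican hat (Ricker) function: 1 at t = 0, zero at
    |t| = sigma, negative beyond. How NFS applies it is not modeled here."""
    u = (t / sigma) ** 2
    return (1.0 - u) * math.exp(-u / 2.0)


if __name__ == "__main__":
    rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    labels = [0, 0, 1, 1]
    print(rank_features(rows, labels))         # [0, 1]
    print(mexican_hat(0.0), mexican_hat(1.0))  # 1.0 0.0
```

With labels [0, 0, 1, 1], a feature taking the values a, a, b, b scores 0.0 (a perfect split), while one taking x, y, x, y scores 0.5, so `rank_features` places the first feature ahead of the second.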

RESEARCH ON A NOISE-TOLERANT ALGORITHM FOR FEATURE SUBSET SELECTION
WANG Xing Qi and KONG Fan Sheng. RESEARCH ON A NOISE-TOLERANT ALGORITHM FOR FEATURE SUBSET SELECTION[J]. Journal of Computer Research and Development, 2002, 39(12): 1637-1644
Authors:WANG Xing Qi and KONG Fan Sheng
Abstract: The feature subset selection problem has long been a focus of artificial intelligence research, and in the past few years it has received considerable attention in machine learning and data mining. Real-world datasets always contain noise. Unfortunately, most current feature subset selection algorithms assume that datasets are entirely perfect, that is, noise-free, which makes it very difficult for them to obtain good results on real-world datasets. A new algorithm, the noise-tolerant algorithm for feature subset selection (NFS), is proposed. A clustering approach is applied to handle noise, and the Gini index and the Mexican hat function are used to select the feature subset. Compared with current heuristic algorithms, NFS not only finds feature subsets containing fewer features efficiently, but also gives the learning system higher accuracy on the dataset described only by the selected features. This implies that NFS selects more representative features and tolerates noise in real-world datasets effectively.
Keywords: feature subset selection; heuristic algorithm; noise
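
The abstract states only that NFS brings clustering into noise handling; it names neither the clustering method nor the rule for deciding which instances count as noise. As one common way to realize that idea, swapped in here as an assumption, the sketch below clusters the instances with plain k-means and flags any instance whose class label disagrees with the majority label of its cluster. The names `kmeans` and `flag_label_noise`, the deterministic initialization, and the majority-vote rule are all hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means; centroids start at the first k points so runs are
    deterministic. Returns the cluster index of each point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(p) for p in points[:k]]
    assign: List[int] = []
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: _dist2(p, centroids[c]))
                      for p in points]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def flag_label_noise(points: Sequence[Sequence[float]],
                     labels: Sequence[Hashable], k: int) -> List[int]:
    """Indices of instances whose label differs from their cluster's majority
    label (ties go to the label seen first in that cluster)."""
    assign = kmeans(points, k)
    majority: Dict[int, Hashable] = {}
    for c in set(assign):
        votes = Counter(y for y, a in zip(labels, assign) if a == c)
        majority[c] = votes.most_common(1)[0][0]
    return [i for i, (y, a) in enumerate(zip(labels, assign)) if y != majority[a]]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0),
           (1.0, 1.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    ys = ["a", "b", "a", "a", "b", "b", "b", "b"]
    print(flag_label_noise(pts, ys, k=2))  # [4]
```

Here the point (1.0, 1.0) falls in the cluster dominated by label "a" but carries label "b", so it is flagged. Whether NFS then discards, down-weights, or otherwise treats such instances is not stated in the abstract; the sketch only reports them.
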
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.