Similar literature
 20 similar documents found; search time 31 ms
1.
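翟俊海, 刘博, 张素芳. 《智能系统学报》, 2017, 12(3): 397-404
Feature selection is the process of choosing a feature subset from the full initial feature set according to a given criterion, and it is an important preprocessing step in data mining. Removing redundant attributes lowers algorithmic complexity and improves performance. For feature selection on discrete-valued data, this paper proposes a method that combines the rough-set relative classification information entropy with particle swarm optimization (PSO): PSO drives the search, with the relative classification information entropy as the fitness function. The method is compared experimentally with other feature selection methods based on evolutionary algorithms, and the results show that it has certain advantages.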
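As a rough illustration of an entropy-based fitness for discrete features, the sketch below computes the conditional entropy of the class labels given a candidate feature subset. This is an assumption: the abstract does not define the relative classification information entropy exactly, and the names `conditional_entropy`, `rows`, `labels`, and `subset` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, labels, subset):
    """H(D | B): label entropy inside the equivalence classes induced by
    the feature indices in `subset` (hypothetical stand-in fitness)."""
    # Group labels by the values the row takes on the selected features.
    blocks = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        blocks[key].append(label)

    n = len(rows)
    h = 0.0
    for block in blocks.values():
        weight = len(block) / n
        for count in Counter(block).values():
            p = count / len(block)
            h -= weight * p * log2(p)
    return h


# Toy data: feature 0 determines the label, feature 1 carries no information.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
print(conditional_entropy(rows, labels, [0]))  # 0.0
print(conditional_entropy(rows, labels, [1]))  # 1.0
```

A lower value means the subset separates the classes better, so a PSO-based search could minimize it, possibly together with a penalty on subset size.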

2.
This paper proposes three feature selection algorithms with a feature weight scheme and dynamic dimension reduction for the text document clustering problem. Text document clustering is a new trend in text mining; in this process, text documents are separated into several coherent clusters according to carefully selected informative features by using a proper evaluation function, which usually depends on term frequency. Informative features in each document are selected using feature selection methods. The genetic algorithm (GA), harmony search (HS), and particle swarm optimization (PSO) are used as feature selection methods together with a novel weighting scheme, length feature weight (LFW), which depends on term frequency and on the appearance of features in other documents. A new dynamic dimension reduction (DDR) method is also provided to reduce the number of features used in clustering and thus improve the performance of the algorithms. Finally, k-means, a popular clustering method, is used to cluster the set of text documents based on the terms (or features) obtained by dynamic reduction. Seven benchmark text datasets of different sizes and complexities are evaluated. Analysis with k-means shows that particle swarm optimization with length feature weight and dynamic reduction produces the best results for almost all datasets tested. This paper provides new alternatives for the text mining community to cluster text documents by using cohesive and informative features.

3.
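Research on data clustering methods for managing large data sets is important in fields such as pattern recognition, fault diagnosis, and data mining. Traditional big data clustering algorithms use particle swarm optimization with hybrid differential evolution; cross-term interference between clusters, caused by the interaction between components of the data information flow, impairs the correct identification of cluster components, and clustering quality is poor. A big data clustering algorithm based on suppression of time-frequency aggregation cross-term interference is proposed. Information feature vectors for big data clustering are generated in an Internet of Things big database from a communication-studies perspective; membership training on nearest-neighbor samples is performed for any two cluster vectors; information scheduling is carried out in a sliding time window model; a high-frequency component suppression method suppresses the time-frequency aggregation cross-term interference; and, after frequency-domain convolution similarity fusion, particle swarm optimization is used to compute the clustering fitness, yielding the improved clustering algorithm. Simulation results show that the algorithm has good interference resistance and adaptivity and achieves high clustering accuracy.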

4.
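In data mining, data sets contain many redundant and irrelevant features, so feature selection is an important preprocessing step. A hybrid filter-wrapper multi-objective feature selection method based on mutual information and particle swarm optimization (HMIPSO) is proposed. An adaptive mutation strategy, driven by the number of iterations since a particle's pbest was last updated, perturbs the population to keep it from becoming trapped in local optima. A new set concept is also defined based on the Pareto front and the external archive. Combining mutual information with this new set, a local search strategy lets particles on the Pareto front delete irrelevant and redundant features, after which an elitist strategy updates the Pareto front from the fronts before and after learning. Finally, the proposed algorithm is tested against four other multi-objective algorithms on 15 UCI data sets; the results show that it reduces both the number of features and the classification error rate more effectively.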
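The Pareto front mentioned in this abstract can be illustrated with a minimal non-dominated filter over two objectives to be minimized, here taken to be the number of selected features and the classification error rate. This is only a sketch: the function name `pareto_front` and the sample values are hypothetical, and HMIPSO's archive handling is not reproduced.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.
    Every objective is assumed to be minimized."""

    def dominates(a, b):
        # a dominates b if it is no worse everywhere and strictly better somewhere.
        return all(x <= y for x, y in zip(a, b)) and any(
            x < y for x, y in zip(a, b)
        )

    return [p for p in points if not any(dominates(q, p) for q in points)]


# (number of features, error rate) for four hypothetical candidate subsets
candidates = [(3, 0.10), (5, 0.08), (4, 0.12), (3, 0.15)]
print(pareto_front(candidates))  # [(3, 0.1), (5, 0.08)]
```

The pairwise check is quadratic in the number of candidates, which is usually acceptable for the small archives kept by a swarm.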

5.
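Intrusion detection data often contain many redundant and noisy features as well as some continuous attributes. To improve network intrusion detection, neighborhood rough sets are used for attribute reduction on the intrusion detection data set, which removes redundant attributes and noise and also avoids the information loss that traditional rough sets incur when discretizing continuous attributes. Particle swarm optimization is used to optimize the kernel parameter and penalty parameter of a support vector machine, which avoids the risk of low accuracy from subjectively chosen parameters and further improves detection performance. Simulation results show that the algorithm effectively improves intrusion detection accuracy and has good generalization and stability.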

6.
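Emergency data with rough-set attributes contain redundant features that reduce mining efficiency. An information-entropy-based deduplication mining algorithm for such data is proposed. Rough set theory is combined with information entropy: the emergency data are discretized; attributes that have no effect on the conditional information entropy of the decision table are then reduced; the decision attribute set and the condition attribute set are defined; and the reduction is obtained by selecting the entropy value with the smallest number of attribute combinations with the reduced attribute set B, which removes redundant features and completes the deduplication mining. Deduplication mining is carried out on emergency data from large ships. The results show that emergency data related to ship turning ability can be effectively deduplicated and mined, that the data growth-ratio feature can be used to analyze the influence of each factor on turning ability, and that the algorithm is efficient: for 1400 records it takes only 0.33 s.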

7.
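For the classification of high-dimensional data that do not satisfy the faithfulness assumption, a new Markov blanket feature selection method based on particle swarm optimization is proposed. By effectively extracting relevant features and removing redundant ones, it yields better classification results. In the feature preprocessing stage, the algorithm uses the maximal information coefficient to analyze feature relevance and redundancy, obtaining a representative Markov blanket set of the class attribute and a suboptimal feature subset. In the search and evaluation stage, a new fitness function is used with particle swarm optimization to select the optimal feature subset. The resulting model is then used to predict on the test set. Experimental results show that the algorithm has certain advantages on 12 data sets.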

8.
Rough particle swarm optimization and its applications in data mining   Total citations: 1, self-citations: 1, citations by others: 0
This paper proposes a novel particle swarm optimization algorithm, the rough particle swarm optimization algorithm (RPSOA), based on the notion of rough patterns, which use rough values defined by upper and lower bounds that represent a range or set of values. Various operators and evaluation measures that can be used in RPSOA are described and applied efficiently in data mining applications, especially in the automatic mining of numeric association rules, which is a hard problem.

9.
This paper presents a new approach for power quality time series data mining using an S-transform based fuzzy expert system (FES). Initially, the power signal time series disturbance data are pre-processed with an advanced signal processing tool, the S-transform, and various statistical features are extracted, which are used as inputs to the fuzzy expert system for power quality event detection. The proposed expert system uses a data mining approach to assign a certainty factor to each classification rule, thereby making the rules robust in the presence of noise. Further, to provide a very high degree of accuracy in pattern classification, both the Gaussian and trapezoidal membership functions of the fuzzy sets concerned are optimized using a fuzzy logic based adaptive particle swarm optimization (PSO) technique. The proposed hybrid PSO-fuzzy expert system (PSOFES) provides more accurate classification than existing techniques even under noisy conditions, which shows the efficacy and robustness of the proposed algorithm for power quality time series data mining.

10.
The feature selection process constitutes a commonly encountered problem of global combinatorial optimization. This process reduces the number of features by removing irrelevant, noisy, and redundant data, thus resulting in acceptable classification accuracy. Feature selection is a preprocessing technique of great importance in data analysis, information retrieval, pattern classification, and data mining applications. This paper presents a novel optimization algorithm called catfish binary particle swarm optimization (CatfishBPSO), in which the so-called catfish effect is applied to improve the performance of binary particle swarm optimization (BPSO). This effect results from introducing new particles into the search space ("catfish particles"), which are initialized at extreme points of the search space and replace the particles with the worst fitness when the fitness of the global best particle has not improved for a number of consecutive iterations. In this study, the K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) was used to evaluate the quality of the solutions. CatfishBPSO was applied to 10 classification problems taken from the literature and compared with other methods. Experimental results show that CatfishBPSO simplifies the feature selection process effectively, and either obtains higher classification accuracy or uses fewer features than other feature selection methods.
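The catfish replacement step described in this abstract might look roughly like the sketch below. Several details are assumptions not stated in the abstract: the replaced fraction, the choice to alternate between the all-zeros and all-ones corners, and the convention that a higher fitness is better. The name `catfish_replace` is hypothetical.

```python
def catfish_replace(swarm, fitness, fraction=0.1):
    """Return a copy of `swarm` in which the worst `fraction` of the binary
    particles is replaced by particles at extreme points of the search space.
    Assumes higher fitness is better; `fraction` is an illustrative default."""
    n_replace = max(1, int(len(swarm) * fraction))
    # Particle indices ordered from worst to best fitness.
    worst_first = sorted(range(len(swarm)), key=lambda i: fitness[i])
    dim = len(swarm[0])
    new_swarm = [list(p) for p in swarm]
    for k, idx in enumerate(worst_first[:n_replace]):
        bit = k % 2  # alternate between the all-zeros and all-ones corners
        new_swarm[idx] = [bit] * dim
    return new_swarm


swarm = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
fitness = [0.9, 0.2, 0.5, 0.1]
print(catfish_replace(swarm, fitness, fraction=0.5))
# [[1, 0, 1], [1, 1, 1], [1, 1, 0], [0, 0, 0]]
```

In a full optimizer this step would run only after the global best fitness has stagnated for a chosen number of iterations; that counter is left out here.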

11.
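Feature subset selection based on particle swarm optimization and correlation analysis   Total citations: 3, self-citations: 0, citations by others: 3
Feature selection is one of the important problems in fields such as pattern recognition and data mining. To address it, a feature subset selection algorithm based on discrete particle swarm optimization and correlation analysis is proposed. The algorithm follows the filter approach to feature selection: it analyzes the correlations among all features in network intrusion data and uses discrete particle swarm optimization to search the space of all features, automatically selecting an effective feature subset that reduces data dimensionality. Experiments on the IDS data set from the 1999 KDD Cup Data demonstrate the effectiveness of the proposed algorithm.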

12.
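To address the discretization of continuous attributes in data mining and machine learning, an improved adaptive discrete particle swarm optimization algorithm is proposed. The set of cut points of the continuous attributes is treated as a discrete particle swarm, and the subset of cut points is minimized through interaction among particles. Simulated annealing is introduced as a local search strategy, which increases swarm diversity and the ability to find the global optimum. The dependency degree of the decision attribute on the condition attributes, from rough set theory, measures the consistency of the decision table and thereby guides the discretization. Finally, the algorithm is tested on several data sets and compared with other algorithms; the results show that it is effective.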
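A minimal sketch of how a candidate set of cut points could be scored is given below. It uses the standard rough-set dependency degree, that is, the fraction of objects whose discretized condition values determine the decision uniquely. The names `discretize` and `dependency` and the toy data are hypothetical, and the paper's actual fitness may differ.

```python
from bisect import bisect_right
from collections import defaultdict


def discretize(row, cuts):
    """Map each continuous value to the index of the interval it falls in,
    given one list of cut points per attribute."""
    return tuple(bisect_right(sorted(c), v) for v, c in zip(row, cuts))


def dependency(rows, labels, cuts):
    """Dependency degree: share of objects in equivalence classes that
    contain a single decision value (the positive region)."""
    seen_labels = defaultdict(set)
    sizes = defaultdict(int)
    for row, label in zip(rows, labels):
        key = discretize(row, cuts)
        seen_labels[key].add(label)
        sizes[key] += 1
    consistent = sum(sizes[k] for k, labs in seen_labels.items() if len(labs) == 1)
    return consistent / len(rows)


rows = [(1.0,), (2.0,), (3.0,), (4.0,)]
labels = [0, 0, 1, 1]
print(dependency(rows, labels, [[2.5]]))  # 1.0, one cut separates the classes
print(dependency(rows, labels, [[1.5]]))  # 0.25
print(dependency(rows, labels, [[]]))     # 0.0, no cuts at all
```

A search over cut-point subsets would then look for the smallest subset that keeps the dependency degree at the level of the original decision table.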

13.
Heuristic search-based test data generation can potentially achieve higher efficiency in path-coverage software testing. However, these approaches struggle to cover long and complex paths. In this paper, we propose a method for generating test data based on program slicing and particle swarm optimization. Using interest points selected from a target path, we perform program slicing to remove the statements that are irrelevant to the interest points. Our method simplifies the target path and the actual path to obtain a better fitness value. After the program slices are obtained, the population is evolved using particle swarm optimization to improve the efficiency of test data generation.

14.
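To address the problems of the random forest algorithm in big data environments, namely too many redundant and irrelevant features, insufficient information in feature subspaces, and low parallelization efficiency, a parallel random forest algorithm combining gain ratio and stacked auto encoders (PRFGRSAE) is proposed. First, a dimension reduction strategy combining nonlinear normalization gain ratio and stacked auto encoders (DRNGRSAE) is proposed; it filters redundant and irrelevant features out of the feature set and extracts features with stacked auto encoders, effectively reducing the number of redundant and irrelevant features. Second, a subspace selection strategy combining Latin hypercube sampling and feature class correlation (SSLF) is proposed; it performs multi-layer partitioned sampling of the feature set to form feature subspaces with high spatial expressiveness...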

15.
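Because gene expression data are high-dimensional, noisy, and small in sample size, gene selection has long been a major challenge in tumor classification. To improve classification accuracy while keeping gene selection efficient, an adaptive particle swarm optimization (APSO) algorithm combining Relief-F and the CART decision tree (R-C-APSO) is proposed. The method first uses Relief-F to quickly filter out large numbers of irrelevant genes and noise, narrowing the range of gene selection; it then uses APSO, with the CART decision tree as the fitness function, for the final gene search. Experiments on six data sets show that R-C-APSO achieves high classification accuracy and fast gene selection, with good stability.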

16.
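李欣倩, 杨哲, 任佳. 《测控技术》, 2022, 41(2): 36-40
Starting from the conditional independence assumption of naive Bayes, an improved naive Bayes algorithm with two-stage feature selection based on mutual information and hierarchical clustering is proposed. Irrelevant features are first removed using mutual information; the remaining features are then hierarchically clustered by Euclidean distance, with the number of clusters determined by particle swarm optimization; finally, the feature with the highest mutual information with the class in each cluster is collected into the feature subset, and naive Bayes is used to obtain the classification accuracy. Experimental results show that the algorithm effectively reduces correlation among features and improves classification performance.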

17.
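Research on a K-means clustering algorithm based on semi-supervised learning   Total citations: 4, self-citations: 3, citations by others: 1
A new nearest-neighbor function that mixes Euclidean distance with supervision information is defined, so that the K-means algorithm can be applied effectively to semi-supervised clustering. To address the sensitivity of K-means to the initial centroids, the search space of particle swarm optimization is used to model the Euclidean space of the clustering, and iterative search finds better cluster centroids; a dynamic population management strategy is also proposed to improve the search efficiency of particle swarm optimization. The algorithm achieves good clustering accuracy on several UCI data sets.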
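One common way to let a swarm search for centroids is to encode all k centroids in a single particle and score it by the within-cluster sum of squared distances. The sketch below shows only that scoring step, under stated assumptions: it uses plain squared Euclidean distance, because the abstract does not define its mixed distance with supervision information, and the name `sse_fitness` is hypothetical.

```python
def sse_fitness(particle, points, k):
    """Sum of squared distances from each point to its nearest centroid.
    `particle` is a flat list holding k centroids back to back."""
    dim = len(points[0])
    centroids = [particle[i * dim:(i + 1) * dim] for i in range(k)]
    total = 0.0
    for p in points:
        # Squared Euclidean distance to the closest encoded centroid.
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(sse_fitness([0, 1, 10, 1], points, k=2))  # 4.0, centroids at the two group centers
print(sse_fitness([5, 1, 5, 1], points, k=2))   # 104.0, both centroids in the middle
```

A lower value indicates tighter clusters, so the swarm would minimize this function before or instead of the usual K-means iterations.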

18.
At present there is no standard, authoritative fall detection test data set, and the samples collected from young people imitating falls are few, so finding the most representative feature set from a limited data set is particularly important. For feature sets with few samples and continuous values, a feature set optimization algorithm based on neighborhood consistency and discrete binary particle swarm optimization (DBPSO) is proposed. The algorithm first builds a primary feature set using an optimized neighborhood consistency function and a heuristic forward search, and then uses the primary feature set to initialize the DBPSO population. Finally, the validity of the algorithm is verified with a classification algorithm. The experimental results show that the algorithm improves classification ability while selecting fewer features, and that computational efficiency also improves.

19.
The degree of malignancy in brain glioma is assessed before operation based on magnetic resonance imaging (MRI) findings and clinical data. These data contain irrelevant features, and uncertainties and missing values also exist. Rough set theory can deal with vagueness and uncertainty in data analysis, and can efficiently remove redundant information. In this paper, a rough set method is applied to predict the degree of malignancy. As feature selection can improve the classification accuracy effectively, rough set feature selection algorithms are employed to select features. The selected feature subsets are used to generate decision rules for the classification task. A rough set attribute reduction algorithm that employs a search method based on particle swarm optimization (PSO) is proposed in this paper and compared with other rough set reduction algorithms. Experimental results show that reducts found by the proposed algorithm are more efficient and can generate decision rules with better classification performance. The rough set rule-based method can achieve higher classification accuracy than other intelligent analysis methods such as neural networks, decision trees, and a fuzzy rule extraction algorithm based on Fuzzy Min-Max Neural Networks (FRE-FMMNN). Moreover, the decision rules induced by the rough set rule induction algorithm can reveal regular and interpretable patterns in the relations between glioma MRI features and the degree of malignancy, which are helpful for medical experts.

20.
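Because many swarm intelligence algorithms tend to become trapped in local optima and converge slowly, a competition algorithm with few parameters to set and strong global search ability is proposed. In a comparison with particle swarm optimization on 10 benchmark functions, both the mean and the minimum obtained by the competition algorithm over 30 runs are better than those of particle swarm optimization, which verifies its effectiveness. The competition algorithm is then used to optimize a BP neural network, which classifies 11 test data sets. The results show that the BP neural network optimized by the competition algorithm outperforms the original algorithm on all 11 test sets and outperforms a BP neural network optimized by a genetic algorithm on most of them. The algorithm effectively improves classification accuracy and robustness.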
