1.
Bagging, boosting, rotation forest and random subspace methods are well known re-sampling ensemble methods that generate and
combine a diversity of learners using the same learning algorithm for the base-classifiers. Boosting and rotation forest algorithms
are considered stronger than bagging and random subspace methods on noise-free data. However, there are strong empirical indications
that bagging and random subspace methods are much more robust than boosting and rotation forest in noisy settings. For this
reason, in this work we built an ensemble of bagging, boosting, rotation forest and random subspace ensembles, each with
6 sub-classifiers, and used a voting methodology for the final prediction. We compared it with
simple bagging, boosting, rotation forest and random subspace ensembles with 25 sub-classifiers, as well as other
well-known combining methods, on standard benchmark datasets, and the proposed technique had better accuracy in most cases.
2.
In this paper, we propose a novel method for face recognition, a tensor-based random subspace method (Tensor-RS). The traditional random subspace method (RSM) treats each pixel (or feature) of the face image as a sampling unit and thus ignores the spatial information within the image. In contrast, the proposed Tensor-RS regards each small image region as a sampling unit and obtains the spatial information within small image regions by reshaping the image and applying a tensor-based feature extraction method. More specifically, the original whole face image is first partitioned into sub-images to improve robustness to facial variations, and each sub-image is then reshaped into a new matrix in which each row corresponds to a vectorized small sub-image region. Based on these rearranged matrices, an incomplete random sampling is applied by row vectors rather than by features (or feature projections). Finally, a tensor subspace method, which can effectively extract the spatial information within the same row (or column) vector, is used to extract useful features. Extensive experiments on four standard face databases (AR, Yale, Extended Yale B and CMU PIE) demonstrate that the proposed Tensor-RS method significantly outperforms state-of-the-art methods.
3.
Relevance feedback schemes based on support vector machines (SVM) have been widely used in content-based image retrieval (CBIR). However, the performance of SVM-based relevance feedback is often poor when the number of labeled positive feedback samples is small. This is mainly due to three reasons: 1) an SVM classifier is unstable on a small training set, 2) the SVM's optimal hyperplane may be biased when the positive feedback samples are far fewer than the negative feedback samples, and 3) overfitting happens because the number of feature dimensions is much higher than the size of the training set. In this paper, we develop a mechanism to overcome these problems. To address the first two problems, we propose an asymmetric bagging-based SVM (AB-SVM). For the third problem, we combine the random subspace method and SVM for relevance feedback, which we name random subspace SVM (RS-SVM). Finally, by integrating AB-SVM and RS-SVM, an asymmetric bagging and random subspace SVM (ABRS-SVM) is built to solve these three problems and further improve the relevance feedback performance.
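The sampling step behind asymmetric bagging can be sketched as follows. The abstract does not spell out the sampling rule, so this sketch assumes that every bag keeps all positive samples and draws a bootstrap sample of negatives of the same size; the function name `asymmetric_bags` and the toy data are hypothetical, and the SVM training step is left out.

```python
import random

def asymmetric_bags(positives, negatives, n_bags, seed=0):
    """Build n_bags balanced training sets.

    Assumption (not stated in the abstract): each bag keeps every positive
    sample and bootstraps only the negatives, drawing as many negatives
    as there are positives.
    """
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        # Sample negatives with replacement, one draw per positive sample.
        sampled_negatives = [rng.choice(negatives) for _ in positives]
        bags.append((list(positives), sampled_negatives))
    return bags

# Toy data: 3 positive feedback samples, 20 negative ones.
positives = ["p1", "p2", "p3"]
negatives = [f"n{i}" for i in range(20)]
bags = asymmetric_bags(positives, negatives, n_bags=5)
for pos, neg in bags:
    print(len(pos), len(neg))
```

Each bag would then train one base classifier, and the classifiers' outputs would be aggregated; balancing the bags this way is what targets the biased-hyperplane problem listed above.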
4.
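To address the overfitting and poor scalability of the classical C4.5 decision tree algorithm, an improved decision tree algorithm based on Bagging is proposed and parallelized using the MapReduce model. First, C4.5 is improved with Bagging: sampling with replacement yields multiple new training sets, each the same size as the original training set; a classifier is trained on each set, and the results are combined by majority voting to obtain the final classifier. Then the improved algorithm is parallelized on MapReduce so that training sets are processed in parallel, the best split attribute and best split point are selected in parallel, and child nodes are generated in parallel. The result is a parallel improved decision tree algorithm built on a MapReduce Job workflow, which improves the ability to analyze large datasets. Experimental results show that the parallel Bagging decision tree algorithm achieves high accuracy and sensitivity, as well as good scalability and speedup.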
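The Bagging step described in this abstract (bootstrap sets of the original size, one classifier per set, majority vote) can be sketched in a few lines. This is a minimal single-machine illustration, not the paper's method: a one-feature threshold stump stands in for C4.5, the MapReduce parallelization is omitted, and all names and data are hypothetical.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-feature threshold classifier (a stand-in for a C4.5 tree)."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y
                         for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagging(train, n_models, seed=0):
    """Train n_models stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Sampling with replacement; the bootstrap set has the same size
        # as the original training set.
        sample = [rng.choice(train) for _ in train]
        models.append(train_stump(sample))
    return models

def predict(models, x):
    """Combine the base classifiers by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy one-dimensional data: (feature, label).
train = [(0.1, 0), (0.4, 0), (0.5, 0), (1.5, 1), (1.8, 1), (2.2, 1)]
models = bagging(train, n_models=11)
print(predict(models, 0.1), predict(models, 2.2))
```

An odd number of base classifiers avoids ties in a two-class vote.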
5.
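In real production processes, offline models built with traditional subspace identification cannot track the dynamic changes of a system effectively and accurately. Linear algebra tools such as singular value decomposition improve the numerical robustness of the algorithm but also make online recursion of subspace identification harder. To solve these problems, this paper proposes a recursive subspace identification method for continuous-time systems based on random distribution theory. First, a continuous random distribution function of the system is constructed using random distribution theory, and an equivalent input-output matrix equation of the system is obtained by differentiation. Then the size of the input-output data matrix "R" is held fixed to achieve data compression. Finally, the system matrices and noise intensity of the model are recursively updated by least squares and residual analysis until the identification requirement is met. Simulation results verify the effectiveness and accuracy of the proposed method.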
6.
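To address the large clustering error of sparse subspace clustering (SSC), an SSC method based on random blocking is proposed. First, the dataset of the original problem is randomly divided into several subsets to form several subproblems. Then the alternating direction method of multipliers (ADMM) is used to obtain the coefficient matrix of each subproblem; these coefficient matrices are expanded to the size of the original problem's coefficient matrix and integrated into a single coefficient matrix. Finally, a similarity matrix is computed from the integrated coefficient matrix, and the spectral clustering (SC) algorithm is used to obtain the clustering result for the original problem. Compared with the best of SSC, stochastic sparse subspace clustering (S3COMP-C), sparse subspace clustering by orthogonal matching pursuit (SSCOMP), SC, and K-Means, the random-blocking SSC method reduces the subspace clustering error by 3.12 percentage points on average, and it is clearly better than the compared algorithms on three performance metrics: mutual information, Rand index, and entropy. The experimental results show that the random-blocking SSC method can reduce the subspace clustering error and improve clustering performance.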
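The partition-and-merge bookkeeping in this abstract can be illustrated with a small sketch. It assumes the per-block coefficient matrices have already been computed (the ADMM solver is not shown), that "expanding" means placing each block's coefficients at the original sample indices with zeros elsewhere, and that the similarity matrix is the usual SSC choice |C| + |C|^T. None of these details are confirmed by the abstract, and all names are hypothetical.

```python
import random

def random_partition(n, k, seed=0):
    """Randomly split sample indices 0..n-1 into k disjoint subsets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::k]) for i in range(k)]

def merge_block_coefficients(n, blocks):
    """Expand per-block coefficient matrices into one n x n matrix.

    blocks is a list of (indices, c_sub) pairs, where c_sub[a][b] is the
    coefficient between samples indices[a] and indices[b]. Entries that
    span two different blocks stay zero (an assumption of this sketch).
    """
    full = [[0.0] * n for _ in range(n)]
    for indices, c_sub in blocks:
        for a, i in enumerate(indices):
            for b, j in enumerate(indices):
                full[i][j] = c_sub[a][b]
    return full

def affinity(c):
    """Symmetric similarity matrix W = |C| + |C|^T."""
    n = len(c)
    return [[abs(c[i][j]) + abs(c[j][i]) for j in range(n)] for i in range(n)]

# Two hand-written blocks over 4 samples, standing in for ADMM output.
blocks = [
    ([0, 2], [[0.0, 0.5], [-0.25, 0.0]]),
    ([1, 3], [[0.0, 1.0], [0.25, 0.0]]),
]
full = merge_block_coefficients(4, blocks)
w = affinity(full)
parts = random_partition(10, 3)
print(w)
```

The matrix `w` is what a spectral clustering routine would take as input.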
7.
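To address the excessive time cost of the extended isolation forest (EIF) algorithm, an extended isolation forest algorithm based on random subspaces (RS-EIF) is proposed. First, multiple random subspaces are determined in the original data space. Then, in each random subspace, extended isolation trees are built by computing the intercept vector and slope of each node, and the trees are combined into a subspace extended isolation forest. Finally, the average traversal depth of a data point in the extended isolation forest is computed to determine the data...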
8.
Graph structure is vital to graph-based semi-supervised learning. However, the problem of constructing a graph that reflects the underlying data distribution has seldom been investigated in semi-supervised learning, especially for high-dimensional data. In this paper, we focus on graph construction for semi-supervised learning and propose a novel method called Semi-Supervised Classification based on Random Subspace Dimensionality Reduction, SSC-RSDR for short. Unlike traditional methods that perform graph-based dimensionality reduction and classification in the original space, SSC-RSDR performs these tasks in subspaces. More specifically, SSC-RSDR generates several random subspaces of the original space and applies graph-based semi-supervised dimensionality reduction in these random subspaces. It then constructs graphs in these processed random subspaces and trains semi-supervised classifiers on the graphs. Finally, it combines the resulting base classifiers into an ensemble classifier. Experimental results on face recognition tasks demonstrate that SSC-RSDR not only achieves better recognition performance than competing methods, but is also robust across a wide range of input parameter values.
9.
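MicroRNA identification methods tend to focus too much on new features while neglecting weakly discriminative and redundant features, which leads to poor or unbalanced sensitivity and specificity. To address this, an ensemble algorithm based on feature clustering and random subspaces, CLUSTER-RS, is proposed. The algorithm uses the information gain ratio to remove some weakly discriminative features, measures the correlation between features with information entropy and clusters the features, then randomly selects an equal number of features from each feature cluster to form the feature sets used to build base classifiers, and finally combines the base classifiers for microRNA identification. After the algorithm was optimized by tuning parameters and selecting the base classifier, it was compared with the classical methods Triplet-SVM, miPred, MiPred, microPred, and HuntMi on the latest microRNA dataset. In sensitivity, CLUSTER-RS falls short of microPred but is better than the other models; its specificity is the best of the six; and the overall metrics of accuracy and Matthews correlation coefficient show that CLUSTER-RS has an advantage over the other algorithms. The results indicate that CLUSTER-RS identifies microRNA well and achieves a good balance between sensitivity and specificity, outperforming the compared methods in the balance of performance metrics.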
10.
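Multi-label learning is now widely applied in many scenarios; in such learning problems, a sample can often have several class labels at once. Because the attributes specific to a class label (label-specific features) are more helpful for classifying that label, several multi-label learning algorithms based on label-specific features have appeared. To address the redundancy that constructing label-specific features introduces into the feature space, this paper proposes a multi-label label-specific feature extraction algorithm, LIFT_RSM. Working in the label-specific feature space, the method combines the random subspace model with pairwise-constraint dimensionality reduction to extract effective feature information and thereby improve classification performance. Experimental results on multiple datasets show that the proposed LIFT_RSM algorithm achieves better classification results than several classical multi-label algorithms.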
11.
With the rapid growth of and increased competition in the credit industry, corporate credit risk prediction is becoming more important for credit-granting institutions. In this paper, we propose an integrated ensemble approach, called RS-Boosting, which is based on two popular ensemble strategies, boosting and random subspace, for corporate credit risk prediction. Because two different factors encourage diversity in RS-Boosting, it is expected to achieve better performance. Two corporate credit datasets are selected to demonstrate the effectiveness and feasibility of the proposed method. Experimental results reveal that RS-Boosting gets the best performance among seven methods, i.e., logistic regression analysis (LRA), decision tree (DT), artificial neural network (ANN), bagging, boosting and random subspace. All these results illustrate that RS-Boosting can be used as an alternative method for corporate credit risk prediction.
12.
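Traditional program similarity detection tools cannot effectively detect some common plagiarism techniques that rely on advanced lexical and semantic transformations. This work first summarizes three classes of plagiarism techniques commonly used by students and then presents a program similarity detection method based on lexical trees. Taking the C language as an example, it summarizes the structures used to generate the lexical tree and derives a structure dependency graph by analyzing the main data flow, structural control flow, and temporal flow of the program's lexical tree. A formal graph isomorphism method is used to decide whether code is similar, and a clustering method is given to obtain subsets of mutually similar programs. In a comparison with the JPlag and BuaaSim systems on a set of typical plagiarism samples, the method shows better detection results.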
13.
Linear discriminant analysis (LDA) often suffers from the small sample size problem when dealing with high-dimensional face data. The random subspace method can effectively solve this problem by random sampling on face features. However, it remains an open problem how to construct an optimal random subspace for discriminant analysis and how to perform the most efficient discriminant analysis on the constructed random subspace. In this paper, we propose a novel framework, random discriminant analysis (RDA), to handle this problem. Under the most suitable situation of the principal subspace, the optimal reduced dimension of the face sample is discovered to construct a random subspace where all the discriminative information in the face space is distributed in the two principal subspaces of the within-class and between-class matrices. Then we apply Fisherface and direct LDA, respectively, to the two principal subspaces for simultaneous discriminant analysis. The two sets of discriminant analysis features from the dual principal subspaces are first combined at the feature level, and then all the random subspaces are further integrated at the decision level. With the discriminating information fused at the two levels, our method can take full advantage of the useful discriminant information in the face space. Extensive experiments on different face databases demonstrate its performance.
14.
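By analyzing the shortcomings of existing microprocessor verification schemes, a dynamic pseudo-random verification method that uses functional coverage as the reference condition is proposed. Experimental results show that, compared with traditional verification approaches, the method raises condition coverage by 13% on average for the same simulation time and by 20% on average for the same number of test instructions.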
15.
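Algorithms based on the rapidly-exploring random tree (RRT) have difficulty sampling narrow passages during mobile robot path planning. To address this, an improved bridge test algorithm dedicated to path planning in narrow passages is proposed. First, the environment map is preprocessed and the set of obstacle edge nodes is extracted as the sampling space of the bridge test algorithm, which avoids a large number of invalid sample points and distributes the narrow-passage sample points more reasonably. Second, the construction of bridge endpoints is improved, which raises the computational efficiency of the bridge test algorithm. Finally, a slightly mutated Connect algorithm is used to expand the narrow-passage sample points quickly. On the narrow-passage environment maps used in the experiments, the improved algorithm raises the path exploration success rate from 68% to 92% compared with the original RRT-Connect algorithm. The experimental results show that the algorithm samples narrow passages well and effectively improves path planning efficiency.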
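The idea of restricting the bridge test to obstacle edge nodes can be sketched on an occupancy grid. This is only an illustration under stated assumptions: the map is a small 0/1 grid, an "edge node" is taken to be an obstacle cell with a free 4-neighbour, and endpoint pairs are enumerated exhaustively rather than sampled. The paper's improved endpoint construction and its mutated Connect step are not reproduced, and all names are hypothetical.

```python
def edge_cells(grid):
    """Return obstacle cells (value 1) that touch at least one free cell."""
    rows, cols = len(grid), len(grid[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                    edges.append((r, c))
                    break
    return edges

def bridge_samples(grid, max_len):
    """Bridge test: keep free midpoints of short bridges between edge cells."""
    edges = edge_cells(grid)
    samples = set()
    for i, (r1, c1) in enumerate(edges):
        for r2, c2 in edges[i + 1:]:
            if abs(r1 - r2) + abs(c1 - c2) > max_len:
                continue  # bridge too long to indicate a narrow passage
            if (r1 + r2) % 2 or (c1 + c2) % 2:
                continue  # midpoint does not fall on a grid cell
            mid = ((r1 + r2) // 2, (c1 + c2) // 2)
            if grid[mid[0]][mid[1]] == 0:
                samples.add(mid)
    return samples

# A wall (row 1) with a one-cell gap at column 2.
grid = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
print(bridge_samples(grid, max_len=2))
```

Both bridge endpoints lie in obstacles by construction, so only the midpoint needs a collision check, which is where the savings over sampling the whole map would come from.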
16.
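To counter the negative effect of makeup on face recognition accuracy, an improved robust face recognition algorithm based on patch ensemble learning is proposed. First, each face image is divided into patches, and each patch is described by a set of feature descriptors: the local gradient Gabor pattern (LGP), the histogram of Gabor spatial ordinal and ratio measures (HGSFRM), and the densely sampled local multi-value pattern (DSLMP). Then an improved random subspace linear discriminant analysis (SRS-LDA) method is used to sample the patches, and multiple common subspaces between before-makeup and after-makeup images are built for ensemble learning. Finally, collaborative and sparse representation classifiers are used to compare the feature vectors in these subspaces, and the resulting scores are fused by the sum rule. The proposed algorithm is evaluated on several makeup datasets, and the results show that it achieves higher recognition accuracy than other algorithms designed specifically for face recognition after makeup.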
17.
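For the construction and merging of FP-trees in distributed environments, a distributed FP-tree construction algorithm for grid environments, GridDBMA, is presented. In this algorithm, each site independently builds a local frequent pattern tree (BFP-tree) according to a global item header table. A merging algorithm then merges the local trees into one global frequent pattern tree, from which the required frequent itemsets are extracted. An improved storage structure for the traditional frequent pattern tree reduces the size of the tree and the network traffic between sites, makes tree traversal more convenient and effective, and raises the merging efficiency, which in turn improves the efficiency of the whole frequent itemset mining process. Finally, experiments using celestial spectrum data as the formal context verify the correctness and effectiveness of the algorithm.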
18.
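The main task of association rule mining is to find relationships between items from statistics over transactions. Traditional mining algorithms require items to have logical attributes and generate large numbers of intermediate itemsets during mining, which becomes the bottleneck of the algorithm. A tabular data organization based on an association path tree is presented, and frequent itemsets are mined in a pattern-guided way. The method does not require items to have logical attributes, and the itemset combination iterations for different initial patterns can be assigned to different CPUs, which improves the execution efficiency of the algorithm. The algorithm was tested on the 1984 U.S. congressional election data, and the results were entirely correct.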
19.
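As enterprise marketing deepens, enterprise abbreviations are widely used by news outlets, yet as new words they are hard to recognize effectively. To address this, an enterprise abbreviation prediction method based on composition patterns and conditional random fields (CRF) is proposed. First, the composition rules of enterprise full names and abbreviations are summarized from a linguistic perspective, and the Bi-gram algorithm is improved by combining a lexicon with rules. The resulting CBi-gram algorithm segments enterprise full names structurally and improves the accuracy of core-word recognition in full names. Then enterprise types are further subdivided based on the segmentation results, and abbreviation rule sets for the different enterprise types are formed through manual summarization and rule self-learning. Finally, candidate abbreviation sets are generated from the rules, which reduces the noise that inapplicable rules introduce when abbreviations are generated for different types of enterprises. In addition, to compensate for the limitations of a purely rule-based approach when abbreviations of full names and abbreviations of short names are mixed, CRF is introduced to predict abbreviations statistically. The word, its tone, and its position within the components of the full name are selected as model features for training, so that the two methods complement each other. Experimental results show that the method has high accuracy and that the output abbreviation sets largely cover the commonly used abbreviations of the enterprises.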
20.
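In attribute reduction algorithms designed around the discernibility matrix, the matrix contains many duplicate and useless discernibility elements, which both occupy a large amount of storage and waste computation time during attribute reduction. To improve the efficiency of such algorithms, a new data structure, the improved FP-tree (IFP_Tree), is presented, drawing on the idea of the FP-tree (frequent pattern tree). The improved FP-tree can completely remove all duplicate discernibility elements in the discernibility matrix and can also completely remove the useless ones. This not only saves a large amount of storage but also greatly improves the efficiency of the attribute reduction algorithm. A new fast attribute reduction algorithm is designed using the IFP-tree, and an example illustrates its effectiveness.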
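What "duplicate" and "useless" discernibility elements look like can be shown with plain Python sets. This sketch does not reproduce the IFP_Tree itself. It assumes that a "useless" element is one that strictly contains another element (so it is absorbed by it), which is a common reading but is not stated in the abstract; the decision table and function names are hypothetical.

```python
def discernibility_elements(rows, decision):
    """Collect the attribute sets that distinguish objects of different classes.

    rows is a list of dicts mapping attribute name to value, and decision
    holds the class label of each row. Storing the elements in a set of
    frozensets drops duplicates automatically.
    """
    elements = set()
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if decision[i] == decision[j]:
                continue
            diff = frozenset(a for a in rows[i] if rows[i][a] != rows[j][a])
            if diff:
                elements.add(diff)
    return elements

def absorb(elements):
    """Drop every element that is a strict superset of another element."""
    return {e for e in elements if not any(other < e for other in elements)}

# Toy decision table with condition attributes a, b, c.
rows = [
    {"a": 1, "b": 0, "c": 0},
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 1},
]
decision = [0, 1, 1, 0]
elements = discernibility_elements(rows, decision)
reduced = absorb(elements)
print(sorted(sorted(e) for e in elements))
print(sorted(sorted(e) for e in reduced))
```

In this toy table every element contains attribute `a`, so after absorption only `{a}` remains, and any reduct has to include `a`.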