Similar Articles
 19 similar articles found (search time: 62 ms)
1.
刘栋  宋国杰 《计算机应用》2011,31(5):1374-1377
To classify multi-dimensional time series and obtain easily interpretable classification rules, the concept of temporal entropy and a method for constructing it are introduced, and the decision tree model is extended in two respects: attribute selection and attribute-value partitioning. Two algorithms for constructing decision tree models for multi-dimensional time series classification are given. Finally, real mobile-customer churn data are used to test the resulting decision trees, demonstrating the feasibility of the method.
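The abstract does not spell out how its temporal entropy is defined. As a point of reference, here is a minimal sketch of the ordinary Shannon entropy and information gain that decision-tree attribute-selection criteria of this kind extend; the function names and the binary split interface are illustrative, not the paper's:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, the base quantity behind
    classic decision-tree attribute selection."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, split):
    """Reduction in entropy when `labels` is partitioned by the boolean
    mask `split`, applied index-wise (a binary split for simplicity)."""
    left = [y for i, y in enumerate(labels) if split[i]]
    right = [y for i, y in enumerate(labels) if not split[i]]
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
```

A temporal variant would additionally account for where in the sequence the attribute values occur, which is the extension the paper proposes.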

2.
Fuzzy decision tree mining for multivariate time series
To address shortcomings of existing approaches to time-series decision research, a fuzzy decision tree mining method for multivariate time series is proposed, together with an experimental analysis of the method. The results show that it can extract decision information linking the shapes of multivariate time-series subsequences to the later trend or state of a given series.

3.
Neural network ensembles generalize better than a single neural network but are hard to interpret because of their black-box nature; decision tree algorithms are easy to understand because their results are displayed as a tree, yet their generalization ability falls short of neural network ensembles. This paper combines the two and proposes a decision tree construction algorithm: a neural network ensemble preprocesses the training samples, and the C4.5 algorithm is then applied to the preprocessed samples to generate a decision tree. Experiments on UCI data compare the neural network ensemble, C4.5, and the proposed algorithm, and show that the proposed algorithm inherits the strong generalization of the ensemble, clearly outperforming C4.5, while its final result is displayed as a decision tree and therefore remains easy to understand.
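The two-stage scheme above can be sketched as follows. This is a hypothetical sketch, not the paper's implementation: the ensemble members are arbitrary pre-trained callables, and a one-split decision stump stands in for C4.5, which would grow a full tree:

```python
from collections import Counter

def ensemble_relabel(X, ensemble):
    """Stage 1: replace each training label with the majority vote of the
    (already trained) ensemble members, so the tree learns the ensemble's
    smoothed decision rather than the raw, possibly noisy labels."""
    return [Counter(m(x) for m in ensemble).most_common(1)[0][0] for x in X]

def fit_stump(X, y, feature=0):
    """Stage 2 stand-in for C4.5: fit a one-split decision stump on the
    relabelled data by minimizing training error over candidate thresholds."""
    pairs = sorted(zip((x[feature] for x in X), y))
    best = None
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = Counter(l for v, l in pairs if v <= thr).most_common(1)[0][0]
        right = Counter(l for v, l in pairs if v > thr).most_common(1)[0][0]
        err = sum(l != (left if v <= thr else right) for v, l in pairs)
        if best is None or err < best[0]:
            best = (err, thr, left, right)
    _, thr, left, right = best
    return lambda x: left if x[feature] <= thr else right
```

Usage would be `tree = fit_stump(X, ensemble_relabel(X, ensemble))`, i.e. the interpretable model is trained on the ensemble's outputs instead of the original labels.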

4.
Time series data, real-valued sequences sampled at time intervals, are ubiquitous in fields such as coal mining, finance, and healthcare. To address the heavy noise, low prediction accuracy, and poor generalization found in existing time-series classification, a weighted ensemble classification method based on the regularized extreme learning machine (RELM) is proposed. First, the wavelet packet transform is used to denoise the time series. Second, a RELM-based weighted ensemble addresses the accuracy and generalization problems: base RELM classifiers are selected by tuning the number of hidden-layer nodes, their weights are optimized with the particle swarm optimization (PSO) algorithm, and the time series is then classified by weighted ensemble voting. Experimental results show that the method classifies time series effectively and improves classification accuracy.
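The final fusion step described above, a weighted vote over base classifiers, can be sketched as follows. This is a minimal stand-in: the base learners here are arbitrary callables rather than RELMs, and the weights are supplied directly, whereas the paper obtains them via PSO:

```python
from collections import defaultdict

def weighted_vote(classifiers, weights, x):
    """Each base learner casts a vote for its predicted class, scaled by
    its weight; the class with the largest total score wins."""
    scores = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)
```

With PSO-tuned weights, a single strong member can outvote several weak ones, which is the point of weighting rather than plain majority voting.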

5.
To meet the needs of protein sequence classification, protein-sequence classification algorithms were studied in depth. The characteristic attributes of protein sequences were analyzed extensively and a description form for these attributes was given. On this basis, a classification algorithm based on a weighted decision tree was designed; the construction of the weighted tree and the computation of its main parameters are described in detail, and the tree is further adapted to the characteristics of protein sequences, yielding an implementation of the weighted decision tree. Test results show that the algorithm achieves high classification accuracy and fast execution.

6.
Classification is one of the basic problems in data mining, and feature representation and similarity measurement of time series underlie classification, clustering, pattern discovery, and other time-series mining tasks. SAX is a typical symbolic representation of time series; classifying on top of it not only reduces dimensionality and noise effectively but is also simple and intuitive. However, it may lose information and hurt classification accuracy. To compensate, the majority-voting scheme from ensemble learning is applied after the bag-of-patterns (BOP) representation, improving the efficiency of the overall classifier. For samples that all lose similar important information under BOP, so that majority voting cannot improve further, the AdaBoost algorithm is additionally combined: adjusting the training-sample weights improves classifier performance enough to offset the information loss. Experimental results show that combining BOP with ensemble learning not only handles the information loss of SAX symbolization well but also significantly improves classification accuracy over existing methods.
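The SAX symbolization the abstract builds on follows a standard recipe: z-normalize the series, reduce it with piecewise aggregate approximation (PAA), then map each segment mean to a letter via Gaussian breakpoints. A minimal sketch for an alphabet of size 4 (breakpoints from the standard SAX lookup table; the series length is assumed divisible by the segment count):

```python
import statistics
import bisect

# Standard SAX Gaussian breakpoints for an alphabet of size 4.
BREAKPOINTS = [-0.67, 0.0, 0.67]

def sax(series, n_segments, alphabet="abcd"):
    """z-normalize, reduce with PAA, then map each segment mean to a symbol."""
    mu, sigma = statistics.mean(series), statistics.pstdev(series)
    z = [(v - mu) / sigma for v in series] if sigma else [0.0] * len(series)
    seg = len(z) // n_segments
    means = [statistics.mean(z[i * seg:(i + 1) * seg]) for i in range(n_segments)]
    return "".join(alphabet[bisect.bisect(BREAKPOINTS, m)] for m in means)
```

A BOP representation then slides a window over the series, applies `sax` to each window, and counts word frequencies; the information loss the abstract discusses comes from distinct subsequences collapsing onto the same word.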

7.
8.
A classifier ensemble method based on class information, Cagging, is proposed. Training sets for the base classifiers are generated by repeated class-aware sampling, which increases the diversity among base classifiers; each base classifier is then assigned a set of weights reflecting its ability to classify each pattern class, and the outputs are combined by weighted voting, making good use of the differences among the base classifiers. Experiments on the ORL face image database verify the effectiveness of Cagging. Moreover, the way Cagging generates its base classifiers is well suited to building the ensemble through incremental learning, so Cagging is extended into an incremental-learning ensemble method, Cagging-Ⅰ, whose effectiveness is also verified experimentally.

9.
Time series data are ubiquitous in everyday life and have attracted growing research attention. Time series classification is an important research area within this field, and over a hundred classification algorithms have been proposed. They fall roughly into distance-based methods, feature-based methods, and deep-learning-based methods. The first two require manual feature handling and classifier selection, whereas most deep learning methods are end-to-end and, in time series classification...

10.
曹阳  闫秋艳  吴鑫 《计算机应用》2021,41(3):651-656
To address the poor learning ability of existing ensemble classification methods on imbalanced time series data, an imbalanced time-series ensemble classification algorithm, IMHIVE-COTE, is proposed. It takes the heterogeneous ensemble HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles) as its basis and optimizes both the performance of the component algorithms and the ensemble strategy. The algorithm contains two main improvements: first, a new imbalanced-classification component, SBST-HESCA, is added, introducing B...

11.
齐峰  刘希玉 《控制与决策》2010,25(11):1684-1688
To suit the characteristics of classification problems in data mining, a classification model based on an ensemble of multiple neural trees (CMBNTE) is proposed. The model optimizes the individual neural-tree models with an improved genetic programming algorithm and particle swarm optimization and, drawing on ensemble learning, combines multiple neural trees into the final classification model. Experimental results on six UCI datasets show that the model handles classification problems well, and is especially suited to complex problems with many class attributes.

12.
This paper proposes random feature weights (RFW), a method for constructing ensembles of decision trees. Like Random Forest, it introduces randomness into the construction of the trees; but whereas Random Forest considers only a random subset of attributes at each node, RFW considers all of them. The source of randomness is a weight associated with each attribute: all nodes in a tree use the same set of random weights, but each tree uses a different set, so the importance given to the attributes differs from tree to tree and differentiates their construction. The method is compared to Bagging, Random Forest, Random Subspaces, AdaBoost and MultiBoost, with favourable results for RFW, especially on noisy data sets. RFW can also be combined with these methods; generally, combining RFW with another method produces better results than the methods being combined. Kappa-error diagrams and Kappa-error movement diagrams are used to analyse the relationship between the accuracy of the base classifiers and their diversity.
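The core RFW mechanism, one random weight per attribute drawn once per tree and applied to every node's split scoring, can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code; the split merits are supplied as precomputed numbers:

```python
import random

def make_tree_weights(n_attributes, rng):
    """One weight per attribute, drawn once per tree and shared by all of
    its nodes; a fresh draw for each tree is what differentiates the
    ensemble members."""
    return [rng.random() for _ in range(n_attributes)]

def pick_attribute(gains, tree_weights):
    """RFW-style attribute selection at a node: unlike Random Forest, every
    attribute stays a candidate, but its split merit (e.g. information
    gain) is scaled by the tree's random weight for that attribute."""
    scored = [g * w for g, w in zip(gains, tree_weights)]
    return max(range(len(scored)), key=scored.__getitem__)
```

Because the weights persist across all nodes of a tree, an attribute that is down-weighted is consistently de-emphasized throughout that tree rather than merely hidden at individual nodes.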

13.
Web caching is an effective way to improve Internet access performance. An intelligent cache data-mining model based on decision trees is constructed: data modelling, discretization of NextAccess, and the observation-attribute set and weights used to build the decision tree together drive the caching of the data. Experiments show that performance is improved.

14.
A software cost estimation model based on fuzzy decision trees is proposed. The model combines the inductive learning ability of decision trees with the expressive power of fuzzy sets: during cost estimation it predicts the relative error range in the form of a fuzzy set and uses the tree's decision rules to analyse the sources of error. Finally, historical data from software projects are used to validate the effectiveness of the model.

15.
This paper proposes a method for combining multiple tree classifiers based on both a classifier ensemble (bagging) and dynamic classifier selection (DCS) schemes. The proposed method is composed of the following procedures: (1) building individual tree classifiers from bootstrap samples; (2) calculating the distances between all pairs of trees; (3) clustering the trees with single-linkage clustering; (4) selecting two clusters by local region in terms of accuracy and error diversity; and (5) voting the results of the tree classifiers selected in the two clusters. Empirical evaluation using publicly available data sets confirms the superiority of our proposed approach over other classifier combining methods.

16.
姚潍  王娟  张胜利 《计算机应用》2015,35(10):2883-2885
Intrusion detection requires a system to find intrusions in the network quickly and accurately, placing high demands on the efficiency of the detection algorithm. To address the low efficiency and accuracy and the high false-alarm and miss rates of intrusion detection systems, the C4.5 and naive Bayes (NB) algorithms are analysed in depth and a combined model, H-C4.5-NB, is proposed. The model describes the distribution of decision classes in the form of probabilities and produces the final decision as a weighted sum of the C4.5 and NB probabilities; the KDD 99 dataset is used to test the model's performance. Experimental results show that, compared with the traditional C4.5, NB and NBTree methods, H-C4.5-NB improves classification accuracy on denial-of-service (DoS) attacks by about 9% and on U2R and R2L attacks by about 20%-30%.
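The fusion rule described above, a weighted sum of the two models' class-probability estimates, might look like the following. The abstract does not give the actual weighting, so `alpha` and the dictionary interface are illustrative assumptions:

```python
def combine_probs(p_c45, p_nb, alpha=0.5):
    """Convex combination of the class-probability estimates of two
    classifiers (here standing in for C4.5 and naive Bayes); the class
    with the highest combined probability is the decision."""
    classes = set(p_c45) | set(p_nb)
    combined = {c: alpha * p_c45.get(c, 0.0) + (1 - alpha) * p_nb.get(c, 0.0)
                for c in classes}
    return max(combined, key=combined.get)
```

Working in probability space rather than hard labels lets a confident model override an uncertain one, which is what a plain vote between two classifiers cannot do.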

17.
In multi-instance learning, the training set is composed of labeled bags, each of which consists of many unlabeled instances; that is, an object is represented by a set of feature vectors instead of only one feature vector. Most current multi-instance learning algorithms work by adapting single-instance learning algorithms to the multi-instance representation, while this paper proposes a new solution which goes the opposite way, that is, adapting the multi-instance representation to single-instance learning algorithms. In detail, the instances of all the bags are first collected together and clustered into d groups. Each bag is then re-represented by d binary features, where the value of the ith feature is set to one if the bag has instances falling into the ith group and zero otherwise. Thus, each bag is represented by one feature vector, so that single-instance classifiers can be used to distinguish different classes of bags. By repeating the above process with different values of d, many classifiers can be generated and then combined into an ensemble for prediction. Experiments show that the proposed method works well on standard as well as generalized multi-instance problems. Zhi-Hua Zhou is currently Professor in the Department of Computer Science & Technology and head of the LAMDA group at Nanjing University. His main research interests include machine learning, data mining, information retrieval, and pattern recognition. He is associate editor of Knowledge and Information Systems and on the editorial boards of Artificial Intelligence in Medicine, International Journal of Data Warehousing and Mining, Journal of Computer Science & Technology, and Journal of Software. He has also been involved in various conferences. Min-Ling Zhang received his B.Sc. and M.Sc. degrees in computer science from Nanjing University, China, in 2001 and 2004, respectively. Currently he is a Ph.D. candidate in the Department of Computer Science & Technology at Nanjing University and a member of the LAMDA group. His main research interests include machine learning and data mining, especially multi-instance learning and multi-label learning.
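The re-representation step, mapping a bag of instances to d binary features via the d clusters, can be sketched as below. This is a simplified sketch: the paper obtains the groups by clustering all instances, whereas here the cluster centroids are given and instances are assigned by nearest centroid:

```python
def bag_to_binary(bag, centroids):
    """Re-represent one bag as d binary features: feature i is 1 iff some
    instance in the bag falls nearest to centroid i, and 0 otherwise."""
    def nearest(inst):
        # Index of the centroid with the smallest squared Euclidean distance.
        return min(range(len(centroids)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(inst, centroids[i])))
    hit = {nearest(inst) for inst in bag}
    return [1 if i in hit else 0 for i in range(len(centroids))]
```

After this step every bag is a single fixed-length vector, so any ordinary single-instance classifier can be trained on the transformed data.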

18.
Real-life datasets are often imbalanced; that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of maximizing overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, typically based on oversampling, undersampling or cost-sensitive classification. In this paper, we introduce an effective ensemble of cost-sensitive decision trees for imbalanced classification. Base classifiers are constructed according to a given cost matrix, but are trained on random feature subspaces to ensure sufficient diversity of the ensemble members. We employ an evolutionary algorithm for simultaneous classifier selection and assignment of committee member weights for the fusion process. Our proposed algorithm is evaluated on a variety of benchmark datasets, and is confirmed to lead to improved recognition of the minority class, to be capable of outperforming other state-of-the-art algorithms, and hence to represent a useful and effective approach for dealing with imbalanced datasets.
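The cost-sensitive decision rule underlying such base classifiers can be sketched as follows. This is a generic sketch of cost-sensitive prediction, not the paper's algorithm; the cost-matrix layout (`cost[i][j]` = cost of predicting class j when the true class is i) is an illustrative convention:

```python
def min_cost_class(probs, cost):
    """Choose the class whose expected misclassification cost, under the
    estimated class probabilities `probs`, is lowest. With a cost matrix
    that penalizes missing the minority class heavily, the rule can pick
    the minority class even when it is not the most probable one."""
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)
```

For example, with minority-class probability 0.3 but a tenfold penalty for missing it, the rule predicts the minority class, whereas an accuracy-driven classifier would predict the majority class.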

19.
We have previously proposed a decision tree classifier named MMC (multi-valued and multi-labeled classifier). MMC is known for its capability of classifying large multi-valued and multi-labeled data. Aiming to improve the accuracy of MMC, this paper develops another classifier named MMDT (multi-valued and multi-labeled decision tree). MMDT differs from MMC mainly in attribute selection. MMC attempts to split a node into child nodes whose records approach the same multiple labels; it essentially measures the average similarity of the labels in each child node to determine the goodness of each splitting attribute. MMDT, in contrast, uses another measuring strategy which considers not only the average similarity of the labels in each child node but also their average appropriateness. The new strategy takes a scoring approach to obtain a look-ahead measure of each attribute's contribution to accuracy when split upon. The experimental results show that MMDT improves on the accuracy of MMC.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号