Similar literature
 Found 20 similar documents; search took 453 ms
1.
2.
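Decision tree classification algorithm for uncertain data   Total citations: 5, self-citations: 0, citations by others: 5
Li Fang, Li Yiyuan, Wang Chong. 《计算机应用》 (Journal of Computer Applications), 2009, 29(11): 3092-3095
Classical decision tree algorithms cannot handle uncertain data during tree construction and classification. To address this limitation, evidence theory, which can represent uncertain data, is combined with decision tree classification, extending decision tree techniques to settings that contain uncertain data. To avoid combinatorial explosion during tree construction, new measurement and aggregation operators are introduced, and a D-S evidence theory decision tree classification algorithm is proposed. Experimental results show that the algorithm classifies uncertain data effectively, achieves good classification accuracy, and avoids combinatorial explosion.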
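
The abstract above relies on D-S (Dempster-Shafer) evidence theory to represent uncertain class information. As a rough illustration only, the sketch below implements the standard Dempster rule of combination for two mass functions. It is not the paper's measurement or aggregation operator, which the abstract does not specify; the function name `combine` and the example masses are assumptions made for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of class labels to a mass in [0, 1].
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence accumulates on the intersection.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Mass assigned to disjoint sets counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Normalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical evidence from two sources about a binary class label.
m1 = {frozenset({"yes"}): 0.6, frozenset({"yes", "no"}): 0.4}
m2 = {frozenset({"no"}): 0.5, frozenset({"yes", "no"}): 0.5}
result = combine(m1, m2)
```

Mass on the full set `{"yes", "no"}` expresses ignorance, which is how an uncertain attribute or class value can be encoded before it is fed to a tree learner.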

3.
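Concept drift classification algorithm for uncertain data streams based on an adaptive fast decision tree   Total citations: 1, self-citations: 0, citations by others: 1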

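Uncertain data streams usually conceal concept drift, which makes effective classification difficult. To address this, an algorithm based on an adaptive fast decision tree is proposed. Building on the principles of general decision tree algorithms, it computes information gain using adaptive learning rules, detects uncertain numerical attributes in the stream using a splitting principle from unlabeled contextual learning, and converts uncertain numerical attributes into uncertain categorical attributes through the node-splitting method of the adaptive fast decision tree. This enables effective classification of uncertain data streams and, in turn, detection of the concept drift hidden in them. Simulation results confirm the reliability of the proposed method.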


4.
This paper proposes the topology design and kinematic optimization of a cyclical 5-degree-of-freedom (DoF) parallel manipulator with a proper constrained limb. First, a type of cyclical 5-DoF parallel manipulator with a proper constrained limb is proposed by analyzing the DoF of the proper constrained limb within the workspace. Taking a cyclical 5-DoF parallel manipulator with the topology 4-UPS&1-RPS as an example, its motion mapping model is formulated. By taking the reciprocal product of a wrench on a twist as the generalized virtual power, local and global kinematic performance indices are provided. Then, on the basis of the actuated and constrained singularity analysis of the 4-UPS&1-RPS parallel manipulator within the position and pose workspace, the singularity-free topology design of the manipulator is carried out, and its reachable and prescribed workspaces are obtained. Finally, by maximizing the global kinematic performance index subject to a set of appropriate constraint conditions, the kinematic optimal design of the 4-UPS&1-RPS parallel manipulator is carried out using the genetic algorithm of the MATLAB optimization toolbox.

5.
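In recent years, data stream classification has become a research hotspot in data mining. However, most traditional data stream classification algorithms can only handle streams whose data items are known and precise, and cannot be applied effectively to the uncertain data streams that are common in real applications. To build a classification model that adapts to data uncertainty and improves classification accuracy on uncertain data streams, an ensemble classification algorithm for uncertain data streams is proposed. The algorithm represents uncertain data as intervals with probability distribution functions and trains base classifiers with the C4.5 decision tree method and the naive Bayes method. It handles the uncertainty in the stream while also addressing the concept drift hidden in it. Experimental results show that the proposed algorithm is robust and achieves high accuracy when classifying uncertain data streams.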

6.
GEHEIMSCHREIBER     
WOLFGANG MACHE 《Cryptologia》2013,37(4):230-242
In World War II, “Fish” was a British cover word for all kinds of encrypted German radio teleprinter messages. The GC&CS at Bletchley, Buckinghamshire, not only successfully attacked Enigma traffic (Morse signals on radio links) with the electromechanical deciphering machines called BOMBES; its electronic text processor COLOSSUS also broke the German “Tunny” ciphers, generated by TELEPRINTER ATTACHMENTS “SZ”, employed by the ‘Heer’ (Army) on HF radio links.

7.
A freely available data processor for the Basic ERS & ENVISAT (A)ATSR and MERIS Toolbox (BEAM) was developed to retrieve atmospheric and oceanic properties above and of Case‐2 waters from Medium Resolution Imaging Spectrometer (MERIS) Level1b data. The processor was especially designed for European coastal waters and uses MERIS Level1b Top‐Of‐Atmosphere (TOA) radiances to retrieve atmospherically corrected remote sensing reflectances at the Bottom‐Of‐Atmosphere (BOA), spectral aerosol optical thicknesses (AOT) and the concentration of three water constituents, namely chlorophyll‐a (CHL), total suspended matter (TSM) and the absorption of yellow substance at 443 nm (YEL). The retrieval is based on four separate artificial neural networks which were trained on the results of extensive radiative transfer (RT) simulations that take various atmospheric and oceanic conditions into account. The accuracy of the retrievals was assessed by comparison with concurrent in situ ground measurements and was published in full detail elsewhere. For the remote sensing reflectance product a mean absolute percentage error (MAPE) of 18% was derived within the spectral range 412.5–708.75 nm, while the accuracy of the AOT at 550 nm in terms of MAPE was calculated to be 40%. The accuracies for CHL, TSM and YEL were derived from match‐up analysis with MAPEs of 50%, 60% and 71%, respectively.
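
The accuracies above are quoted as mean absolute percentage errors. A minimal sketch of that metric follows, assuming the usual definition (mean of |retrieved - measured| / |measured|, times 100); the abstract does not spell out the exact formula used, and the function name and sample values are hypothetical.

```python
def mape(measured, retrieved):
    """Mean absolute percentage error of retrievals against in situ values."""
    if not measured or len(measured) != len(retrieved):
        raise ValueError("inputs must be non-empty and of equal length")
    # Each measured value must be non-zero, since it is the denominator.
    errors = [abs((r - m) / m) for m, r in zip(measured, retrieved)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical match-up pairs (not data from the paper).
in_situ = [1.0, 2.0, 4.0]
retrieval = [1.5, 1.0, 4.0]
score = mape(in_situ, retrieval)
```

Because each error is divided by the measured value, small in situ values weigh heavily in the score, which is worth keeping in mind when reading figures such as the 71% reported for YEL.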

8.
Two algorithms for solving the piecewise linear least-squares approximation problem of plane curves are presented. The first is for the case when the L2 residual (error) norm in any segment is not to exceed a pre-assigned value. The second algorithm is for the case when the number of segments is given and a (balanced) L2 residual norm solution is required. The given curve is first digitized and either algorithm is then applied to the discrete points. For each segment, we obtain the upper triangular matrix R in the QR factorization of the (augmented) coefficient matrix of the resulting system of linear equations. The least-squares solutions are calculated in terms of the R (and Q) matrices. The algorithms then work in an iterative manner by updating the least-squares solutions for the segments via updating the R matrices. The calculation requires as little computational effort as possible. Numerical results and comments are given. This, in a way, is a tutorial paper.
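
The first case described above (residual norm per segment bounded by a pre-assigned value) can be sketched with a simple greedy scan. Note the substitution: the paper updates R matrices from a QR factorization, whereas this sketch refits each candidate segment from scratch with the closed-form line fit, which is simpler but slower. The function names and the greedy strategy are assumptions, not the paper's algorithm.

```python
import math


def fit_line(points):
    """Least-squares line through points; returns (slope, intercept, L2 residual)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    resid = math.sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in points))
    return slope, intercept, resid


def segment(points, tol):
    """Greedily split points into half-open index ranges whose fit residual <= tol."""
    segments = []
    start, n = 0, len(points)
    while start < n:
        end = min(start + 2, n)  # two points are always fitted exactly
        # Extend the segment while adding the next point keeps the residual in bounds.
        while end < n and fit_line(points[start:end + 1])[2] <= tol:
            end += 1
        segments.append((start, end))
        start = end
    return segments


# Digitized V-shaped curve y = |x|, sampled at integer x from -3 to 3.
curve = [(float(x), float(abs(x))) for x in range(-3, 4)]
pieces = segment(curve, tol=1e-9)
```

Each returned pair `(start, end)` indexes `curve[start:end]`; refitting on every extension costs more than the incremental updating the abstract describes.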

9.
Many studies on streaming data classification have been based on a paradigm in which a fully labeled stream is available for learning purposes. However, it is often too labor-intensive and time-consuming to manually label a data stream for training. This difficulty may cause conventional supervised learning approaches to be infeasible in many real world applications, such as credit fraud detection, intrusion detection, and rare event prediction. In previous work, Li et al. suggested that these applications be treated as a Positive and Unlabeled learning problem, and proposed a learning algorithm, OcVFDT, as a solution (Li et al. 2009). Their method requires only a set of positive examples and a set of unlabeled examples, which is easily obtainable in a streaming environment, making it widely applicable to real-life applications. Here, we enhance Li et al.’s solution by adding three features: an efficient method to estimate the percentage of positive examples in the training stream, the ability to handle numeric attributes, and the use of more appropriate classification methods at tree leaves. Experimental results on synthetic and real-life datasets show that our enhanced solution (called PUVFDT) has very good classification performance and a strong ability to learn from data streams with only positive and unlabeled examples. Furthermore, our enhanced solution reduces the learning time of OcVFDT by about an order of magnitude. Even with 80% of the examples in the training data stream unlabeled, PUVFDT can still achieve a competitive classification performance compared with that of VFDTcNB (Gama et al. 2003), a supervised learning algorithm.

10.
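With the arrival of the information age, data have become increasingly important as carriers of information. As research on uncertain data has deepened, cost-sensitive data mining techniques have been applied to uncertain data mining. This paper introduces uncertain data, analyzes existing methods for mining uncertain data, and, after introducing cost-sensitive learning, presents a cost-sensitive decision tree algorithm for uncertain data; experiments verify that the algorithm is reasonable and feasible.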

11.
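Traditional decision trees find decision boundaries by recursively partitioning the feature space, yielding a "hard" partition of that space. For big data and complex pattern problems, however, such crisp decision boundaries reduce the generalization ability of the tree. To let decision tree algorithms acquire imprecise knowledge automatically, fuzzy theory is introduced into the decision tree, and neural networks are used as leaf nodes during tree construction, giving an improved fuzzy decision tree algorithm based on neural networks. In the neural network fuzzy decision tree, classifier learning has two stages. In the first stage, a heuristic that reduces uncertainty partitions the data, and growth of the fuzzy decision tree stops once a node's splitting ability falls below the truth-degree threshold ε. In the second stage, neural networks at the leaf nodes of the fuzzy decision tree perform classification with generalization ability. Experimental results show that, compared with traditional classification learning algorithms, the algorithm is more accurate and, for classification problems involving big data and complex patterns, can determine the size of the decision tree through structural adaptation.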

12.
How can we discover interesting patterns from time-evolving high-speed data streams? How can we analyze the data streams quickly and accurately, with little space overhead? How can we guarantee that the found patterns are self-consistent? High-speed data streams have been receiving increasing attention due to their wide applications such as sensors, network traffic, social networks, etc. The most fundamental task on the data stream is frequent pattern mining; in particular, focusing on recentness is important in real applications. In this paper, we develop two algorithms for discovering recently frequent patterns in data streams. First, we propose TwMinSwap to find top-k recently frequent items in data streams, which is a deterministic version of our motivating algorithm TwSample providing theoretical guarantees based on item sampling. TwMinSwap improves TwSample in terms of speed, accuracy, and memory usage. Both require only O(k) memory spaces and do not require any prior knowledge on the stream such as its length and the number of distinct items in the stream. Second, we propose TwMinSwap-Is to find top-k recently frequent itemsets in data streams. We especially focus on keeping self-consistency of the discovered itemsets, which is the most important property for reliable results, while using O(k) memory space with the assumption of a constant itemset size. Through extensive experiments, we demonstrate that TwMinSwap outperforms all competitors in terms of accuracy and memory usage, with fast running time. We also show that TwMinSwap-Is is more accurate than the competitor and discovers recently frequent itemsets with reasonably large sizes (at most 5–7) depending on datasets. Thanks to TwMinSwap and TwMinSwap-Is, we report interesting discoveries in real world data streams, including the difference of trends between the winner and the loser of U.S. presidential candidates, and temporal human contact patterns.

13.
High utility sequential pattern (HUSP) mining has emerged as an important topic in data mining. A number of studies have been conducted on mining HUSPs, but they are mainly intended for non-streaming data and thus do not take data stream characteristics into consideration. Streaming data are fast changing, continuously generated, and unbounded in quantity. Such data can easily exhaust computer resources (e.g., memory) unless proper resource-aware mining is performed. In this study, we explore the fundamental problem of how limited memory can be best utilized to produce high quality HUSPs over a data stream. We design an approximation algorithm, called MAHUSP, that employs memory adaptive mechanisms to use a bounded portion of memory, in order to efficiently discover HUSPs over data streams. An efficient tree structure, called MAS-Tree, is proposed to store potential HUSPs over a data stream. MAHUSP guarantees that all HUSPs are discovered in certain circumstances. Our experimental study shows that our algorithm can not only discover HUSPs over data streams efficiently, but also adapt to memory allocation with limited sacrifices in the quality of discovered HUSPs. Furthermore, in order to show the effectiveness and efficiency of MAHUSP in real-life applications, we apply our proposed algorithm to a web clickstream dataset obtained from a Canadian news portal to showcase users’ reading behavior, and to a real biosequence database to identify disease-related gene regulation sequential patterns. The results show that MAHUSP effectively discovers useful and meaningful patterns in both cases.

14.
In this paper, the structural stiffness of a new 3-axis asymmetric planar parallel manipulator, with a 2PRR–PPR structural kinematic chain, is investigated. The manipulator is proposed as a tool holder for a 5-axis hybrid computer numerical control (CNC) machine. First, the structure of the robot is introduced and the inverse kinematics solution is presented. Second, the stiffness matrix of the robot is determined using a continuous method based on Castigliano’s theorem and calculation of the strain energy of the robot components. This method removes the need for commonly used simplifying assumptions and, therefore, results in good accuracy. For this purpose, force and strain energy for each segment of the robot are analyzed. Finally, to verify the analytical results, commercial FEM software is used to simulate the physical structure of the manipulator. A numerical example is presented which confirms the correctness of the analytical formulations.

15.
Mining with streaming data is a hot topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, have a relatively poor efficiency in both time and space due to the characteristics of streaming data. There are some advantages in time and space when using random decision trees. An incremental algorithm for mining data streams, SRMTDS (Semi-Random Multiple decision Trees for Data Streams), based on random decision trees is proposed in this paper. SRMTDS uses the inequality of Hoeffding bounds to choose the minimum number of split-examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naive Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS has an improved performance in time, space, accuracy and the anti-noise capability in comparison with VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams.
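
The abstract above mentions using the Hoeffding bound to choose the minimum number of split examples. A minimal sketch of that calculation follows, using the standard form of the bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)). How SRMTDS applies it in detail is not stated in the abstract, so the function names and the parameter values below are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Deviation epsilon that the mean of n observations exceeds with probability <= delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_examples(value_range, delta, epsilon):
    """Smallest n for which the Hoeffding bound is at most epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


# Hypothetical setting: a metric with range 1 (e.g. information gain for two
# classes, measured in bits), confidence 1 - 1e-7, tolerated error 0.05.
needed = min_examples(1.0, 1e-7, 0.05)
bound = hoeffding_bound(1.0, 1e-7, needed)
```

For information gain over c classes the range is commonly taken as log2(c), so the number of examples required grows with the number of classes.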

16.
Uncertain data streams, where data are incomplete and imprecise, have been observed in many environments. Feeding such data streams to existing stream systems produces results of unknown quality, which is of paramount concern to monitoring applications. In this paper, we present the claro system that supports stream processing for uncertain data naturally captured using continuous random variables. claro employs a unique data model that is flexible and allows efficient computation. Built on this model, we develop evaluation techniques for relational operators by exploring statistical theory and approximation. We also consider query planning for complex queries given an accuracy requirement. Evaluation results show that our techniques can achieve high performance while satisfying accuracy requirements and outperform state-of-the-art sampling methods.

17.
Adapted One-versus-All Decision Trees for Data Stream Classification   Total citations: 1, self-citations: 0, citations by others: 1
One-versus-all (OVA) decision trees learn k individual binary classifiers, each one distinguishing the instances of a single class from the instances of all other classes. OVA thus differs from existing data stream classification schemes, most of which use multiclass classifiers that each discriminate among all the classes. This paper advocates some outstanding advantages of OVA for data stream classification. First, there is low error correlation and hence high diversity among OVA's component classifiers, which leads to high classification accuracy. Second, OVA is adept at accommodating new class labels that often appear in data streams. However, many challenges remain in deploying traditional OVA for classifying data streams. First, as every instance is fed to all component classifiers, OVA is known as an inefficient model. Second, OVA's classification accuracy is adversely affected by the imbalanced class distribution in data streams. This paper addresses those key challenges and consequently proposes a new OVA scheme that is adapted for data stream classification. Theoretical analysis and empirical evidence reveal that the adapted OVA can offer faster training, faster updating and higher classification accuracy than many existing popular data stream classification algorithms.
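
The plain OVA scheme described above can be sketched for a stream as follows. This illustrates traditional OVA only, not the paper's adapted scheme, and it swaps the decision-tree components for a tiny online perceptron to stay short; the class names and the toy stream are assumptions.

```python
class OnlinePerceptron:
    """Binary component classifier (a stand-in for a decision tree)."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, is_positive):
        target = 1.0 if is_positive else -1.0
        # Standard perceptron rule: adjust only on a mistake.
        if target * self.score(x) <= 0:
            self.w = [wi + target * xi for wi, xi in zip(self.w, x)]
            self.b += target


class OneVersusAll:
    """One binary classifier per class; new labels get a new component."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.models = {}

    def learn(self, x, label):
        if label not in self.models:
            self.models[label] = OnlinePerceptron(self.n_features)
        # Every instance is fed to all components, which is the inefficiency noted above.
        for cls, model in self.models.items():
            model.update(x, cls == label)

    def predict(self, x):
        return max(self.models, key=lambda cls: self.models[cls].score(x))


# Toy stream; class "c" appears only after "a" and "b" have been seen.
ova = OneVersusAll(n_features=2)
for x, y in [([1, 0], "a"), ([0, 1], "b"), ([1, 0], "a"), ([0, 1], "b"), ([-1, -1], "c")]:
    ova.learn(x, y)
```

Adding class "c" only required creating one more component, which is the property the abstract highlights for streams where new labels appear.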

18.
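To address prediction over uncertain data streams, and in view of their high speed, unboundedness, and dynamic uncertainty, a prediction method based on an optimization strategy is proposed, building on complex artificial intelligence prediction and time series prediction. The uncertainty of tuples in the stream and uncertain anomalies are considered together to reduce the computational cost of prediction. The influence of uncertain statistical properties on Kalman filter prediction is also considered, and Q and R are estimated by asynchronous optimization to obtain the best state prediction. Experimental results show that the method predicts well.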
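
To show where Q and R enter the Kalman prediction mentioned above, here is a minimal scalar filter step under a random-walk state model. This is a textbook sketch, not the paper's method: the asynchronous optimized estimation of Q and R is not described in the abstract and is not implemented here, and the function name and sample readings are assumptions.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: previous state estimate and its variance
    z:    new measurement
    q, r: process noise variance (Q) and measurement noise variance (R)
    """
    # Predict: under a random-walk model the state carries over and variance grows by Q.
    x_pred = x
    p_pred = p + q
    # Update: the gain balances predicted uncertainty against measurement noise R.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


# Hypothetical stream of noisy readings.
state, variance = 0.0, 1.0
for reading in [1.0, 1.2, 0.9]:
    state, variance = kalman_step(state, variance, reading, q=0.01, r=0.5)
```

A larger Q makes the filter trust new readings more, and a larger R makes it trust them less, which is why the choice of both matters for prediction quality.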

19.
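PU learning on uncertain data arises in many real-world applications, such as sensor networks, market analysis, and medical diagnosis. A decision tree algorithm for PU learning on uncertain data is proposed. Based on the information gain computation in POSC45, and borrowing the concepts of uncertain data intervals and probability distribution functions that UDT uses for uncertain data with continuous attributes, the paper proposes DTU-PU (Decision Tree for Uncertain data with PU-learning), a decision tree algorithm that handles PU learning on uncertain data with continuous attributes. Experiments on UCI datasets show that DTU-PU achieves good classification accuracy and robustness.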

20.
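Frequent pattern mining over uncertain data streams based on a probability decay window model   Total citations: 2, self-citations: 0, citations by others: 2
Given the uncertainty of uncertain data streams, a new probabilistic frequent pattern tree, PFP-tree, and a PFP-tree-based probabilistic frequent pattern mining method, PFP-growth, are designed. PFP-growth works on transactional uncertain data streams with a probability decay window model and discovers probabilistic frequent patterns by computing the expected support of each probabilistic item. Its main features are: because items arriving at different times within the window contribute differently, expected support is computed with the probability decay window model to improve mining accuracy; an item index table and a transaction index table are maintained to speed up retrieval in the frequent pattern tree; nodes that cannot become frequent patterns are pruned to reduce the storage and retrieval cost of the pattern tree; and each node keeps a linked list of transaction probability information to support items having different probabilities in different transactions. Experimental results show that PFP-growth performs well in processing time and memory usage while preserving the accuracy of the mined patterns.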
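
The expected-support computation with decay described above can be sketched as follows. The sketch assumes that items within a transaction are independent and that a transaction's weight decays exponentially with its age in the window; the abstract does not give the exact decay function, and the PFP-tree itself is not built here. The function name and sample window are hypothetical.

```python
from math import prod


def expected_support(window, itemset, decay=1.0):
    """Decayed expected support of itemset over a window of uncertain transactions.

    window:  list of transactions, oldest first; each maps item -> existence probability
    itemset: iterable of items
    decay:   factor in (0, 1]; a transaction of age a is weighted by decay ** a
    """
    items = list(itemset)
    newest = len(window) - 1
    total = 0.0
    for t, transaction in enumerate(window):
        if all(item in transaction for item in items):
            # Probability that all items co-occur, assuming independence.
            p = prod(transaction[item] for item in items)
            total += decay ** (newest - t) * p
    return total


# Hypothetical window of three uncertain transactions.
window = [{"a": 0.5, "b": 0.8}, {"a": 1.0}, {"a": 0.9, "b": 0.5}]
support_ab = expected_support(window, ("a", "b"), decay=0.5)
```

An itemset would then be reported as frequent when its decayed expected support reaches a chosen threshold; with `decay=1.0` the function reduces to the ordinary expected support.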
