1.

《Concurrency and Computation》2017,29(8)

In supervised classification, large training data sets are common, and decision trees are widely used. However, because of bottlenecks such as memory restrictions, time complexity, and data complexity, many supervised classifiers, including the classical C4.5 tree, cannot handle big data directly. One solution is to design a highly parallelized learning algorithm. Motivated by this, we propose a parallelized C4.5 decision tree algorithm based on MapReduce (MR‐C4.5‐Tree) with 2 parallelized methods for building the tree nodes. First, an information entropy‐based parallelized attribute selection method (MR‐A‐S), run on several subsets, is proposed to determine the best splitting attribute and the cut points. Then, a parallel data splitting method (MR‐D‐S) is presented to partition the training data into subsets. Finally, we introduce the MR‐C4.5‐Tree learning algorithm, which grows the tree in a top‐down recursive way. The depth of the constructed decision tree, the number of samples, and the maximal class probability in each tree node are used as termination conditions to avoid over‐partitioning. Experimental studies show the feasibility and good performance of the proposed MR‐C4.5‐Tree algorithm.
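The entropy-based attribute selection and the three termination conditions named in this abstract can be illustrated with a minimal single-machine sketch. It is not the paper's MapReduce implementation: the function names and the threshold defaults (`max_depth`, `min_samples`, `max_class_prob`) are assumptions made for illustration, and only categorical attributes are handled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of one categorical attribute.

    values[i] is the attribute value of sample i, labels[i] its class.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def should_stop(labels, depth, max_depth=10, min_samples=2, max_class_prob=0.95):
    """Termination test on tree depth, node size, and maximal class probability.

    The default thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(labels)
    if n == 0:
        return True
    top_prob = Counter(labels).most_common(1)[0][1] / n
    return depth >= max_depth or n < min_samples or top_prob >= max_class_prob
```

In a tree builder, `gain_ratio` would be evaluated for every candidate attribute at a node, and the attribute with the largest value chosen unless `should_stop` fires first.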

2.

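A Comparative Analysis of Decision Trees and Artificial Neural Networks Total citations: 2, self-citations: 0, citations by others: 2

赵雪清 安晓东 《电脑开发与应用》2007,20(11):13-15

Decision trees and artificial neural networks are two important techniques for classification tasks in data mining. Each has its own characteristics, and different algorithms should be chosen for different types of data. To illustrate these characteristics in depth, this paper draws on the principles and workflow of the C4.5 decision tree algorithm and of the BP (back-propagation) network model for classification, applies both to a concrete example, and carries out a comparative analysis that identifies and verifies some differences in their classification performance.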

3.

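Application of the C4.5 Algorithm to Suspicious Transaction Analysis in the Insurance Industry

汪雪元 张军 杜萍 《计算机与现代化》2010,(4):163-166,170

Filing suspicious transaction reports is part of the routine work of insurance companies. At present, suspicious transactions are identified largely by staff who check transactions one by one against the "Administrative Measures for the Reporting of Large-Value and Suspicious Transactions by Financial Institutions" issued by the People's Bank of China. This paper uses the classic C4.5 decision tree algorithm to analyze transactions and automatically identify some of the suspicious ones.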

4.

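Design and Implementation of Classification Mining in an Intelligent Evaluation System for University Students Total citations: 5, self-citations: 0, citations by others: 5

黄晶晶 倪天倪 《计算机与现代化》2005,(3):96-98

This paper introduces C4.5, a decision tree algorithm that is fast, efficient, and relatively mature, and describes the design and implementation of the algorithm in the classification mining submodule of an intelligent evaluation system for university students.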

5.

《Concurrency and Computation》2018,30(10)

To reduce the time needed to determine splitting attributes and splitting points in classic rank mutual information based decision trees, this paper establishes a fast rank mutual information based decision tree (FRMIDT) for classification problems. First, the proposed FRMIDT algorithm speeds up tree construction by using a max‐relevance and min‐redundancy criterion to remove redundant attributes when building each tree node. Then, the fuzzy c‐means algorithm is employed to determine the splitting points for further acceleration. In addition, a parallel implementation in the Map‐Reduce framework (MR‐FRMIDT) is developed for medium or large‐scale data classification. Several comparative studies are conducted on UCI benchmark data sets. Compared with the classic rank mutual information based decision tree on 12 data sets, the proposed FRMIDT model effectively reduces computational time while maintaining testing accuracy. Furthermore, FRMIDT is comparable to other traditional decision tree classifiers, including BFT, C4.5, LAD, NBT, and SC. A comparison with monotonic decision trees based on 7 popular splitting measures on several data sets illustrates the effectiveness of FRMIDT in monotonic classification. Finally, experimental analysis on 6 other data sets shows that the proposed MR‐FRMIDT is feasible and has good parallel performance in reducing execution time and avoiding memory restrictions.

6.

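Application of the Decision Tree Algorithm in the Evaluation Subsystem of an Intelligent Tutoring System

傅傑 李雯 《计算机与现代化》2009,(8):1-3,6

This paper introduces the characteristics of intelligent tutoring systems and explains the principles of the C4.5 decision tree algorithm. C4.5 is used to build a model for evaluating students' online learning outcomes, and the classification rules obtained from the model are used for prediction, yielding an accuracy evaluation table that verifies the flexibility and computational efficiency of the decision tree algorithm.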

7.

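Application of the C4.5 Algorithm to Customer Churn Analysis in Insurance Total citations: 11, self-citations: 0, citations by others: 11

桂现才 彭宏 王小华 《计算机工程与应用》2005,41(17):197-199,214

Retaining and attracting customers is key to an insurance company's competitiveness, yet insurers currently analyze customer churn only roughly or by experience. This paper uses attribute-oriented induction and the C4.5 decision tree algorithm to analyze basic information on insurance customers, identify the characteristics of customers who churn, and help insurers improve customer relations in a targeted way.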

8.

Parallel Formulations of Decision-Tree Classification Algorithms Total citations: 5, self-citations: 0, citations by others: 5

Anurag Srivastava Eui-Hong Han Vipin Kumar Vineet Singh 《Data mining and knowledge discovery》1999,3(3):237-261

Classification decision tree algorithms are used extensively for data mining in many domains such as retail target marketing, fraud detection, etc. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to parallelize due to the inherent dynamic nature of the computation. In this paper, we present parallel formulations of a classification decision tree learning algorithm based on induction. We describe two basic parallel formulations. One is based on the Synchronous Tree Construction Approach and the other is based on the Partitioned Tree Construction Approach. We discuss the advantages and disadvantages of using these methods and propose a hybrid method that employs the good features of these methods. We also provide an analysis of the cost of computation and communication of the proposed hybrid method. Moreover, experimental results on an IBM SP-2 demonstrate excellent speedups and scalability.

9.

An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization Total citations: 43, self-citations: 0, citations by others: 43

Dietterich Thomas G. 《Machine Learning》2000,40(2):139-157

Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a base learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approach to generating an ensemble is to randomize the internal decisions made by the base algorithm. This general approach has been studied previously by Ali and Pazzani and by Dietterich and Kong. This paper compares the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5. The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting. In situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.
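As a rough illustration of how bagging manipulates the training data, the sketch below draws bootstrap samples, trains one base model per sample, and combines predictions by majority vote. The base learner here is a hypothetical one-dimensional threshold stump standing in for C4.5, and all function names are assumptions made for this example.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly, with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(data):
    """Hypothetical base learner (a stand-in for C4.5).

    data is a list of (x, y) pairs with a numeric x and a 0/1 label y.
    The stump predicts 1 when x exceeds the threshold with the fewest errors.
    """
    best_t, best_err = None, None
    for t, _ in data:
        err = sum((x > t) != bool(y) for x, y in data)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: int(x > t)


def bagging(data, n_models, seed=0):
    """Train n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Combine the ensemble's predictions by majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]
```

Using an odd number of models, for example `bagging(data, 11)`, avoids tied votes in the two-class case.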

10.

Using Model Trees for Classification Total citations: 1, self-citations: 0, citations by others: 1

Frank Eibe Wang Yong Inglis Stuart Holmes Geoffrey Witten Ian H. 《Machine Learning》1998,32(1):63-76

Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They can be applied to classification problems by employing a standard method of transforming a classification problem into a problem of function approximation. Surprisingly, using this simple transformation the model tree inducer M5′, based on Quinlan's M5, generates more accurate classifiers than the state-of-the-art decision tree learner C5.0, particularly when most of the attributes are numeric.

11.

Guangping Tang Wangdong Yang Kenli Li Yu Ye Guoqing Xiao Keqin Li 《Concurrency and Computation》2015,27(17):5076-5095

An optimized parallel algorithm is proposed to address the complicated backward substitution of cyclic reduction when solving tridiagonal linear systems. Adopting a hybrid parallel model, this algorithm combines the cyclic reduction method and the partition method. Compared with the cyclic reduction method, the hybrid algorithm has a simple backward substitution on parallel computers. In this paper, the operation count and execution time are obtained to evaluate and compare these methods. On the basis of these measurements, the hybrid algorithm with a multi‐threading implementation achieves better efficiency than the other parallel methods, that is, the cyclic reduction and the partition methods. In particular, the approach has the least scalar operation count and the shortest execution time on a multi‐core computer when the size of the system of equations reaches some dimension threshold. The hybrid parallel algorithm improves on the performance of the cyclic reduction and partition methods by 19.2% and 13.2%, respectively. In addition, a comparison of the single‐iteration and multi‐iteration hybrid parallel algorithms shows that increasing the number of iteration steps of the cyclic reduction method does not affect the performance of the hybrid parallel algorithm very much. Copyright © 2015 John Wiley & Sons, Ltd.

12.

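Decision Guidance for Civil Aircraft Quality Data Based on a Parallel C4.5 Algorithm

魏壮宇 蔡红霞 李钧 《工业控制计算机》2018,(5):129-130,133

Civil aircraft equipment systems generate large volumes of quality data every day. As time passes and data accumulate, the statistical analysis methods of traditional discrete manufacturing can no longer process and analyze these massive quality data effectively. To solve this problem and uncover the hidden patterns in the data, an effective data mining method is proposed. The method analyzes quality data by integrating a parallel C4.5 decision tree algorithm. The results demonstrate the correctness, effectiveness, and value of the method.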

13.

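Customer Churn Prediction for Railway Scattered White-Goods Freight Based on Parallel C4.5

张斌 彭其渊 刘帆洨 《计算机应用研究》2019,36(3)

To improve the accuracy and efficiency of customer churn prediction for railway scattered white-goods freight, a customer churn identification method based on the CDL model is proposed according to the churn characteristics of these customers. On this basis, to cope with the large data volume, a C4.5 decision tree churn prediction model built on the Hadoop parallel framework is proposed. Simulation experiments show that the model has good accuracy and predictive ability, and that as the number of samples grows the efficiency of the Hadoop parallel framework improves markedly without affecting the accuracy or predictive ability of the churn prediction model.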

14.

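A Network Traffic Classification Method Based on an Improved C4.5 Algorithm

周剑峰 阳爱民 刘吉财 《计算机工程与应用》2012,48(5):71-74

In network traffic classification based on the C4.5 algorithm, the sheer volume of traffic data and the diversity of its features make the speed of decision tree construction and of classification important criteria for evaluating a traffic classifier. Building on the original C4.5 algorithm, this paper proposes an improved method for computing information entropy that speeds up tree construction by reducing the complexity of the functions evaluated. Experiments show that a classifier based on the improved algorithm matches the original classification accuracy while greatly shortening tree construction time.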

15.

Sandy Brand Rafael Bidarra 《Computer Animation and Virtual Worlds》2012,23(2):73-85

Game developers are often faced with very demanding requirements on huge numbers of agents moving naturally through increasingly large and detailed virtual worlds. With the advent of multi‐core architectures, new approaches to accelerating expensive pathfinding operations are worth investigating. Traditional single‐processor pathfinding strategies, such as A* and its derivatives, have long been praised for their flexibility. We implemented several parallel versions of such algorithms to analyze their intrinsic behavior, concluding that they have a large overhead, yield far from optimal paths, do not scale up to many cores, or are cache unfriendly. In this article, we propose Parallel Ripple Search, a novel parallel pathfinding algorithm that largely overcomes these limitations. It utilizes a high‐level graph to assign local search areas to CPU cores at “equidistant” intervals. These cores then use A* flooding behavior to expand towards each other, yielding good “guesstimate points” where their borders touch. The process does not rely on expensive parallel programming synchronization locks but instead relies on the opportunistic use of node collisions among cooperating cores, exploiting the multi‐core's shared memory architecture. As a result, all cores effectively run at full speed until enough way‐points are found. We show that this approach is a fast, practical and scalable solution and that it flexibly handles dynamic obstacles in a natural way. Copyright © 2012 John Wiley & Sons, Ltd.

16.

Credal-C4.5: Decision tree based on imprecise probabilities to classify noisy data

《Expert systems with applications》2014,41(10):4625-4637

In the area of classification, C4.5 is a well-known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to address the problem of over-fitting. A modification of C4.5, called Credal-C4.5, is presented in this paper. This new procedure uses a mathematical theory based on imprecise probabilities and uncertainty measures. In this way, Credal-C4.5 estimates the probabilities of the features and the class variable by using imprecise probabilities. In addition, it uses a new split criterion, called Imprecise Information Gain Ratio, applying uncertainty measures on convex sets of probability distributions (credal sets). In this manner, Credal-C4.5 builds trees for solving classification problems assuming that the training set is not fully reliable. We carried out several experimental studies comparing this new procedure with other ones and we obtained the following principal conclusion: in domains with class noise, Credal-C4.5 obtains smaller trees and better performance than classic C4.5.

17.

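An Improved C4.5 Decision Tree Algorithm Based on Variable Precision Rough Sets

刘兴文 王典洪 陈分雄 《计算机应用研究》2011,28(10):3649-3651

To address the complex construction and limited classification accuracy of C4.5 decision trees, an improved tree construction algorithm based on variable precision rough sets is proposed. The algorithm uses approximate classification quality as the heuristic function for selecting the attribute at each node. Compared with the information gain ratio, this criterion characterizes an attribute's overall contribution to classification more accurately and also suppresses noise to some extent. In addition, for the special case in which two or more attributes have equal approximate classification quality, the paper describes how to select the optimal classification attribute...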

18.

Thanh‐Nghi Do François Poulet 《Concurrency and Computation》2019,31(2)

We propose a new parallel learning algorithm of latent local support vector machines (SVM), called latent‐lSVM, for effectively classifying very high‐dimensional and large‐scale multi‐class datasets. The common framework of text/image classification tasks using the Bag‐Of‐(visual)‐Words model for data representation leads to hard classification problems with thousands of dimensions and hundreds of classes. Our latent‐lSVM algorithm performs these complex tasks in two main steps. The first is to use latent Dirichlet allocation to assign each datapoint (text/image) to some topics (clusters) with the corresponding probabilities. This reduces the number of classes and the number of datapoints in each cluster compared to the full dataset. The second is to learn nonlinear SVM models in parallel to classify the data clusters locally. The numerical test results on nine real datasets show that the latent‐lSVM algorithm achieves very high accuracy compared to state‐of‐the‐art algorithms. As an example of its effectiveness, it obtains an accuracy of 70.14% in classifying the Book dataset, which has 100 000 individuals in an 89 821‐dimensional input space and 661 classes, in 11.2 minutes using a PC with an Intel(R) Core i7‐4790 CPU, 3.6 GHz, 4 cores.

19.

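Interface Design Between Data Mining Algorithms and Database Management Systems

陈元 陈文伟 《计算机工程与设计》2001,22(4):89-92,96

When mining large databases, the problem of data extraction and the design of the interface between the database and the mining algorithm become very important. By defining an SQL data mining extractor, this paper designs a framework for the interface between data mining algorithms and database management systems, and uses the widely used data mining algorithm C4.5 to demonstrate the applicability of this standard SQL data mining extractor.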

20.

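Application of the C4.5 Algorithm to Train Track Fault Detection

肖秋根 王成友 梁华 刘云辉 《微机发展》2006,16(4):76-78

Detecting train track faults requires analyzing large amounts of data to determine the detection result, and decision trees are a common tool for data mining and classification analysis. This paper mainly discusses how to apply the C4.5 algorithm to construct a decision tree for train track fault detection and how to use the resulting tree to identify track faults.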